Global ETD Search

1	Pulsar Search Using Supervised Machine Learning Ford, John M. 01 January 2017 Pulsars are rapidly rotating neutron stars that emit a strong beam of energy through mechanisms that are not entirely clear to physicists. Astrophysicists use these very dense stars to study many basic physical phenomena, such as the behavior of plasmas in extremely dense environments, the behavior of pulsar-black hole pairs, and tests of general relativity. Many of these studies require information to answer the scientific questions posed by physicists. To provide more pulsars to study, several large-scale pulsar surveys are underway, and they are generating a huge backlog of unprocessed data. Searching for pulsars is a very labor-intensive process that currently requires skilled people to examine and interpret plots of data output by analysis programs. An automated system for screening the plots will speed up the search for pulsars by a very large factor. Research to date on using machine learning and pattern recognition has not yielded a completely satisfactory system: systems with the desired near-100% recall have false positive rates that are higher than desired, which causes more manual labor in the classification of pulsars. This work set out to research, identify, and develop methods to overcome the barriers to building an improved classification system, with a false positive rate of less than 1% and a recall of near 100%, that will be useful for the current and next generation of large pulsar surveys. The results show that it is possible to generate classifiers that perform as needed from the available training data. While a false positive rate of 1% was not reached, recall of over 99% was achieved with a false positive rate of less than 2%. Methods of mitigating the imbalance in the training and test data were explored and found to be highly effective in enhancing classification accuracy. 
Ensemble Classifiers Machine Learning Pulsar Search Computer Sciences
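The recall and false positive rate targets quoted in entry 1 can be made concrete with a small sketch. The counts below are invented for illustration (they are not taken from the thesis) and the function name `screening_metrics` is hypothetical; the point is only to show how the two rates are computed and why class imbalance matters.

```python
def screening_metrics(tp, fn, fp, tn):
    """Return (recall, false_positive_rate, precision) from confusion-matrix counts."""
    recall = tp / (tp + fn)      # fraction of real pulsars that are flagged
    fpr = fp / (fp + tn)         # fraction of non-pulsars that are wrongly flagged
    precision = tp / (tp + fp)   # fraction of flagged candidates that are real
    return recall, fpr, precision


# Hypothetical, heavily imbalanced survey: 1,000 pulsars among 100,000 candidates.
recall, fpr, precision = screening_metrics(tp=995, fn=5, fp=1485, tn=97515)
print(f"recall={recall:.3f}  false positive rate={fpr:.3f}  precision={precision:.3f}")
```

With these assumed counts, recall is 99.5% and the false positive rate is 1.5%, yet only about 40% of the flagged candidates are real pulsars, because non-pulsars vastly outnumber pulsars. This is one way to see why even a small false positive rate still leaves substantial manual work.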
2	Insertion adaptative en stéganographie : application aux images numériques dans le domaine spatial / Adaptive Steganography: Application to Digital Images in the Spatial Domain Kouider, Sarra 17 December 2013 Steganography is the art of secret communication. The goal is to hide a secret message in an innocuous medium in such a way that no one can detect it. With the spread of the Internet and the emergence of digital media (audio files, videos, and images), several design philosophies for steganographic schemes have been proposed. Among the current methods applied to natural digital images are adaptive embedding methods, which modify the cover medium while guaranteeing a certain level of security. These methods represent important progress in steganography. In this Ph.D. thesis, after recalling the recent concepts of adaptive steganography, we present a complete and fully automated procedure for the adaptive embedding of secret data in natural digital images. The proposed approach is an oracle-based "meta-method" called ASO (Adaptive Steganography by Oracle), which preserves both the distribution of the cover image and the distribution of the image database used by the sender. We then define a new steganographic paradigm, steganography by database, and propose a new selection criterion for stego images that further enhances the security of the transmission phase of ASO. Experiments on real images show that ASO clearly outperforms current state-of-the-art methods, making it the best, or one of the best, approaches currently available. Steganography Steganalysis Ensemble Classifiers Oracle Detectability map Security
3	Machine learning for automatic classification of remotely sensed data Milne, Linda, Computer Science & Engineering, Faculty of Engineering, UNSW January 2008 As more and more remotely sensed data becomes available, it is becoming increasingly hard to analyse it with the more traditional, labour-intensive manual methods. The commonly used techniques, which involve expert evaluation, are widely acknowledged as providing inconsistent results at best. We need more general techniques that can adapt to a given situation and that incorporate the strengths of the traditional methods, human operators and new technologies. The difficulty in interpreting remotely sensed data is that often only a small amount of data is available for classification, and it can be noisy, incomplete or contain irrelevant information. Given that the training data may be limited, we demonstrate a variety of techniques for highlighting information in the available data and show how to select the most relevant information for a given classification task. We show that more consistent results between the training data and an entire image can be obtained, and how misclassification errors can be reduced. Specifically, a new technique for attribute selection in neural networks is demonstrated. Machine learning techniques, in particular, provide us with a means of automating classification using training data from a variety of data sources, including remotely sensed data and expert knowledge. A classification framework is presented in this thesis that can be used with any classifier and any available data. While it was developed in the context of vegetation mapping from remotely sensed data using machine learning classifiers, it is a general technique that can be applied to any domain. The framework is aimed in particular at domains in which the available training data is inadequate. 
contribution analysis ensemble classifiers multi-strategy classification attribute selection feature selection
5	Реконфигурабилне архитектуре за хардверску акцелерацију предиктивних модела машинског учења / Rekonfigurabilne arhitekture za hardversku akceleraciju prediktivnih modela mašinskog učenja / Reconfigurable Architectures for Hardware Acceleration of Machine Learning Classifiers Vranjković Vuk 02 July 2015 This thesis proposes universal coarse-grained reconfigurable computing architectures for hardware implementation of decision trees (DTs), artificial neural networks (ANNs), support vector machines (SVMs), and homogeneous and heterogeneous ensemble classifiers (HHESs). Using these universal architectures, two versions of DTs, two versions of SVMs, two versions of ANNs, and seven versions of HHES machine learning classifiers have been implemented on field programmable gate arrays (FPGAs). Experimental results, based on datasets from the standard UCI machine learning repository, show that the FPGA implementation provides a significant improvement (1–6 orders of magnitude) in the average instance classification time in comparison with software implementations.
6	Hardware Acceleration of Nonincremental Algorithms for the Induction of Decision Trees and Decision Tree Ensembles / Хардверска акцелерација неинкременталних алгоритама за формирање стабала одлуке и њихових ансамбала / Hardverska akceleracija neinkrementalnih algoritama za formiranje stabala odluke i njihovih ansambala Vukobratović Bogdan 22 February 2017 The thesis proposes novel full decision tree and decision tree ensemble induction algorithms, EFTI and EEFTI, and explores various possibilities for their implementation. The experiments show that the proposed EFTI algorithm is able to infer much smaller DTs on average, without a significant loss in accuracy, when compared to top-down incremental DT inducers. When compared to other full tree induction algorithms, it was able to produce more accurate DTs of similar size in shorter times. Hardware architectures for the acceleration of these algorithms (EFTIP and EEFTIP) are also proposed, and experiments show that they can offer substantial speedups.
7	Dataset selection for aggregate model implementation in predictive data mining Lutu, P.E.N. (Patricia Elizabeth Nalwoga) 15 November 2010 Data mining has become a commonly used method for the analysis of organisational data, for purposes of summarizing data in useful ways and identifying non-trivial patterns and relationships in the data. Given the large volumes of data that are collected by business, government, non-government and scientific research organizations, a major challenge for data mining researchers and practitioners is how to select relevant data for analysis in sufficient quantities to meet the objectives of a data mining task. This thesis addresses the problem of dataset selection for predictive data mining. Dataset selection was studied in the context of aggregate modeling for classification. The central argument of this thesis is that, for predictive data mining, it is possible to systematically select many dataset samples and employ approaches that differ from current practice for feature selection, training dataset selection, and model construction. When a large amount of information in a large dataset is utilised in the modeling process, the resulting models will have a high level of predictive performance and should be more reliable. Aggregate classification models, also known as ensemble classifiers, have been shown to provide a high level of predictive accuracy on small datasets. Such models are known to achieve a reduction in the bias and variance components of the prediction error of a model. The research for this thesis was aimed at the design of aggregate models and the selection of training datasets from large amounts of available data. The objectives for the model design and dataset selection were to reduce the bias and variance components of the prediction error for the aggregate models. Design science research was adopted as the paradigm for the research. 
Large datasets obtained from the UCI KDD Archive were used in the experiments. Two classification algorithms were used: See5 for classification tree modeling, and K-Nearest Neighbour. The two methods of aggregate modeling that were studied are One-Vs-All (OVA) and positive-Vs-negative (pVn) modeling. While OVA is an existing method that has been used for small datasets, pVn is a new method of aggregate modeling proposed in this thesis. Methods for feature selection from large datasets, and methods for training dataset selection from large datasets, were studied for OVA and pVn aggregate modeling. The feature selection experiments revealed that the use of many samples, robust measures of correlation, and validation procedures results in the reliable selection of relevant features for classification. A new algorithm for feature subset search, based on the decision rule-based approach to heuristic search, was designed, and its performance was compared to that of two existing algorithms for feature subset search. The experimental results revealed that the new algorithm makes better decisions for feature subset search. The information provided by a confusion matrix was used as a basis for the design of OVA and pVn base models, which are combined into one aggregate model. A new construct called a confusion graph was used in conjunction with new algorithms for the design of pVn base models. A new algorithm for combining base model predictions and resolving conflicting predictions was designed and implemented. Experiments to study the performance of the OVA and pVn aggregate models revealed that the aggregate models provide a high level of predictive accuracy compared to single models. Finally, theoretical models that depict the relationships between the factors influencing feature selection and training dataset selection for aggregate models are proposed, based on the experimental results. / Thesis (PhD)--University of Pretoria, 2010. 
/ Computer Science / unrestricted Dataset partitioning Data mining Bias reduction Predictive modeling Classification Model aggregation Ensemble classifiers OVA classification pVn classification Dataset selection Feature selection Variable selection Large datasets Variance reduction Dataset sampling UCTD
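
The One-Vs-All (OVA) aggregate modeling described in entry 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's method: the thesis used See5 and K-Nearest Neighbour base models and its own algorithm for resolving conflicting predictions, whereas this sketch swaps in a simple nearest-centroid scorer and picks the class whose base model gives the highest score. The names `train_ova` and `predict_ova` and the toy data are hypothetical.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def train_ova(samples):
    """Build one binary base model per class: (centroid of class, centroid of the rest)."""
    labels = sorted({label for _, label in samples})
    models = {}
    for target in labels:
        positives = [x for x, y in samples if y == target]
        negatives = [x for x, y in samples if y != target]
        models[target] = (centroid(positives), centroid(negatives))
    return models


def predict_ova(models, x):
    """Score x with every base model and return the class with the highest score."""
    # Score: how much closer x is to the class centroid than to the "rest" centroid.
    scores = {
        label: dist(x, rest) - dist(x, own)
        for label, (own, rest) in models.items()
    }
    return max(scores, key=scores.get)


# Toy three-class data, invented for illustration.
data = [
    ((0.0, 0.0), "a"), ((0.2, 0.1), "a"),
    ((5.0, 5.0), "b"), ((5.1, 4.8), "b"),
    ((0.0, 5.0), "c"), ((0.3, 5.2), "c"),
]
models = train_ova(data)
print(predict_ova(models, (0.1, 0.2)))
```

Each base model only has to separate one class from all the others, and the aggregate model combines their scores into a single prediction. Taking the maximum score is just one simple way to resolve disagreement between base models.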
