271 |
Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery DiseaseDuan, Haoyang 15 May 2014 (has links)
From a fresh data science perspective, this thesis discusses the prediction of coronary artery disease based on Single-Nucleotide Polymorphisms (SNPs) from the Ontario Heart Genomics Study (OHGS). First, the thesis explains the k-Nearest Neighbour (k-NN) and Random Forest learning algorithms and includes a complete proof that k-NN is universally consistent in finite-dimensional normed vector spaces. Second, it introduces two dimensionality reduction techniques: Random Projections and a new method termed Mass Transportation Distance (MTD) Feature Selection. It then compares the performance of Random Projections with k-NN against MTD Feature Selection with Random Forest for predicting coronary artery disease. The results demonstrate that MTD Feature Selection with Random Forest is superior: Random Forest obtains an accuracy of 0.6660 and an area under the ROC curve (AUC) of 0.8562 on the OHGS dataset when 3335 SNPs are selected by MTD Feature Selection for classification. This AUC considerably exceeds the previous best of 0.608, obtained by Davies et al. in 2010 on the same dataset.
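The pipeline comparison described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming scikit-learn; the OHGS SNP data is not public, and MTD Feature Selection is the thesis's own method, so a standard univariate filter (SelectKBest) stands in for it here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.random_projection import GaussianRandomProjection
from sklearn.neighbors import KNeighborsClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for a SNP matrix: many features, few informative ones.
X, y = make_classification(n_samples=600, n_features=5000, n_informative=50,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Pipeline 1: Random Projections + k-NN.
rp = GaussianRandomProjection(n_components=300, random_state=0)
knn = KNeighborsClassifier(n_neighbors=15).fit(rp.fit_transform(X_tr), y_tr)
auc_knn = roc_auc_score(y_te, knn.predict_proba(rp.transform(X_te))[:, 1])

# Pipeline 2: feature selection (a stand-in for MTD) + Random Forest.
sel = SelectKBest(f_classif, k=500).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(sel.transform(X_tr), y_tr)
auc_rf = roc_auc_score(y_te, rf.predict_proba(sel.transform(X_te))[:, 1])

print(f"RP + k-NN AUC: {auc_knn:.3f}")
print(f"FS + RF   AUC: {auc_rf:.3f}")
```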
|
272 |
Improving the quality of software design through pattern ontologyBoyer, Marc Guy 31 August 2011 (has links)
Software engineers use design patterns to refactor software models for quality. This displaces domain patterns and makes software hard to maintain. Detecting design patterns directly in requirements can circumvent this problem. To facilitate the analogical transfer of patterns from the problem domain to the solution model, however, we must describe patterns in ontological rather than technical terms. In a first study, novice designers used both pattern cases and a pattern ontology to detect design ideas and patterns in requirements. Errors in detection accuracy led to a revision of the pattern ontology and a second study into its pattern-discriminating power. The study results demonstrate that the pattern ontology is superior to pattern cases in helping novice software engineers identify patterns in the problem domain.
|
273 |
Σύγχρονες τεχνικές στις διεπαφές ανθρώπινου εγκεφάλου - υπολογιστή / Modern techniques in human brain-computer interfacesΤσιλιγκιρίδης, Βασίλειος 16 June 2011 (has links)
Brain-Computer Interface (BCI) systems require real-time, efficient processing of the user's electroencephalographic (EEG) signal measurements in order to translate his or her mental processes/intentions into control signals for external devices or systems. Within this work, the theoretical background of the problem was studied and the main techniques in use today were briefly reviewed. In addition, a method was presented for classifying the mental intentions of left- and right-hand movement of a user, and it was applied to real medical data. The extraction of features that discriminate between the two states was based on time-frequency information, obtained by filtering the raw EEG data with causal Morlet wavelets, while two reliable methods were developed and compared for the subsequent classification of the features. The first builds Gaussian probabilistic models for each class of movement intention, with the final classification decision made by the naive Bayes classifier, while the second builds a classification model based on the theoretical framework of Support Vector Machines (SVMs). The goal of the binary classification problem is to decide, as quickly and reliably as possible, to which of the two classes a given mental intention belongs, so that the designed algorithm can serve a framework that feeds the final decision back to the user in real time.
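A minimal sketch of the described pipeline, assuming synthetic signals in place of the real EEG recordings; the thesis uses causal Morlet wavelets, whereas the symmetric (non-causal) form is used below for simplicity.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

fs = 250.0  # sampling rate (Hz), an assumption

def morlet_power(signal, freq, n_cycles=7):
    """Mean power of `signal` around `freq`, via a complex Morlet wavelet."""
    sigma_t = n_cycles / (2 * np.pi * freq)
    t = np.arange(-3 * sigma_t, 3 * sigma_t, 1 / fs)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    conv = np.convolve(signal, wavelet, mode="same")
    return np.mean(np.abs(conv) ** 2)

# Synthetic 2-class data: class 1 has stronger 10 Hz (mu-band) activity.
rng = np.random.default_rng(0)
n_trials, n_samples = 100, 500
X_raw = rng.normal(size=(n_trials, n_samples))
y = rng.integers(0, 2, n_trials)
t = np.arange(n_samples) / fs
X_raw[y == 1] += 0.8 * np.sin(2 * np.pi * 10 * t)

# Features: wavelet power at a few frequencies of interest per trial.
freqs = [8, 10, 12, 20]
X = np.array([[morlet_power(trial, f) for f in freqs] for trial in X_raw])

for clf in (GaussianNB(), SVC(kernel="rbf")):
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(type(clf).__name__, f"accuracy: {acc:.2f}")
```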
|
274 |
Detecção de fraude em hidrômetros utilizando técnicas de reconhecimento de padrões / Fraud detection in water meters using pattern recognitionDetroz, Juliana Patrícia 26 February 2016 (has links)
With the worsening water crisis, water shortage has become a major global concern. Water supply companies increasingly look for ways to reduce water wastage, and many efforts have been made to promote better management of this resource. Fraud detection is one such action, since irregular tampering is usually carried out precariously, thus causing leaks. Hidden and apparent leakage is a major cause of the high rates of water loss. In this context, using technology to automate the identification of potential fraud can be an important support tool for avoiding water waste. This research applies pattern recognition techniques to automatically detect suspected irregularities in water meters through image analysis; a case is considered potential fraud when there is evidence of tampering or missing seals. The proposed computer vision system consists of three steps: detecting the water meter's location with an Optimum-Path Forest (OPF) classifier and the Histogram of Oriented Gradients (HOG) descriptor; detecting the seals through morphological image processing and segmentation methods; and classifying frauds by assessing the condition of the water meter's seals. We validated the proposed framework on a dataset of water meter inspection images. The water meter detection stage (HOG+OPF) achieved an average accuracy of 89.03%, outperforming SVM (linear and RBF). For classifying the condition of the seals, a comparative analysis of 12 feature descriptors (color and texture) was performed; their results were evaluated both individually and in combination, reaching an average accuracy of up to 81.29%. We conclude that a computer vision system is a promising strategy with the potential to support fraud detection analysis.
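The first stage (meter detection from HOG features) might look like the following sketch. OPF is not available in scikit-learn, so an SVM is substituted purely for illustration; the images and labels are hypothetical placeholders.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def hog_features(image, size=(128, 128)):
    """Resize a grayscale image and return its HOG descriptor."""
    image = resize(image, size)
    return hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Placeholder data: random images standing in for meter / non-meter windows.
rng = np.random.default_rng(0)
images = rng.random((40, 160, 160))   # hypothetical grayscale windows
labels = rng.integers(0, 2, 40)       # 1 = window contains a water meter

X = np.array([hog_features(img) for img in images])
print("HOG feature length:", X.shape[1])
print("CV accuracy:", cross_val_score(SVC(), X, labels, cv=5).mean())
```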
|
275 |
Aplicação de ontologias para métodos de negociação de um sistema multiagente para o reconhecimento de padrões / Applying ontologies to the negotiation methods of a multi-agent pattern recognition systemBezerra, Valéria Maria Siqueira 14 July 2006 (has links)
The use of intelligent agents in multi-classifier systems arose from the need to make the centralized decision process of a multi-classifier system distributed, flexible and incremental. On this basis, the NeurAge (Neural Agents) system (Abreu et al. 2004) was proposed; it outperforms several combination-centered methods (Abreu, Canuto, and Santana 2005). Negotiation is important to the performance of a multi-agent system, yet most negotiations are defined informally. One way to formalize the negotiation process is through an ontology. In the context of classification tasks, an ontology provides a means to formalize the concepts and the rules that govern the relations between those concepts. This work aims to use ontologies to give a formal description of the negotiation methods of a multi-agent system for classification tasks, more specifically the NeurAge system. Through ontologies, we intend to make the NeurAge system more formal and open, allowing new agents to join the system during negotiation. To this end, the NeurAge system is studied in terms of its functioning, focusing mainly on its negotiation methods. Then, several negotiation ontologies found in the literature are studied, and those chosen for this work are adapted to the negotiation methods used in NeurAge.
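As a purely illustrative sketch of the kind of interaction such an ontology would formalize, the following shows heterogeneous classifier agents settling a decision by a simple confidence-weighted negotiation rule; the actual NeurAge negotiation methods are not reproduced here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Heterogeneous agents, each with its own trained classifier.
agents = [MLPClassifier(max_iter=2000, random_state=0),
          DecisionTreeClassifier(random_state=0),
          GaussianNB()]
for agent in agents:
    agent.fit(X_tr, y_tr)

def negotiate(x):
    """Each agent 'bids' its class posterior; highest summed confidence wins."""
    bids = np.sum([a.predict_proba([x])[0] for a in agents], axis=0)
    return int(np.argmax(bids))

preds = np.array([negotiate(x) for x in X_te])
print("Negotiated accuracy:", (preds == y_te).mean())
```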
|
276 |
Dilema da diversidade-acurácia: um estudo empírico no contexto de multiclassificadores / The diversity-accuracy dilemma: an empirical study in the context of multi-classifier systemsOliveira, Diogo Fagundes de 01 September 2008 (has links)
Multi-classifier systems, also known as ensembles, have been widely used to solve a variety of problems because they often perform better than the individual classifiers that compose them. For this to happen, however, the base classifiers must be both accurate and diverse among themselves; this is known as the diversity-accuracy dilemma. Given its importance, some works have investigated the behavior of ensembles in the context of this dilemma, but most of them address homogeneous ensembles, i.e., ensembles composed of classifiers of a single type. Motivated by this limitation, this thesis uses genetic algorithms to perform a detailed study of the diversity-accuracy dilemma for heterogeneous ensembles.
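A hedged sketch of the approach: a small genetic algorithm that selects a heterogeneous subset of base classifiers under a fitness that trades off majority-vote accuracy against pairwise disagreement (one common diversity measure). The thesis's exact fitness function and GA settings are assumptions here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

# A heterogeneous pool of trained base classifiers.
pool = [DecisionTreeClassifier(max_depth=d, random_state=d) for d in (1, 3, 5)]
pool += [GaussianNB(), KNeighborsClassifier(3), KNeighborsClassifier(9)]
preds = np.array([clf.fit(X_tr, y_tr).predict(X_va) for clf in pool])

def fitness(mask, alpha=0.5):
    """Weighted sum of majority-vote accuracy and mean pairwise disagreement."""
    idx = np.flatnonzero(mask)
    if len(idx) < 2:
        return 0.0
    vote = (preds[idx].mean(axis=0) > 0.5).astype(int)
    acc = (vote == y_va).mean()
    pairs = [(preds[i] != preds[j]).mean() for i in idx for j in idx if i < j]
    return alpha * acc + (1 - alpha) * np.mean(pairs)

rng = np.random.default_rng(0)
popl = rng.integers(0, 2, (20, len(pool)))   # random initial population
for _ in range(30):                          # evolve for 30 generations
    scores = np.array([fitness(m) for m in popl])
    parents = popl[np.argsort(scores)[-10:]]  # truncation selection
    children = parents[rng.integers(0, 10, 10)].copy()
    flip = rng.random(children.shape) < 0.1   # bit-flip mutation
    children[flip] ^= 1
    popl = np.vstack([parents, children])

best = popl[np.argmax([fitness(m) for m in popl])]
print("Selected ensemble members:", np.flatnonzero(best))
```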
|
277 |
The Detection of Reliability Prediction Cues in Manufacturing Data from Statistically Controlled ProcessesJanuary 2011 (has links)
Many products undergo several stages of testing, ranging from tests on individual components to end-item tests, and may be further "tested" through customer or field use. The later failure of a delivered product may in some cases be due to circumstances that have no correlation with the product's inherent quality. At times, however, there may be cues in the upstream test data that, if detected, could serve to predict the likelihood of downstream failure or performance degradation induced by product use or environmental stresses. This study explores the use of downstream factory test data or product field reliability data to infer data mining or pattern recognition criteria for manufacturing process or upstream test data, by means of support vector machines (SVMs), in order to provide reliability prediction models. In concert with a risk/benefit analysis, these models can be used to drive improvement of the product or, at least, to improve the reliability of the product delivered to the customer through screening. Such models can aid reliability risk assessment based on detectable correlations between product test performance and sources of supply, test stands, or other factors related to product manufacture. To enhance the usefulness of the SVM (hyperplane) classifier in this context, L-moments and the Western Electric Company (WECO) rules are used to augment or replace the native process or test data used as classifier inputs. As part of this research, a generalizable binary classification methodology was developed for designing and implementing predictors of end-item field failure or downstream product performance from upstream test data composed of single-parameter, time-series, or multivariate real-valued data. The methodology also provides input-parameter weighting factors that have proved useful in failure analysis and root-cause investigations as indicators of which upstream product parameters have the greater influence on downstream failure outcomes. / Dissertation/Thesis / Ph.D. Electrical Engineering 2011
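As a hedged illustration of the feature path described, the sketch below summarizes each unit's upstream test series by its first four L-moments and trains an SVM to predict downstream pass/fail; the WECO rules and real factory data are omitted, and the series are synthetic.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def lmoments(x):
    """First four sample L-moments, via probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    b = np.zeros(4)
    for r in range(4):
        w = np.ones(n)
        for k in range(1, r + 1):
            w *= (np.arange(1, n + 1) - k) / (n - k)
        b[r] = np.mean(w * x)
    return np.array([b[0], 2*b[1] - b[0], 6*b[2] - 6*b[1] + b[0],
                     20*b[3] - 30*b[2] + 12*b[1] - b[0]])

# Synthetic upstream test series: "failing" units are skewed and noisier.
rng = np.random.default_rng(0)
good = rng.normal(0.0, 1.0, size=(80, 200))
bad = rng.gamma(2.0, 1.0, size=(80, 200))
X = np.array([lmoments(s) for s in np.vstack([good, bad])])
y = np.r_[np.zeros(80), np.ones(80)]

print("CV accuracy:", cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())
```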
|
278 |
Méthodes de classifications dynamiques et incrémentales : application à la numérisation cognitive d'images de documents / Incremental and dynamic learning for document image : application for intelligent cognitive scanning of documentsNgo Ho, Anh Khoi 19 March 2015 (has links)
This research contributes to the field of dynamic learning and classification in stationary and non-stationary environments. The goal of this PhD is to define a new classification framework that can cope with a very small learning dataset at the start of the process and adjust itself to the variability of the incoming data in a stream. For that purpose, we propose a solution based on a combination of independent one-class SVM classifiers, each with its own incremental learning procedure; consequently, no classifier is subject to cross-influences arising from the configuration of the other classifiers' models. The originality of our proposal comes from exploiting the former knowledge kept in the SVM models (each SVM's own history, represented by the set of support vectors found so far) and combining it with the knowledge brought by the new data arriving incrementally from the stream. The proposed classification model (mOC-iSVM) is exploited through three variations, each using the existing models' history differently at each time step. Our contribution addresses a state of the art in which no solution yet handles, at the same time, concept drift, the addition or deletion of concepts, and the fusion or division of concepts, while also offering a privileged framework for interaction with the user. Within the ANR DIGIDOC project, our approach was applied to several image-stream classification scenarios that can arise in real digitization campaigns. These scenarios validated an interactive use of our incremental classification solution for classifying images arriving in a stream, in order to improve the quality of the digitized images.
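A minimal sketch of the memory mechanism described, assuming the simplest variant: at each step a one-class SVM is retrained on the union of its previously found support vectors (its history) and the newly arrived batch. This is an illustration, not the thesis's mOC-iSVM.

```python
import numpy as np
from sklearn.svm import OneClassSVM

class IncrementalOneClassSVM:
    def __init__(self, **svm_params):
        self.params = svm_params
        self.memory = None   # support vectors kept from past batches

    def partial_fit(self, X_batch):
        data = (X_batch if self.memory is None
                else np.vstack([self.memory, X_batch]))
        self.model = OneClassSVM(**self.params).fit(data)
        self.memory = self.model.support_vectors_   # old knowledge to keep
        return self

    def predict(self, X):
        return self.model.predict(X)   # +1 inlier, -1 outlier

# One such classifier per concept; each learns its stream independently.
rng = np.random.default_rng(0)
clf = IncrementalOneClassSVM(nu=0.1, kernel="rbf", gamma="scale")
for step in range(5):                  # five incoming batches
    batch = rng.normal(loc=step * 0.2, scale=1.0, size=(50, 2))  # slow drift
    clf.partial_fit(batch)
print("Memory size (support vectors kept):", len(clf.memory))
```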
|
279 |
On The Importance of Light Source Classification in Indoor Light Energy HarvestingZhang, Ye January 2018 (has links)
Indoor light energy harvesting plays an important role in the field of renewable energy. Indoor lighting conditions are usually described by the level of illumination, but measured illumination alone, without classification of the different light sources, does not yield representative results: an energy harvesting system must be evaluated after light source classification to obtain a more accurate energy value. This is why classifying different light sources matters. In this thesis, a complete indoor light energy harvesting system is introduced, two models are proposed to evaluate energy, and robustness is improved by mixing complex lighting conditions during data collection. The main task of this thesis is to verify the importance of indoor light classification; its main contribution is to fill a gap in energy evaluation and to build a model with superior performance. Regarding data collection, this thesis investigates the factors that influence data collection to ensure reliable accuracy, so that spectra can be collected more accurately under different lighting conditions. Finally, light energy is evaluated after classifying the indoor light. This model is shown to be closer to the true energy value under real conditions: the results show that energy computed from classified data is more accurate than a direct calculation of energy, with a smaller error. In addition, the classifier model used in this thesis proves to perform excellently; it can still classify with high accuracy when the measured data are not included in the training set. This makes it a low-cost alternative for assessing lighting conditions without a spectrometer.
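The two-step evaluation argued for above (classify the source, then estimate energy per source) can be sketched as follows; the spectra, source classes, and efficiency figures are all hypothetical placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
wavelengths = np.linspace(400, 700, 50)   # nm, assumed range

def fake_spectrum(kind):
    """Crude synthetic spectra: LED peaks at 450 nm, incandescent slopes up."""
    if kind == 0:   # "LED"
        return np.exp(-((wavelengths - 450) / 30) ** 2) + 0.05 * rng.random(50)
    else:           # "incandescent"
        return wavelengths / 700 + 0.05 * rng.random(50)

y = rng.integers(0, 2, 200)
X = np.array([fake_spectrum(k) for k in y])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("Classification accuracy:", clf.score(X_te, y_te))

# Hypothetical harvester efficiency per source type under equal illuminance.
efficiency = {0: 0.18, 1: 0.09}   # assumed values
kinds = clf.predict(X_te)
energy = [spec.sum() * efficiency[k] for spec, k in zip(X_te, kinds)]
print("Mean estimated harvestable energy (a.u.):", np.mean(energy))
```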
|
280 |
Cost-sensitive boosting : a unified approachNikolaou, Nikolaos January 2016 (has links)
In this thesis we provide a unifying framework for two decades of work in an area of Machine Learning known as cost-sensitive Boosting algorithms. This area is concerned with the fact that most real-world prediction problems are asymmetric, in the sense that different types of errors incur different costs. Adaptive Boosting (AdaBoost) is one of the most well-studied and utilised algorithms in the field of Machine Learning, with a rich theoretical depth as well as practical uptake across numerous industries. However, its inability to handle asymmetric tasks has been the subject of much criticism. As a result, numerous cost-sensitive modifications of the original algorithm have been proposed, each with its own motivations and its own claims to superiority. With a thorough analysis of the literature from 1997 to 2016, we find 15 distinct cost-sensitive Boosting variants, discounting minor variations. We critique the literature using four powerful theoretical frameworks: Bayesian decision theory, the functional gradient descent view, margin theory, and probabilistic modelling. From each framework, we derive a set of properties which must be obeyed by boosting algorithms. We find that only 3 of the published AdaBoost variants are consistent with the rules of all the frameworks, and even they require their outputs to be calibrated to achieve this. Experiments on 18 datasets, across 21 degrees of cost asymmetry, all support the hypothesis: once calibrated, the three variants perform equivalently, outperforming all others. Our final recommendation, based on theoretical soundness, simplicity, flexibility and performance, is to use the original AdaBoost algorithm, albeit with a shifted decision threshold and calibrated probability estimates. The conclusion is that novel cost-sensitive boosting algorithms are unnecessary if proper calibration is applied to the original.
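The final recommendation translates directly into a short sketch: standard AdaBoost, Platt-calibrated probabilities, and a decision threshold shifted to c_FP/(c_FP + c_FN), per Bayesian decision theory. The costs and dataset below are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

c_fp, c_fn = 1.0, 5.0              # assumed costs: a missed positive
threshold = c_fp / (c_fp + c_fn)   # costs 5x a false alarm -> 0.167

# Calibrate AdaBoost's scores into probabilities (Platt scaling here).
ada = CalibratedClassifierCV(AdaBoostClassifier(random_state=0),
                             method="sigmoid")
ada.fit(X_tr, y_tr)

proba = ada.predict_proba(X_te)[:, 1]
pred = (proba >= threshold).astype(int)   # shifted decision threshold
cost = (c_fp * ((pred == 1) & (y_te == 0)).sum()
        + c_fn * ((pred == 0) & (y_te == 1)).sum())
print(f"Total cost at threshold {threshold:.3f}: {cost:.0f}")
```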
|