161 | Application of Machine Learning Techniques for Real-time Classification of Sensor Array Data. Li, Sichu. 15 May 2009.
There is a significant need for approaches that classify chemical sensor array data with high success rates, as these would enhance sensor detection capabilities. The present study addresses this need by investigating six machine learning methods for classifying a dataset collected with a chemical sensor array: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Classification and Regression Trees (CART), Random Forest (RF), Naïve Bayes Classifier (NB), and Principal Component Regression (PCR). A total of 10 predictors, corresponding to the responses of 10 sensor channels, are used to train and test the classifiers. A training dataset of 4 classes containing 136 samples is used to build the classifiers, and a dataset of 4 classes with 56 samples is used for testing. The results of the six methods are compared and discussed. RF, CART, and KNN achieve success rates greater than 90% and outperform the other methods.
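A minimal sketch of such a six-way comparison in scikit-learn follows; the dataset variables (X_train, y_train, X_test, y_test) are placeholders for the 10-channel sensor data, and PCR is approximated here by PCA followed by logistic regression, since the abstract does not give the exact PCR formulation.

```python
# Hedged sketch: comparing six classifiers on a 10-channel sensor dataset.
# Assumes X_train (136 x 10), y_train, X_test (56 x 10), y_test already exist.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf", C=1.0),
    "CART": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(n_estimators=500),
    "NB": GaussianNB(),
    # PCA + logistic regression stands in for principal component regression
    "PCR": make_pipeline(PCA(n_components=5), LogisticRegression(max_iter=1000)),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: {acc:.1%}")  # success rate on the 56-sample test set
```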
|
162 | A Study on Text Classification Methods and Text Features. Danielsson, Benjamin. January 2019.
In the task of classification, the data used for training is the most crucial part. It follows that how this data is processed and presented to the classifier plays an equally important role. This thesis investigates the performance of multiple classifiers depending on the features used, the type of classes to classify, and the optimization of the classifiers. The classifiers of interest are support vector machines (SMO) and the multilayer perceptron (MLP); the features tested are word vector spaces and text complexity measures, along with principal component analysis (PCA) applied to the complexity measures. The features are created from the Stockholm-Umeå-Corpus (SUC) and DigInclude, a dataset containing standard and easy-to-read sentences. For the SUC dataset the classifiers attempted to sort texts into nine text categories, while for the DigInclude dataset the sentences were classified as either standard or simplified. The classification tasks on the DigInclude dataset showed poor performance in all trials. On the SUC dataset, the best performance came from SMO combined with word vector spaces. Comparing the SMO classifier on the text complexity measures with and without PCA showed largely unchanged performance, although omitting PCA performed slightly better.
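A sketch of the kind of comparison described, under assumed stand-ins: LinearSVC for Weka's SMO, MLPClassifier for the MLP, TF-IDF for the word vector spaces, and TruncatedSVD in place of PCA on the sparse matrix; `texts` and `labels` are placeholders for the SUC data.

```python
# Hedged sketch: SVM vs. MLP on a text classification task, with and without
# dimensionality reduction on the feature matrix.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.decomposition import TruncatedSVD  # PCA-like, works on sparse data
from sklearn.model_selection import cross_val_score

X = TfidfVectorizer(max_features=5000).fit_transform(texts)  # word vector space
for name, clf in [("SMO", LinearSVC()), ("MLP", MLPClassifier(max_iter=500))]:
    print(name, cross_val_score(clf, X, labels, cv=5).mean())

# Same comparison on a reduced feature space, to test whether the reduction
# changes performance much (the thesis found it largely did not).
X_red = TruncatedSVD(n_components=100).fit_transform(X)
print("SMO+SVD", cross_val_score(LinearSVC(), X_red, labels, cv=5).mean())
```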
|
163 | Imbalanced High Dimensional Classification and Applications in Precision Medicine. Hui Sun. 14 May 2019.
Classification is an important supervised learning technique with numerous applications. This dissertation addresses two research problems in this area. The first is multicategory classification methods for high dimensional data. To handle high dimension low sample size (HDLSS) data with uneven group sizes (i.e., imbalanced data), we develop a new classification method called angle-based multicategory distance-weighted support vector machine (MDWSVM). It is motivated by its binary counterpart and has the merits of both the support vector machine (SVM) and distance-weighted discrimination (DWD) methods, while alleviating both the data piling issue of SVM and the imbalanced data issue of DWD. Theoretical results and numerical studies demonstrate the advantages of MDWSVM over existing methods.

The second part of the dissertation concerns the application of classification methods to precision medicine problems. Because one-stage precision medicine problems can be reformulated as weighted classification problems, the subtle differences between classification methods may lead to different performance in this setting. Among the margin-based classification methods, we propose the distance-weighted discrimination outcome weighted learning (DWD-OWL) method. We also extend the model to handle negative rewards for better generality, and apply the angle-based idea to handle multiple treatments. Proofs of Fisher consistency for DWD-OWL in both the binary and multicategory cases are provided. Under mild conditions, the insensitivity of DWD-OWL to imbalanced settings is also demonstrated.
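For intuition, a toy gradient-descent fit of a binary linear DWD classifier on the standard DWD loss, V(u) = 1 - u for u <= 1/2 and 1/(4u) otherwise; this is only the binary building block, not the angle-based multicategory method of the dissertation, and the ridge penalty and step sizes are assumptions.

```python
# Hedged sketch: binary linear distance-weighted discrimination (DWD) fitted
# by gradient descent. X is (n, d), y in {-1, +1}.
import numpy as np

def dwd_loss_grad(u):
    # V(u) = 1 - u for u <= 1/2, 1/(4u) otherwise; its derivative is below.
    safe_u = np.maximum(u, 0.5)            # avoids division warnings off-branch
    return np.where(u <= 0.5, -1.0, -1.0 / (4.0 * safe_u**2))

def fit_linear_dwd(X, y, lam=1e-2, lr=0.01, n_iter=2000):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        u = y * (X @ w + b)                  # functional margins
        g = dwd_loss_grad(u) * y             # chain rule through u = y(Xw + b)
        w -= lr * ((X.T @ g) / n + lam * w)  # ridge-penalized average gradient
        b -= lr * g.mean()
    return w, b
```

Unlike the hinge loss, the DWD loss keeps penalizing correctly classified points with small margins, which is what gives DWD its resistance to data piling.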
|
164 | System Designs for Diabetic Foot Ulcer Image Assessment. Wang, Lei. 07 March 2016.
For individuals with type 2 diabetes, diabetic foot ulcers represent a significant health issue, and the cost of wound care is high. Currently, clinicians and nurses mainly base their wound assessment on visual examination of wound size and the status of the wound tissue. This method is potentially inaccurate and adds to the clinical workload. Given the prevalence of smartphones with high-resolution digital cameras, assessing wound healing by analyzing real-time images with the significant computational power of today's mobile devices is an attractive approach to managing foot ulcers. Alternatively, the smartphone may be used just for image capture and wireless transfer to a PC or laptop for image processing. To achieve accurate foot ulcer image assessment, we have developed and tested a novel automatic wound image analysis system that satisfies the following requirements: 1) an easy-to-use image capture system that makes the capture process comfortable for the patient and provides well-controlled capture conditions; 2) efficient and accurate algorithms for real-time wound boundary determination to measure the wound area; 3) a quantitative method to assess wound healing status based on a sequence of foot ulcer images from a given patient; and 4) a wound image assessment and management system that can be used both in the patient's home and in the clinical environment, in a telemedicine fashion. In our work, the wound image is captured by the smartphone camera while the patient's foot is held in place by an image capture box, specially designed to aid patients in photographing ulcers on the soles of their feet. The experimental results show that our image capture system ensures consistent illumination and a fixed distance between foot and camera. These properties greatly reduce the complexity of the subsequent wound recognition and assessment. The most significant contribution of our work is the development of five different wound boundary determination approaches based on different computer vision algorithms. The first approach employs the level set algorithm to determine the wound boundary directly from a manually set initial curve. The second and third approaches are mean-shift segmentation based methods augmented by foot outline detection and analysis. These approaches are efficient to implement (especially on smartphones), independent of prior knowledge, and able to provide reasonably accurate wound segmentation given a set of well-tuned parameters; however, they lack self-adaptivity because they are not based on machine learning. Consequently, a two-stage Support Vector Machine (SVM) binary classifier based wound recognition approach was developed and implemented. This approach consists of three major steps: 1) unsupervised super-pixel segmentation, 2) feature descriptor extraction for each super-pixel, and 3) supervised classifier based wound boundary determination. The experimental results show that this approach provides promising performance (sensitivity: 73.3%, specificity: 95.6%) on foot ulcer images captured with our image capture box. In the final approach, we further relax the image capture constraints and generalize our wound recognition system by applying a conditional random field (CRF) based model to wound boundary determination.
The key modules in this approach are TextonBoost based potential learning at different scales and efficient CRF model inference to find the optimal labeling. Finally, the standard K-means clustering algorithm is applied to the determined wound area for color based wound tissue classification. To train the models used in the last two approaches, and to evaluate all of the methods, we collected about 100 wound images at the wound clinic of UMass Medical School by tracking 15 patients over a 2-year period, following an IRB approved protocol. The wound recognition results were compared with ground truth generated by combining clinical labeling from three experienced clinicians. Specificity and sensitivity based measures indicate that the CRF based approach is the most reliable, despite its implementation complexity and computational demands. In addition, sample images of Moulage wound simulations were used to increase the evaluation flexibility. The advantages and disadvantages of the approaches are described. Another important contribution of this work is the development of a healing score mechanism for quantitative assessment of wound healing status. The wound size and color composition measurements were converted to a score from 0-10 indicating the healing trend, based on comparisons of subsequent images to an initial foot ulcer image. By comparing the results of the healing score algorithm to healing scores determined by experienced clinicians, we assessed the clinical validity of our algorithm. The level of agreement between our healing score and the three assessing clinicians was quantified using Krippendorff's Alpha Coefficient (KAC). Finally, a collaborative wound image management system between PC and smartphone was designed and successfully applied in the wound clinic for tracking patients' wounds. This system has proven applicable in the clinical environment and capable of providing interactive foot ulcer care in a telemedicine fashion.
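As an illustration of the final tissue-classification step, a minimal sketch assuming an already-segmented wound region; the three-cluster setup follows the common red/yellow/black wound-care convention and is not necessarily the thesis's exact protocol.

```python
# Hedged sketch: K-means color clustering of a segmented wound region into
# three tissue clusters. `image` is an RGB array and `wound_mask` a boolean
# array of the same height/width; both are assumed given by earlier stages.
import numpy as np
from sklearn.cluster import KMeans

pixels = image[wound_mask].astype(float)     # (n_wound_pixels, 3) RGB values
km = KMeans(n_clusters=3, n_init=10).fit(pixels)

labels = np.full(wound_mask.shape, -1)       # -1 marks non-wound pixels
labels[wound_mask] = km.labels_
fractions = np.bincount(km.labels_, minlength=3) / len(pixels)
print("tissue cluster fractions:", fractions)  # inputs to a healing score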
|
165 | Hybrid Credit Risk Assessment Model for Brazilian Corporations Based on Machine Learning Algorithms (Modelo híbrido de avaliação de risco de crédito para corporações brasileiras com base em algoritmos de aprendizado de máquina). Gregório, Rafael Leite. 09 July 2018.
Credit risk assessment plays a relevant role for financial institutions because it is associated with possible losses and has a large impact on balance sheets. Although there is considerable research on applications of machine learning to finance, a study that integrates the available knowledge on credit risk assessment is still lacking. This work specifies a machine learning model of the probability of default for publicly traded companies in the Bovespa Index (corporations) and, from the model's estimates, derives a letter-based credit rating metric. We drew together methodologies from the literature and estimated models comprising fundamentalist (balance sheet) and corporate governance data, macroeconomic variables, and variables produced by the proprietary KMV credit risk assessment model. We tested the XGboost and LinearSVM algorithms, which have very different characteristics but are both potentially useful for the problem. Parameter grid searches were performed to identify the most representative variables and to specify the best performing model. The selected model was XGboost, whose performance was very similar to results obtained for the North American stock market in analogous research. The estimated credit ratings appear more sensitive to the economic and financial situation of the companies than those issued by traditional rating agencies.
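A sketch of the model-comparison step, assuming scikit-learn and the xgboost Python package; the feature matrix X (fundamentals, governance, macro variables) and the binary default labels y are placeholders, and the parameter grids are illustrative rather than the thesis's.

```python
# Hedged sketch: grid search over XGBoost and a linear SVM for default
# prediction, comparing cross-validated AUC.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier

candidates = {
    "XGboost": (XGBClassifier(eval_metric="logloss"),
                {"max_depth": [3, 5], "n_estimators": [200, 500],
                 "learning_rate": [0.05, 0.1]}),
    "LinearSVM": (LinearSVC(), {"C": [0.1, 1.0, 10.0]}),
}

for name, (model, grid) in candidates.items():
    # roc_auc works for LinearSVC too, via its decision_function scores
    search = GridSearchCV(model, grid, scoring="roc_auc", cv=5).fit(X, y)
    print(name, search.best_params_, round(search.best_score_, 3))
```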
|
166 | Phonetic Classification Using Boosting and SVM (Classificação fonética utilizando Boosting e SVM). Teixeira Júnior, Talisman Cláudio de Queiroz. 17 February 2006.
To build an Automatic Speech Recognition (ASR) system, a task named phonetic classification can be used: from a speech sample, decide which phoneme was pronounced by a speaker. To ease classification and to enhance the most distinctive characteristics of the phonemes, speech samples are usually pre-processed by a front-end, which generally extracts a set of features from each sample. These features are then fed to a classification algorithm that (once properly trained) tries to decide which phoneme was pronounced. There is a tendency for the classification error rate to fall as the number of features grows; the trade-off is a larger computational cost. Feature selection aims to identify the most relevant (or most used) features in a classification task, making it possible to discover redundant features that contribute little (or nothing) to classification. The aim of this work is to apply the SVM classifier to phonetic classification, using the TIMIT database, and to discover the most relevant features in the classification by applying a Boosting approach to feature selection.
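A sketch of Boosting-driven feature selection followed by an SVM, assuming scikit-learn and a precomputed acoustic feature matrix; AdaBoost over decision stumps stands in for the thesis's Boosting procedure, and the top-20 cutoff is illustrative.

```python
# Hedged sketch: use Boosting to rank features, then train an SVM on the
# selected subset. X is an (n_samples, n_features) NumPy array of acoustic
# features (e.g., per-frame MFCCs) and y the phoneme labels; both assumed.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Decision stumps make feature_importances_ behave like a usage count:
# features never picked by any stump contribute nothing to classification.
booster = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=200).fit(X, y)
relevant = np.argsort(booster.feature_importances_)[::-1][:20]  # top 20

svm = SVC(kernel="rbf").fit(X[:, relevant], y)  # SVM on the selected features
```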
|
167 | Automated Localization and Segmentation of Pelvic Floor Structures on MRI to Predict Pelvic Organ Prolapse. Onal, Sinan. 29 May 2014.
Pelvic organ prolapse (POP) is a major health problem that affects women. POP is a herniation of the female pelvic floor organs (bladder, uterus, small bowel, and rectum) into the vagina. This condition can cause significant problems such as urinary and fecal incontinence, bothersome vaginal bulge, incomplete bowel and bladder emptying, and pain/discomfort. POP is normally diagnosed through clinical examination since there are few associated symptoms. However, clinical examination has been found to be inadequate and in disagreement with surgical findings. This makes POP a common but poorly understood condition. Dynamic magnetic resonance imaging (MRI) of the pelvic floor has become an increasingly popular tool to assess POP cases that may not be evident on clinical examination. Anatomical landmarks are manually identified on MRI along the midsagittal plane to determine reference lines and measurements for grading POP. However, the manual identification of these points, lines and measurements on MRI is a time-consuming and subjective procedure. This has restricted the correlation analysis of MRI measurements with clinical outcomes to improve the diagnosis of POP and predict the risk of development of this disorder.
The main goal of this research is to improve the diagnosis of pelvic organ prolapse through a model that automatically extracts image-based features from patient-specific MRI and fuses them with clinical outcomes. To extract image-based features, anatomical landmarks need to be identified on MRI through the localization and segmentation of pelvic bone structures. This is the main challenge for current algorithms, which tend to fail during bone localization and segmentation on MRI. The proposed research consists of three major objectives: (1) to automatically identify pelvic floor structures on MRI using a multivariate linear regression model with global information; (2) to identify image-based features using a hybrid technique based on texture-based block classification and K-means clustering, improving the segmentation of bone structures in images with low contrast and image inhomogeneity; and (3) to design, test and validate a prediction model using support vector machines with correlation analysis based feature selection to improve disease diagnosis.
The proposed model will enable faster and more consistent automated extraction of features from images with low contrast and high inhomogeneity. This is expected to allow studies on large databases to improve the correlation analysis between MRI features and clinical outcomes. The proposed research focuses on the pelvic region, but the techniques are applicable to other anatomical regions that require automated localization and segmentation of multiple structures from images with high inhomogeneity, low contrast, and noise. This research is also applicable to the automated extraction and analysis of image-based features for the diagnosis of other diseases where clinical examination is not adequate. The proposed model will lay the foundation for a computer-aided decision support system enabling the fusion of image, clinical, and patient data to improve the diagnosis of POP through personalized assessment. Automating pelvic floor measurements on radiologic studies will allow the use of imaging to predict the development of POP in predisposed patients, and possibly lead to preventive strategies.
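A sketch of objective (3), assuming scikit-learn; `features` (MRI-derived measurements per patient) and `has_pop` (binary diagnosis) are placeholders, and the correlation threshold is illustrative rather than the thesis's actual selection rule.

```python
# Hedged sketch: correlation-based feature selection followed by an SVM.
# features is an (n_patients, n_features) NumPy array, has_pop a 0/1 vector.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Keep features whose absolute Pearson correlation with the outcome exceeds
# an assumed threshold of 0.3.
corr = np.array([np.corrcoef(features[:, j], has_pop)[0, 1]
                 for j in range(features.shape[1])])
selected = np.abs(corr) > 0.3

clf = SVC(kernel="rbf")
print(cross_val_score(clf, features[:, selected], has_pop, cv=5).mean())
```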
|
168 | Study of "Support Vector Machine" Classification Techniques for Automatic Speaker Verification (Etude de techniques de classement "Machines à vecteurs supports" pour la vérification automatique du locuteur). Kharroubi, Jamal. 07 1900.
SVMs (Support Vector Machines) are statistical learning techniques proposed by V. Vapnik in 1995. They can address very diverse problems such as classification, regression, fusion, and so on. Since their introduction to the field of Pattern Recognition, several studies have demonstrated their effectiveness, mainly in image processing. The essential idea of SVMs is to project data from the input space (belonging to two different classes) that are not linearly separable into a higher-dimensional space, called the feature space, so that the data become linearly separable. In this space, the optimal-hyperplane construction technique is used to compute the classification function separating the two classes. In this thesis, we studied SVMs as classification techniques for text-dependent and text-independent Automatic Speaker Verification (ASV). We also studied SVMs for fusion tasks, carrying out experiments on two types of fusion: fusion of methods and fusion of modalities. Within the PICASSO project, we proposed a text-dependent ASV system using SVMs in a public-password application. In this system, a new modeling based on the phonetic transcription of the passwords was proposed to build the input vectors for our SVM classifier. For our study of SVMs in text-independent ASV, we proposed hybrid GMM-SVM systems, in which three new data representations were proposed to combine the modeling power of GMMs with the decision performance of SVMs. This work was carried out as part of our participation in the NIST international evaluations. Within the BIOMET project on biometric authentication led by GET (Groupe des Écoles de Télécommunications), we studied SVMs for two fusion tasks. The first concerns the fusion of methods, where we fused the scores obtained by the participants in the "One Speaker Detection" task of the NIST 2001 evaluations. The second concerns the fusion of modalities, carried out on the scores obtained on the four different modalities of the M2VTS database. Our studies represent one of the first attempts to apply SVMs to ASV. The results show that SVMs are very effective and, above all, very promising techniques, both for classification and for fusion.
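One common GMM-SVM construction, sketched under assumptions: each utterance is summarized by the stacked means of a small GMM fitted to it, and a linear SVM separates target-speaker from impostor utterances. Whether this matches any of the thesis's three data representations is not claimed; `utterances` and `is_target` are placeholders.

```python
# Hedged sketch: a GMM-SVM hybrid for speaker verification. Each utterance is
# a (n_frames, n_dims) array of acoustic feature vectors; is_target holds
# binary labels (target speaker vs. impostor).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

def supervector(frames, n_components=8):
    # Fit a small diagonal-covariance GMM to one utterance and stack its
    # component means into a fixed-size vector for the SVM.
    gmm = GaussianMixture(n_components, covariance_type="diag").fit(frames)
    return gmm.means_.ravel()

X = np.array([supervector(u) for u in utterances])
svm = LinearSVC().fit(X, is_target)  # decision scores used for verification
```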
|
169 | Fusion of Heterogeneous Data for Human Perception by a Mobile Robot (Fusion de données hétérogènes pour la perception de l'homme par un robot mobile). Germa, Thierry. 24 September 2010.
This thesis work was carried out within the European CommRob project, which involves academic and industrial partners. The goal of the project is the design of a companion robot operating in a structured, dynamic environment heavily cluttered by the presence of other agents sharing the space (other robots, humans). In this context, our contribution focuses on the multimodal perception of the robot's users (user and passers-by). Multimodal perception covers the development and integration of perceptual functions for detecting and identifying people and for the spatio-temporal analysis of their movements in order to communicate with the robot. Proximal detection of the robot's users relies on multimodal perception coupling heterogeneous data from different sensors. The humans detected and then recognized are tracked in the video stream delivered by an on-board camera in order to interpret their movements. A first contribution lies in the implementation of person detection and identification functions on a mobile robot. A second contribution concerns the spatio-temporal analysis of these percepts, first for tracking the user, and then for tracking all the people in the robot's surroundings. Finally, in line with the demands of robotics, the thesis has two facets: a formal and algorithmic facet whose relevance and validation rest on a strong experimental and integration facet. These developments are based on our Rackham platform and on the one deployed during the CommRob project.
|
170 | Estimation and Classification of Altimetric Signals (Estimation et Classification des Signaux Altimétriques). Severini, Jerome; Mailhes, Corinne; Tourneret, Jean-Yves. 07 October 2010.
Measuring ocean height, surface winds (strongly linked to ocean temperatures), and wave height yields a set of parameters necessary for studying the oceans and monitoring their evolution; satellite altimetry is one of the disciplines that makes this possible. An altimetric waveform is the result of emitting a high-frequency radar pulse toward a given surface (classically oceanic) and measuring the reflection of that pulse. There currently exists a non-optimal estimation method for altimetric waveforms, as well as classification tools for identifying the different types of observed surfaces. In this study, we propose to apply Bayesian estimation to altimetric waveforms, along with new classification approaches. Finally, we propose a dedicated algorithm for studying topography in coastal areas, a topic that is currently underdeveloped in altimetry.
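A sketch of the waveform-estimation idea on a simplified Brown-style ocean echo model, fitted here by least squares for brevity; the thesis's Bayesian estimators are not reproduced, and all parameter names and the 104-gate length are illustrative assumptions.

```python
# Hedged sketch: least-squares fit of a simplified Brown-style altimetric
# waveform (amplitude Pu, epoch tau, leading-edge width sigma, trailing-edge
# decay alpha, thermal noise floor Pn). `waveform` is a measured echo, assumed.
import numpy as np
from scipy.special import erf
from scipy.optimize import curve_fit

def brown_like(t, Pu, tau, sigma, alpha, Pn):
    rise = 0.5 * (1 + erf((t - tau) / (np.sqrt(2) * sigma)))  # leading edge
    decay = np.exp(-alpha * np.clip(t - tau, 0, None))        # trailing edge
    return Pn + Pu * rise * decay

t = np.arange(104, dtype=float)  # radar gate indices
p0 = [waveform.max(), t[np.argmax(waveform)], 2.0, 0.01, waveform[:5].mean()]
params, _ = curve_fit(brown_like, t, waveform, p0=p0)
print(dict(zip(["Pu", "tau", "sigma", "alpha", "Pn"], params)))
```

The epoch tau and width sigma are the quantities of oceanographic interest (range and significant wave height); a Bayesian treatment would place priors on these parameters and explore their posterior instead of returning a point estimate.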
|