Global ETD Search

131	Estimação dos parâmetros do kernel em um classificador SVM na classificação de imagens hiperespectrais em uma abordagem multiclasse Bonesso, Diego January 2013 Nessa dissertação é investigada e testada uma metodologia para otimizar os parâmetros do kernel do classificador Support Vector Machines (SVM). Experimentos são realizados utilizando dados de imagens em alta dimensão. Imagens em alta dimensão abrem novas possibilidades para a classificação de imagens de sensoriamento remoto que capturam cenas naturais. É sabido que classes que são espectralmente muito similares, i.e., classes que possuem vetores de média muito próximos podem não obstante serem separadas com alto grau de acurácia em espaço de alta dimensão, desde que a matriz de covariância apresente diferenças significativas. O uso de dados de imagens em alta dimensão pode apresentar, no entanto, alguns desafios metodológicos quando aplicado um classificador paramétrico como o classificador de Máxima Verossimilhança Gaussiana. Conforme aumenta a dimensionalidade dos dados, o número de parâmetros a serem estimados a partir de um número geralmente limitado de amostras de treinamento também aumenta. Esse fato pode ocasionar estimativas pouco confiáveis, que por sua vez resultam em baixa acurácia na imagem classificada. Existem diversas abordagens propostas na literatura para minimizar esse problema. Os classificadores não paramétricos podem ser uma boa alternativa para mitigar esse problema. O SVM atualmente tem sido investigado na classificação de dados de imagens em alta-dimensão com número limitado de amostras de treinamento. Para que o classificador SVM seja utilizado com sucesso é necessário escolher uma função de kernel adequada, bem como os parâmetros dessa função. O kernel RBF tem sido frequentemente mencionado na literatura por obter bons resultados na classificação de imagens de sensoriamento remoto. 
Neste caso, dois parâmetros devem ser escolhidos para o classificador SVM: (1) O parâmetro de margem (C) que determina um ponto de equilíbrio razoável entre a maximização da margem e a minimização do erro de classificação, e (2) o parâmetro que controla o raio do kernel RBF. Estes dois parâmetros podem ser vistos como definindo um espaço de busca. O problema nesse caso consiste em procurar o ponto ótimo que maximize a acurácia do classificador SVM. O método de Busca em Grade é baseado na exploração exaustiva deste espaço de busca. Esse método é proibitivo do ponto de vista do tempo de processamento, sendo utilizado apenas com propósitos comparativos. Na prática os métodos heurísticos são a abordagem mais utilizada, proporcionando níveis aceitáveis de acurácia e tempo de processamento. Na literatura diversos métodos heurísticos são aplicados ao problema de classificação de forma global, i.e., os valores selecionados são aplicados durante todo processo de classificação. Esse processo, no entanto, não considera a diversidade das classes presentes nos dados. Nessa dissertação investigamos a aplicação da heurística Simulated Annealing (Recozimento Simulado) para um problema de múltiplas classes usando o classificador SVM estruturado como uma árvore binária. Seguindo essa abordagem, os parâmetros são estimados em cada nó da árvore binária, resultando em uma melhora na acurácia e tempo razoável de processamento. Experimentos são realizados utilizando dados de uma imagem hiperespectral disponível, cobrindo uma área de teste com controle terrestre bastante confiável. / In this dissertation we investigate and test a methodology to optimize the kernel parameters in a Support Vector Machines classifier. Experiments were carried out using remote sensing high-dimensional image data. High dimensional image data opens new possibilities in the classification of remote sensing image data covering natural scenes. 
It is well known that classes that are spectrally very similar, i.e., classes that show very similar mean vectors, can nevertheless be separated with a high degree of accuracy in high-dimensional spaces, provided that their covariance matrices differ significantly. The use of high-dimensional image data may present, however, some drawbacks when applied in parametric classifiers such as the Gaussian Maximum Likelihood classifier. As the data dimensionality increases, so does the number of parameters to be estimated from a generally limited number of training samples. This fact results in unreliable estimates for the parameters, which in turn results in low accuracy in the classified image. There are several approaches proposed in the literature to minimize this problem. Non-parametric classifiers may provide a sensible way to overcome this problem. Support Vector Machines (SVM) have been more recently investigated in the classification of high-dimensional image data with a limited number of training samples. To achieve this end, a proper kernel function has to be implemented in the SVM classifier and the respective parameters selected properly. The RBF kernel has been frequently mentioned in the literature as providing good results in the classification of remotely sensed data. In this case, two parameters must be chosen in the SVM classification: (1) the margin parameter (C) that determines the trade-off between the maximization of the margin in the SVM and minimization of the classification error, and (2) the parameter that controls the radius in the RBF kernel. These two parameters can be seen as defining a search space. The problem here consists in finding an optimal point that maximizes the accuracy in the SVM classifier. The Grid Search approach is based on an exhaustive exploration of the search space. This approach is prohibitively time-consuming and is used only for comparative purposes. 
In practice heuristic methods are the most commonly used approaches, providing acceptable levels of accuracy and computing time. In the literature several heuristic methods are applied to the classification problem in a global fashion, i.e., the selected values are applied to the entire classification process. This procedure, however, does not take into consideration the diversity of the classes present in the data. In this dissertation we investigate the application of Simulated Annealing to a multiclass problem using the SVM classifier structured as a binary tree. Following this proposed approach, the parameters are estimated at every level of the binary tree, resulting in better accuracy and a reasonable computing time. Experiments are done using a set of hyperspectral image data, covering a test area with very reliable ground control available. Sensoriamento remoto Imagens hiperespectrais Support vector machines Simulated annealing Hyperspectral image data
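The search over the two RBF-SVM parameters described in this abstract can be illustrated with a minimal simulated-annealing sketch. It is not the dissertation's implementation: `toy_accuracy` is a hypothetical stand-in for a cross-validated SVM accuracy, and the bounds, step size, and cooling schedule are assumptions chosen only for illustration. In the tree-structured approach the abstract describes, a search of this kind would be run separately for each binary SVM in the tree.

```python
import math
import random


def toy_accuracy(log_c, log_gamma):
    """Hypothetical stand-in for cross-validated SVM accuracy (peak at (2, -3))."""
    return math.exp(-((log_c - 2.0) ** 2 + (log_gamma + 3.0) ** 2) / 50.0)


def anneal(score, start, bounds, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Maximize `score` over a box-bounded parameter space by simulated annealing."""
    rng = random.Random(seed)
    current = start
    current_score = score(*current)
    best, best_score = current, current_score
    temp = t0
    for _ in range(steps):
        # Propose a neighbour with a Gaussian step, clipped to the search bounds.
        cand = tuple(
            min(hi, max(lo, x + rng.gauss(0.0, 0.5)))
            for x, (lo, hi) in zip(current, bounds)
        )
        cand_score = score(*cand)
        delta = cand_score - current_score
        # Always accept improvements; accept worse points with
        # probability exp(delta / temp), which shrinks as temp cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = cand, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp *= cooling
    return best, best_score


# Assumed search box over (log2 C, log2 gamma); not taken from the dissertation.
BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]
START = (-5.0, 3.0)
best_params, best_acc = anneal(toy_accuracy, START, BOUNDS)
```

Replacing `toy_accuracy` with a function that trains and validates an SVM for a given (C, gamma) pair would turn the sketch into an actual parameter search; the annealing loop itself would not need to change.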
132	Identifying Kinship Cues from Facial Images VIEIRA, Tiago Figueiredo 08 November 2013 A investigação da face humana é comum em análise de padrões/ processamento de imagens. Abordagens tradicionais são a identificação e a verificação mas muitas outras estão surgindo, como estimativa de idade, análise de similaridade, atratividade e o reconhecimento de parentesco. Apesar deste último possuir diversas possíveis aplicações, poucos trabalhos foram apresentados até então. Esta tese apresenta um algoritmo apto a discriminar entre irmãos e não irmãos, baseado nas imagens das suas faces. Um grande desafio foi lidar com a falta de um benchmark em análise de parentesco e, por esta razão, uma base de imagens de alta qualidade de pares de irmãos foi coletada. Isto é uma contribuição relevante à comunidade científica e foi particularmente útil para evitar possíveis problemas devido a imagens de baixa qualidade e condições não-controladas de aquisição de bases de dados heterogêneas usadas em outros trabalhos. Baseado nessas imagens, vários classificadores foram construídos usando técnicas baseadas na extração de características e holística para investigar quais variáveis são mais eficientes para distinguir parentes. As características foram primeiramente testadas individualmente e então as informações mais significantes da face foram fornecidas a um algoritmo único. 
O classificador de irmãos superou a performance de humanos que avaliaram a mesma base de dados. Adicionalmente, a boa capacidade de distinção do algoritmo foi testada aplicando-o a uma base de dados de baixa qualidade coletada da Internet. O conhecimento obtido da análise de irmãos levou ao desenvolvimento de um algoritmo similar capaz de distinguir pares pai-filho de indivíduos não relacionados. Os resultados obtidos possuem impactos na recuperação e anotação automática de bases de dados, ciência forense, pesquisa genealógica e na busca de familiares perdidos. / The investigation of human face images is ubiquitous in pattern analysis/image processing research. Traditional approaches are related to face identification and verification, but several other areas are emerging, like age/expression estimation, analysis of facial similarity and attractiveness, and automatic kinship recognition. Despite the fact that the latter could have applications in fields such as image retrieval and annotation, little work in this area has been presented so far. This thesis presents an algorithm able to discriminate between siblings and unrelated individuals, based on their face images. In this context, a great challenge was to deal with the lack of a benchmark in kinship analysis, and for this reason, a high-quality dataset of images of sibling pairs was collected. This is a relevant contribution to the research community and is particularly useful to avoid potential problems due to low-quality pictures and uncontrolled imaging conditions of heterogeneous datasets used in previous research. The database includes frontal, profile, expressionless and smiling faces of sibling pairs. Based on these images, various classifiers were constructed using feature-based and holistic techniques to investigate which data are more effective for discriminating siblings from non-siblings. 
The features were first tested individually and then the most significant face data were supplied to a single algorithm. The siblings classifier has been found to outperform human raters on all datasets. Also, the good discrimination capabilities of the algorithm are tested by applying the classifiers to a low-quality database of images collected from the Internet in a cross-database experiment. The knowledge acquired from the analysis of siblings fostered a similar algorithm able to discriminate parent-child pairs from unrelated individuals. The results obtained in this thesis have an impact on image retrieval and annotation, forensics, genealogical research and finding missing family members. Kinship Verification Support Vector Machines Feature Selection Verificação de Parentesco Máquinas de Vetores de Suporte Seleção de Características
133	MaltParser -- An Architecture for Inductive Labeled Dependency Parsing Hall, Johan January 2006 This licentiate thesis presents a software architecture for inductive labeled dependency parsing of unrestricted natural language text, which achieves a strict modularization of parsing algorithm, feature model and learning method such that these parameters can be varied independently. The architecture is based on the theoretical framework of inductive dependency parsing by Nivre (2006) and has been realized in MaltParser, a system that supports several parsing algorithms and learning methods, for which complex feature models can be defined in a special description language. Special attention is given in this thesis to learning methods based on support vector machines (SVM). The implementation is validated in three sets of experiments using data from three languages (Chinese, English and Swedish). First, we check if the implementation realizes the underlying architecture. The experiments show that the MaltParser system outperforms the baseline and satisfies the basic constraints of well-formedness. Furthermore, the experiments show that it is possible to vary parsing algorithm, feature model and learning method independently. Secondly, we focus on the special properties of the SVM interface. It is possible to reduce the learning and parsing time without sacrificing accuracy by dividing the training data into smaller sets, according to the part-of-speech of the next token in the current parser configuration. Thirdly, the last set of experiments presents a broad empirical study that compares SVM to memory-based learning (MBL) with five different feature models, where all combinations have gone through parameter optimization for both learning methods. The study shows that SVM outperforms MBL for more complex and lexicalized feature models with respect to parsing accuracy. 
There are also indications that SVM, with a splitting strategy, can achieve faster parsing than MBL. The parsing accuracy achieved is the highest reported for the Swedish data set and very close to the state of the art for Chinese and English. / Denna licentiatavhandling presenterar en mjukvaruarkitektur för datadriven dependensparsning, dvs. för att automatiskt skapa en syntaktisk analys i form av dependensgrafer för meningar i texter på naturligt språk. Arkitekturen bygger på idén att man ska kunna variera parsningsalgoritm, särdragsmodell och inlärningsmetod oberoende av varandra. Till grund för denna arkitektur har vi använt det teoretiska ramverket för induktiv dependensparsning presenterat av Nivre (2006). Arkitekturen har realiserats i programvaran MaltParser, där det är möjligt att definiera komplexa särdragsmodeller i ett speciellt beskrivningsspråk. I denna avhandling kommer vi att lägga extra tyngd vid att beskriva hur vi har integrerat inlärningsmetoden supportvektor-maskiner (SVM). MaltParser valideras med tre experimentserier, där data från tre språk används (kinesiska, engelska och svenska). I den första experimentserien kontrolleras om implementationen realiserar den underliggande arkitekturen. Experimenten visar att MaltParser utklassar en trivial metod för dependensparsning (eng. baseline) och de grundläggande kraven på välformade dependensgrafer uppfylls. Dessutom visar experimenten att det är möjligt att variera parsningsalgoritm, särdragsmodell och inlärningsmetod oberoende av varandra. Den andra experimentserien fokuserar på de speciella egenskaperna för SVM-gränssnittet. Experimenten visar att det är möjligt att reducera inlärnings- och parsningstiden utan att förlora i parsningskorrekthet genom att dela upp träningsdata enligt ordklasstaggen för nästa ord i nuvarande parsningskonfiguration. Den tredje och sista experimentserien presenterar en empirisk undersökning som jämför SVM med minnesbaserad inlärning (MBL). 
Studien använder sig av fem särdragsmodeller, där alla kombinationer av språk, inlärningsmetod och särdragsmodell har genomgått omfattande parameteroptimering. Experimenten visar att SVM överträffar MBL för mer komplexa och lexikaliserade särdragsmodeller med avseende på parsningskorrekthet. Det finns även vissa indikationer på att SVM, med en uppdelningsstrategi, kan parsa en text snabbare än MBL. För svenska kan vi rapportera den högsta parsningskorrektheten hittills och för kinesiska och engelska är resultaten nära de bästa som har rapporterats. Dependency Parsing Support Vector Machines Machine Learning
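The splitting strategy mentioned in this abstract, dividing the training data by the part-of-speech of the next token, can be sketched as a simple grouping step. This is only an illustration under assumptions: the `(pos, features, action)` instance layout and the sample values are hypothetical and do not reflect MaltParser's actual data format.

```python
from collections import defaultdict


def split_by_next_pos(instances):
    """Group training instances by the part-of-speech tag of the next token.

    `instances` is an iterable of (pos, features, action) tuples; this layout
    is a hypothetical one chosen for the sketch.
    """
    groups = defaultdict(list)
    for pos, features, action in instances:
        groups[pos].append((features, action))
    return dict(groups)


# Illustrative instances: (POS of next token, feature vector, parser action).
training = [
    ("NN", [1, 0, 0], "SHIFT"),
    ("VB", [0, 1, 0], "LEFT-ARC"),
    ("NN", [1, 1, 0], "RIGHT-ARC"),
]
subsets = split_by_next_pos(training)
```

One classifier would then be trained per subset, and at parsing time the classifier matching the next token's tag would be consulted; each model sees fewer examples, which is where the reported reduction in learning and parsing time would come from.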
134	Predicting movie ratings : A comparative study on random forests and support vector machines Persson, Karl January 2015 The aim of this work is to evaluate the prediction performance of random forests in comparison to support vector machines, for predicting the numerical user ratings of a movie using pre-release attributes such as its cast, directors, budget and movie genres. In order to answer this question an experiment was conducted on predicting the overall user rating of 3376 Hollywood movies, using data from the well-established movie database IMDb. The prediction performance of the two algorithms was assessed and compared over three commonly used performance and error metrics, as well as evaluated by means of significance testing in order to further investigate whether or not any significant differences could be identified. The results indicate some differences between the two algorithms, with consistently better performance from random forests in comparison to support vector machines over all of the performance metrics, as well as significantly better results for two out of three metrics. Although a slight difference has been indicated by the results, one should also note that both algorithms show great similarities in terms of their prediction performance, making it hard to draw any general conclusions on which algorithm yields the most accurate movie predictions. data mining machine learning regression movie prediction random forests support vector machines Computer Sciences Datavetenskap (datalogi)
135	Study on a resource-saving cloud based long-term ECG monitoring system using machine learning algorithms Cheng, Ping 19 April 2018 An electrocardiogram (ECG) records the electrical impulses from the myocardium, reflects the underlying dynamics of the heart and has been widely exploited to detect and identify cardiac arrhythmias. This dissertation examines a resource-saving cloud based long-term ECG (CLT-ECG) monitoring system which consists of an ECG raw data acquisition system, a mobile device and a server. Three issues that are critical to the effectiveness and efficiency of the monitoring system are studied: the detection of life-threatening arrhythmias, the discrimination of normal and abnormal heartbeats to facilitate the resource-saving operation and the multi-class heartbeat classification algorithm for non-life-threatening arrhythmias. The detection algorithm for life-threatening ventricular arrhythmias, which is critical to saving patients’ lives, is investigated by exploiting personalized features. Two new personalized features, namely, aveCC and medianCC, are extracted based on the correlation coefficients between a patient-specific regular QRS-complex template and his/her real-time ECG data, characterizing subtle differences in the QRS complexes among different people. A small set of the most effective features is selected for efficient performance and real-time operation using Support Vector Machines (SVMs). The proposed algorithm is shown to improve performance under both record-based and database-based data divisions. The classification algorithm outperforms existing classification results using only the top two or top three features. A novel patient-specific arrhythmia detection algorithm, which discriminates the normal and abnormal heartbeats, is proposed using One-Class SVMs. 
Conventionally, CLT-ECG systems are used to solve problems such as portability and the difficulty of capturing intermittent arrhythmias. However, CLT-ECG systems are subject to several practical limitations: battery power restriction, network congestion and heavily redundant ECG data. To overcome these problems, a resource-saving CLT-ECG system is studied, in which a novel arrhythmia detection algorithm closely related to the resource-saving rate is proposed and examined in detail. The proposed arrhythmia detection algorithm explores two types of variations: the waveform change indicator (WCI), which reflects a change within one heartbeat, and the modified RR interval ratio (modRRIR), which characterizes the successive heartbeat interval variation. The overall classification result is obtained by combining the results obtained separately with WCI and modRRIR. The proposed algorithm is validated using the public ECG database with a result outperforming others in the literature, as well as using the data collected from the ECG platform HeartCarer built in our research group. Considering the multi-class classification in the cloud server, a patient-specific single-lead ECG heartbeat classification strategy is proposed to discriminate ventricular ectopic beats (VEBs) and supraventricular ectopic beats (SVEBs). Two types of features are extracted: intra-beat features characterize the distortion of the waveform within one heartbeat, while inter-beat features reflect the variation between successive heartbeats. A novel fusion strategy consisting of a global classifier and a local classifier is presented. The local classifier is obtained using the high-confidence heartbeats extracted from the first 5 minutes of data of a specific patient, while the global classifier is trained on the public training data. The advantage of the developed strategy is that fully automatic classification is realized without the intervention of physicians. 
Finally, simulation results show that comparable or even better classification performance is achieved, which validates the effectiveness of the proposed strategy. / Graduate / 2019-03-19 Electrocardiogram (ECG) Automatic ECG Classification Machine Learning Support Vector Machines Resource-saving Cloud based
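The two personalized features named in this abstract, aveCC and medianCC, are described only as being based on correlation coefficients between a patient-specific QRS template and incoming ECG data. A minimal sketch of one plausible reading follows: the mean and median of the Pearson correlation between the template and each beat. The exact definition used in the dissertation is not given here, so this computation, the assumption that beats are already segmented and aligned to the template length, and the sample values are all illustrative.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a flat segment carries no shape information
    return cov / (norm_x * norm_y)


def cc_features(template, beats):
    """Return (mean, median) of template-to-beat correlations (assumed definition)."""
    ccs = [pearson(template, beat) for beat in beats]
    return statistics.mean(ccs), statistics.median(ccs)


# Illustrative values: beats are assumed segmented and aligned to the template.
template = [0.0, 0.1, 1.0, -0.3, 0.0]
beats = [
    [0.0, 0.2, 2.0, -0.6, 0.0],    # scaled copy of the template
    [0.0, -0.1, -1.0, 0.3, 0.0],   # inverted copy of the template
    [0.0, 0.1, 1.0, -0.3, 0.0],    # identical to the template
]
ave_cc, median_cc = cc_features(template, beats)
```

In this toy input the median stays at the correlation of the typical beats while the mean is pulled down by the single inverted beat, which hints at why the two statistics might be useful together.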
136	kernlab - An S4 Package for Kernel Methods in R Karatzoglou, Alexandros, Smola, Alex, Hornik, Kurt, Zeileis, Achim 11 1900 kernlab is an extensible package for kernel-based machine learning methods in R. It takes advantage of R's new S4 object model and provides a framework for creating and using kernel-based algorithms. The package contains dot product primitives (kernels), implementations of support vector machines and the relevance vector machine, Gaussian processes, a ranking algorithm, kernel PCA, kernel CCA, and a spectral clustering algorithm. Moreover it provides a general purpose quadratic programming solver, and an incomplete Cholesky decomposition method.
137	Parallel Evaluation of Numerical Models for Algorithmic Trading Ligr, David January 2016 This thesis will address the problem of the parallel evaluation of algorithmic trading models based on multiple kernel support vector regression. Various approaches to parallelization of the evaluation of these models will be proposed and their suitability for highly parallel architectures, namely the Intel Xeon Phi coprocessor, will be analysed considering specifics of this coprocessor and also specifics of its programming. Based on this analysis a prototype will be implemented, and its performance will be compared to a serial and multi-core baseline pursuant to executed experiments.
138	Computational prediction of host-pathogen protein-protein interactions Ahmed, Ibrahim H.I. January 2017 Philosophiae Doctor - PhD / Supervised machine learning approaches have been applied successfully to the prediction of protein-protein interactions (PPIs) within a single organism, i.e., intra-species predictions. However, because of the absence of large amounts of experimentally validated PPI data for training and testing, fewer studies have successfully applied these techniques to host-pathogen PPI, i.e., inter-species comparisons. Most host-pathogen studies have focused on human-virus interactions and specifically human-HIV PPI data. Additional improvements to machine learning techniques and feature sets are important to improve the classification accuracy for host-pathogen protein-protein interaction prediction. The primary aim of this bioinformatics thesis was to develop a binary classifier with an appropriate feature set for host-pathogen protein-protein interaction prediction using published human-Hepatitis C virus PPI, and to test the model on available host-pathogen data for human-Bacillus anthracis PPI. Twelve different feature sets were compared to find the optimal set. The feature selection process reveals that our novel quadruple feature (a subsequence of four consecutive amino acids) combined with sequence similarity and human interactome network properties (such as degree, cluster coefficient, and betweenness centrality) was the best set. The optimal feature set outperformed those in the relevant published material, giving 95.9% sensitivity, 91.6% specificity and 89.0% accuracy. Using our optimal feature set, we developed a neural network model to predict PPI between human-Mycobacterium tuberculosis. The strategy is to develop a model trained with intra-species PPI data and extend it to inter-species prediction. 
However, the lack of experimentally validated PPI data between human-Mycobacterium tuberculosis (M-tuberculosis) led us to first assess the feasibility of using validated intra-species PPI data to build a model for inter-species PPI. In this model we used human intra-species PPI combined with Bacillus anthracis intra-species data to develop a binary classification model and extend the model for human-Bacillus anthracis inter-species prediction. Thus, we test our hypotheses on known human-Bacillus anthracis PPI data and the result shows good performance with 89.0% as average accuracy. The same approach was extended to the prediction of PPI between human-Mycobacterium tuberculosis. The predicted human-M-tuberculosis PPI data were further validated using functional enrichment of experimentally verified secretory proteins in M-tuberculosis, cellular compartment analysis and pathway enrichment analysis. Results show that five of the M-tuberculosis secretory proteins within an infected host macrophage that correspond to the mycobacterial virulent strain H37Rv were extracted from the human-M-tuberculosis PPI dataset predicted by our model. Finally, a web server was created to predict PPIs between human and Mycobacterium tuberculosis which is available online at URL:http://hppredict.sanbi.ac.za. In summary, the concepts, techniques and technologies developed as part of this thesis have the potential to contribute not only to the understanding of PPI analysis between human and Mycobacterium tuberculosis, but can be extended to other pathogens. Further materials related to this study are available at ftp://ftp.sanbi.ac.za/machine learning. / National Research Foundation (NRF) and SANBI Mycobacterium tuberculosis Bacillus anthracis Protein-protein interactions Machine learning Support vector machines
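The "quadruple feature" in this abstract is described as a subsequence of four consecutive amino acids. A minimal sketch of how such subsequences could be counted from a protein sequence is shown below; the function name, the use of raw counts rather than some normalized encoding, and the sample sequence are assumptions made for illustration, not the thesis's actual feature encoding.

```python
from collections import Counter


def quadruple_counts(sequence, k=4):
    """Count every window of k consecutive residues in a protein sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Illustrative sequence, not taken from any dataset in the thesis.
counts = quadruple_counts("MKVLMKVLA")
```

A sequence of length n yields n - 3 overlapping quadruples, so the nine-residue example produces six windows, with `MKVL` occurring twice. Counts of this kind could then be combined with other descriptors, such as the sequence similarity and network properties the abstract lists, into one feature vector per protein pair.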
139	Automatic Pain Assessment from Infants’ Crying Sounds Pai, Chih-Yun 01 November 2016 Crying is how infants express their emotional state. It provides parents and nurses with a criterion for understanding infants’ physiological state. Many researchers have analyzed infants’ crying sounds to diagnose specific diseases or define the reasons for crying. This thesis presents an automatic crying level assessment system to classify infants’ crying sounds that have been recorded under realistic conditions in the Neonatal Intensive Care Unit (NICU) as whimpering or vigorous crying. To analyze the crying signal, Welch’s method and Linear Predictive Coding (LPC) are used to extract spectral features; the average and the standard deviation of the frequency signal and the maximum power spectral density are the other spectral features used in classification. For classification, three state-of-the-art classifiers, namely K-nearest Neighbors, Random Forests, and Least Squares Support Vector Machine, are tested in this work. The highest accuracy in classifying whimpering and vigorous crying, 90%, is achieved on the clean dataset, sampled from 10 seconds before scoring to 5 seconds after scoring, with K-nearest Neighbors as the classifier. Whimpering Vigorous Crying K-Nearest Neighbors Random Forests Least Squares Support Vector Machines Computer Sciences
140	Multispectral Image Analysis for Object Recognition and Classification Viau, Claude January 2016 Computer and machine vision applications are used in numerous fields to analyze static and dynamic imagery in order to assist or automate some form of decision-making process. Advancements in sensor technologies now make it possible to capture and visualize imagery at various wavelengths (or bands) of the electromagnetic spectrum. Multispectral imaging has countless applications in various fields including (but not limited to) security, defense, space, medical, manufacturing and archeology. The development of advanced algorithms to process and extract salient information from the imagery is a critical component of the overall system performance. The fundamental objective of this research project was to investigate the benefits of combining imagery from the visual and thermal bands of the electromagnetic spectrum to improve the recognition rates and accuracy of commonly found objects in an office setting. The goal was not to find a new way to “fuse” the visual and thermal images together but rather to establish a methodology to extract multispectral descriptors in order to improve a machine vision system’s ability to recognize specific classes of objects. A multispectral dataset (visual and thermal) was captured and features from the visual and thermal images were extracted and used to train support vector machine (SVM) classifiers. The SVM’s class prediction ability was evaluated separately on the visual, thermal and multispectral testing datasets. Commonly used performance metrics were applied to assess the sensitivity, specificity and accuracy of each classifier. The research demonstrated that the highest recognition rate was achieved by an expert system (multiple classifiers) that combined the expertise of the visual-only classifier, the thermal-only classifier and the combined visual-thermal classifier.
multispectral object recognition visual thermal imagery support vector machines (SVM) classification
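The three metrics named in this abstract (sensitivity, specificity and accuracy) follow directly from the counts in a binary confusion matrix. The sketch below uses the standard definitions; the counts are invented for illustration and are not results from the thesis.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # overall fraction correct
    return sensitivity, specificity, accuracy


# Illustrative counts only: 50 positive and 50 negative test samples.
sens, spec, acc = binary_metrics(tp=45, fn=5, tn=40, fp=10)
```

Accuracy is a weighted average of sensitivity and specificity, weighted by the share of positive and negative samples, so for any single test set it always lies between the two.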
