391

Um método para seleção de atributos em dados genômicos / A method for attribute selection in genomic data

Oliveira, Fabrízzio Condé de 26 November 2015 (has links)
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Genome-wide association studies aim to find SNP-type molecular markers that are directly or indirectly associated with a phenotype of interest, such as one or more traits of an individual or even a disease. The SNP may be the causative mutation itself or may be correlated with it because the two are inherited together. To identify the causal or promoter region of the phenotype, which is not known a priori, thousands or millions of SNPs are genotyped in samples of hundreds or thousands of individuals. This raises the challenge of selecting the most informative SNPs in a genotype data set where the number of attributes is usually much larger than the number of individuals, where attributes may be highly correlated, and where there may be interactions between pairs, trios, or combinations of SNPs of any order. The methods most commonly used in genome-wide association studies filter the most significant SNPs by the p-value of each SNP in statistical hypothesis tests: regression-based tests for continuous phenotypes, and chi-square or similar tests for classification of discrete phenotypes.
However, this class of methods captures only SNPs with additive effects, because the relationship adopted is linear. In an attempt to overcome the limitations of these established procedures, this work proposes a new SNP selection method based on machine learning and computational intelligence techniques, named SNP Markers Selector (SMS). The model divides the SNP selection problem into three distinct phases: the first analyzes the relevance of the markers; the second defines the set of relevant markers through a cut strategy based on a relevance threshold; and the third refines the cut, mainly to reduce false-positive markers. In the SMS, these three steps are implemented with Random Forests, Support Vector Machines, and Genetic Algorithms, respectively. The SMS aims to build a workflow that maximizes the selection potential of the model through complementary stages, increasing its ability to capture additive and/or non-additive effects with moderate interaction between pairs and trios of SNPs, or even higher-order interactions with minimally detectable effects. The SMS can be applied both to regression problems (continuous phenotypes) and to classification problems (discrete phenotypes). Numerical experiments evaluated the strategy on seven simulated data sets and on one real data set, in which the predicted milk production capacity of dairy cows was measured as a continuous phenotype.
The proposed method was also compared with p-value-based methods and with the Bayesian Lasso, showing, in general, better results in terms of true-positive SNPs on the simulated data with additive effects combined with interactions between pairs and trios of SNPs. On the real data set, based on 56,947 SNPs and a single milk-production phenotype, the method identified 245 QTLs associated with milk production and composition and 90 candidate genes associated with mastitis and with milk production and composition, all of which had been identified by previous studies using other selection methods. The method thus proved competitive with the comparison methods in complex scenarios, with both simulated and real data, indicating its potential for genome-wide association studies in humans, animals, and plants.
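The three-phase pipeline (relevance ranking, threshold cut, refinement) can be sketched in miniature. The sketch below is an illustrative stand-in, not the authors' SMS implementation: it uses scikit-learn's RandomForestRegressor for the relevance phase and a simple mean-plus-two-standard-deviations cut; the simulated genotypes, effect sizes, and threshold rule are all invented for the example, and the GA refinement phase is omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Simulated genotypes: 200 individuals x 50 SNPs coded 0/1/2.
X = rng.integers(0, 3, size=(200, 50)).astype(float)
# Continuous phenotype: an additive effect of SNP 3 plus a pairwise
# interaction between SNPs 7 and 12, plus noise (all invented).
y = 1.5 * X[:, 3] + 2.0 * X[:, 7] * X[:, 12] + rng.normal(0.0, 0.5, 200)

# Phase 1 (relevance): rank SNPs by Random Forest importance, which can
# pick up non-additive effects that a linear p-value filter misses.
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
importance = rf.feature_importances_

# Phase 2 (cut): keep SNPs above a simple relevance threshold.
threshold = importance.mean() + 2.0 * importance.std()
selected = sorted(np.flatnonzero(importance > threshold).tolist())
print(selected)  # the causal SNPs 3, 7 and 12 should dominate the ranking
```

A third, GA-based phase would then search subsets of `selected` to prune false positives.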
392

On the use of a discriminant approach for handwritten word recognition based on bi-character models / Vers une approche discriminante pour la reconnaissance de mots manuscrits en-ligne utilisant des modèles de bi-caractères

Prum, Sophea 08 November 2013 (has links)
With the advent of mobile devices such as tablets and smartphones over the last decades, on-line handwriting recognition has become a widely demanded capability for daily-life and professional applications. This thesis presents a new approach to on-line handwriting recognition, based on explicit segmentation/recognition integrated in a two-level analysis system: character and bi-character. More specifically, our system segments a handwritten word into a sequence of graphemes, from which an L-level lattice of graphemes is built. Each node of the lattice is treated as a candidate character and submitted to an SVM-based Isolated Character Recognizer (ICR), which returns a list of character candidates, each associated with an estimated recognition probability. However, because each node combines several segmented graphemes, it may contain ambiguous information (fragments of characters, ligatures) that the ICR cannot resolve at the character level. We solve this problem with "bi-character" models based on logistic regression, which verify the consistency of the information at a higher level of analysis. Finally, the results provided by the ICR and the bi-character models are used in the word-decoding stage, whose role is to find the optimal path in the lattice associated with each word in the lexicon. Two decoding methods are presented (heuristic search and dynamic programming), with dynamic programming proving the more effective.
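The dynamic-programming decoding over the grapheme lattice can be illustrated with a toy example. The lattice spans, ICR probabilities, and the tiny lexicon below are invented for the sketch; the real system scores nodes with an SVM-based ICR and also folds in bi-character consistency scores, which are omitted here.

```python
import math

# Hypothetical ICR output: key (i, j) means graphemes i..j-1 form one
# candidate character, mapped to estimated recognition probabilities.
icr = {
    (0, 2): {"l": 0.7, "t": 0.2},
    (0, 1): {"l": 0.4, "i": 0.5},
    (1, 2): {"o": 0.3, "c": 0.6},
    (2, 4): {"o": 0.8, "a": 0.1},
    (2, 3): {"c": 0.5},
    (3, 4): {"o": 0.4},
}

def word_score(word, n_graphemes):
    """Best log-probability path through the lattice spelling `word`."""
    NEG = float("-inf")
    # best[k][i]: best score spelling the first k characters of `word`
    # while consuming exactly i graphemes.
    best = [[NEG] * (n_graphemes + 1) for _ in range(len(word) + 1)]
    best[0][0] = 0.0
    for k, ch in enumerate(word):
        for (i, j), probs in icr.items():
            if ch in probs and best[k][i] > NEG:
                cand = best[k][i] + math.log(probs[ch])
                best[k + 1][j] = max(best[k + 1][j], cand)
    return best[len(word)][n_graphemes]

# Decode: pick the lexicon word with the best path score.
lexicon = ["lo", "io", "ico"]
scores = {w: word_score(w, 4) for w in lexicon}
print(max(scores, key=scores.get))  # → lo
```

Here "lo" wins via the path l(0-2)·o(2-4) with probability 0.7 × 0.8, while "io" has no path consuming all four graphemes.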
393

Metamodel-Based Multidisciplinary Design Optimization of Automotive Structures

Ryberg, Ann-Britt January 2017 (has links)
Multidisciplinary design optimization (MDO) can be used in computer aided engineering (CAE) to efficiently improve and balance the performance of automotive structures. However, large-scale MDO is not yet generally integrated within automotive product development, due to several challenges, of which excessive computing time is the most important. In this thesis, a metamodel-based MDO process that fits normal company organizations and CAE-based development processes is presented. The introduction of global metamodels offers a means to increase computational efficiency and distribute work without implementing complicated multi-level MDO methods. The presented MDO process is proven efficient for thickness optimization studies with the objective of minimizing mass. It can also be used for spot-weld optimization if the models are prepared correctly. A comparison of different methods reveals that topology optimization, which requires less model preparation and computational effort, is an alternative if load cases involving simulations of linear systems are judged to be of major importance. A technical challenge in metamodel-based design optimization is the lack of accuracy of metamodels representing complex responses with discontinuities, which are common in, for example, crashworthiness applications. The decision boundary from a support vector machine (SVM) can be used to identify the border between different types of deformation behaviour. In this thesis, this information is used to improve the accuracy of feedforward neural network metamodels. Three approaches are tested: splitting the design space and fitting separate metamodels for the different regions, adding estimated guiding samples along the boundary to the fitting set before a global metamodel is fitted, and using a special SVM-based sequential sampling method.
Substantial improvements in accuracy are observed, and it is found that implementing SVM-based sequential sampling and estimated guiding samples can result in successful optimization studies in cases where more conventional methods fail.
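The first of the three approaches, splitting the design space at the SVM decision boundary and fitting separate metamodels per region, can be sketched as follows. This is a minimal one-dimensional illustration with invented data; the thesis uses feedforward neural network metamodels on crash responses, whereas the sketch substitutes linear regressions for brevity.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# A 1-D response with a jump at x = 0.5, mimicking two deformation modes.
X = rng.uniform(0.0, 1.0, size=(200, 1))
mode = (X[:, 0] > 0.5).astype(int)
y = np.where(mode == 1, 3.0 * X[:, 0] + 2.0, -1.0 * X[:, 0])

# Step 1: learn the mode boundary with an SVM (in practice the mode labels
# come from inspecting simulation results, e.g. which deformation occurred).
svm = SVC(kernel="linear").fit(X, mode)

# Step 2: fit a separate metamodel in each region.
models = {m: LinearRegression().fit(X[mode == m], y[mode == m]) for m in (0, 1)}

def predict(xs):
    """Route each query point to the metamodel of its predicted region."""
    xs = np.asarray(xs, dtype=float).reshape(-1, 1)
    regions = svm.predict(xs)
    return np.array([models[r].predict(x[None, :])[0] for r, x in zip(regions, xs)])

print(predict([0.2, 0.9]))  # ≈ [-0.2, 4.7], one smooth model per side of the jump
```

A single global metamodel would have to smear out the discontinuity; the split avoids that at the cost of an extra classification step.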
394

Sparse Multiclass And Multi-Label Classifier Design For Faster Inference

Bapat, Tanuja 12 1900 (has links) (PDF)
Many real-world problems, like hand-written digit recognition or semantic scene classification, are treated as multiclass or multi-label classification problems. Solutions to these problems using support vector machines (SVMs) are well studied in the literature. In this work, we focus on building sparse max-margin classifiers for multiclass and multi-label classification. A sparse representation of the resulting classifier is important both for efficient training and for fast inference, especially when the training and test sets are large. Very few existing multiclass and multi-label classification algorithms directly control the sparsity of the designed classifiers, and those algorithms were not found to be scalable. Motivated by this, we propose new formulations for sparse multiclass and multi-label classifier design and give efficient algorithms to solve them. The formulation for sparse multi-label classification also incorporates prior knowledge of label correlations. In both cases, the classification model is designed using a common set of basis vectors across all the classes. These basis vectors are greedily added to an initially empty model to approximate the target function. The sparsity of the classifier can be controlled by a user-defined parameter, d_max, which indicates the maximum number of common basis vectors. The computational complexity of these algorithms for multiclass and multi-label classifier design is O(l·k²·d_max²), where l is the number of training examples and k is the number of classes. The inference time for the proposed multiclass and multi-label classifiers is O(k·d_max). Numerical experiments on various real-world benchmark datasets demonstrate that the proposed algorithms produce sparse classifiers that require fewer basis vectors than state-of-the-art algorithms to attain the same generalization performance.
A very small value of d_max results in a significant reduction in inference time. The proposed algorithms thus provide useful alternatives to existing algorithms for sparse multiclass and multi-label classifier design.
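A minimal sketch of the greedy scheme described above: kernel columns serve as basis vectors shared across all classes, added one at a time to an initially empty model until d_max is reached, with the weights of all classes refit jointly. The data, kernel, and column-scoring rule are invented for illustration and use least squares rather than the thesis's max-margin formulation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 3-class problem with one-hot targets Y and an RBF kernel matrix K.
X = rng.normal(size=(90, 2))
labels = rng.integers(0, 3, size=90)
Y = np.eye(3)[labels]
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)

# Greedily add shared basis vectors (kernel columns) until d_max is reached,
# refitting the weights of all classes jointly after each addition.
d_max = 5
basis = []
residual = Y.copy()
for _ in range(d_max):
    # Score each unused column by its total correlation with the residual
    # summed over all classes, so the basis is common to every class.
    col_scores = np.abs(K.T @ residual).sum(axis=1)
    col_scores[basis] = -np.inf
    basis.append(int(col_scores.argmax()))
    W, *_ = np.linalg.lstsq(K[:, basis], Y, rcond=None)
    residual = Y - K[:, basis] @ W

pred = (K[:, basis] @ W).argmax(axis=1)  # inference touches only d_max columns
print(len(basis), pred.shape)
```

Inference cost grows with `d_max`, not with the training-set size, which is the point of controlling sparsity directly.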
395

Modelos de aprendizado supervisionado usando métodos kernel, conjuntos fuzzy e medidas de probabilidade / Supervised machine learning models using kernel methods, probability measures and fuzzy sets

Jorge Luis Guevara Díaz 04 May 2015 (has links)
This thesis proposes a methodology based on kernel methods, probability measures, and fuzzy sets to analyze datasets whose individual observations are themselves sets of points rather than single points. Fuzzy sets and probability measures are used to model the observations, and kernel methods to analyze the data: fuzzy sets when an observation contains imprecise, vague, or linguistic values, and probability measures when an observation is given as a set of multidimensional points in a D-dimensional Euclidean space. Thanks to kernels defined on probability measures, or on fuzzy sets, these objects are implicitly mapped into reproducing kernel Hilbert spaces, where the analysis can be carried out with any kernel method. Using this methodology, a wide range of machine learning problems can be addressed for such datasets. In particular, the thesis presents data description models for observations modeled by probability measures, obtained by embedding the measures in Hilbert space and constructing minimum enclosing balls there; these models are applied as one-class classifiers to the group anomaly detection task. The thesis also proposes a new class of reproducing kernels, the kernels on fuzzy sets, which map fuzzy sets to geometric feature spaces and act as similarity measures between them. It develops these kernels from basic definitions through to applications in machine learning problems such as supervised classification, regression, the definition of distances between fuzzy sets, supervised classification of interval data, and a kernel two-sample test for data with imprecise attributes. Potential applications include machine learning and pattern recognition tasks over fuzzy data, and computational tasks requiring an estimate of similarity between fuzzy sets.
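A kernel between observations that are themselves sets of points can be sketched via empirical kernel mean embeddings, one standard way to define kernels on probability measures (the thesis's exact constructions may differ). The point sets and bandwidth below are invented for the example; the induced distance is the kernel two-sample statistic MMD.

```python
import numpy as np

def set_kernel(A, B, gamma=1.0):
    """Kernel between two point sets via empirical mean embeddings:
    k(A, B) = mean over a in A, b in B of exp(-gamma * ||a - b||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return float(np.exp(-gamma * sq).mean())

def mmd2(A, B):
    """Squared MMD between the empirical measures of A and B,
    i.e. the squared RKHS distance between their mean embeddings."""
    return set_kernel(A, A) + set_kernel(B, B) - 2.0 * set_kernel(A, B)

rng = np.random.default_rng(3)
A = rng.normal(0.0, 1.0, size=(40, 2))   # two samples from the same distribution
B = rng.normal(0.0, 1.0, size=(40, 2))
C = rng.normal(4.0, 1.0, size=(40, 2))   # a sample from a shifted distribution

print(mmd2(A, B) < mmd2(A, C))  # → True: the same-distribution pair is closer
```

Any kernel method (one-class SVM, SVM classification, kernel regression) can then run on a Gram matrix of `set_kernel` values, which is the pattern the methodology exploits.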
396

How accuracy of estimated glottal flow waveforms affects spoofed speech detection performance

Deivard, Johannes January 2020 (has links)
In the domain of automatic speaker verification, one of the challenges is to keep malevolent people out of the system. One way to do this is to create algorithms that detect spoofed speech. There are several types of spoofed speech and several ways to detect them, one of which is to look at the glottal flow waveform (GFW) of a speech signal. This waveform is usually estimated using glottal inverse filtering (GIF), since special invasive equipment is required to record the ground-truth GFW. To the author's knowledge, no research has investigated the correlation between GFW accuracy and spoofed speech detection (SSD) performance. This thesis tries to find out whether that correlation exists. First, the performance of different GIF methods is evaluated; then simple SSD machine learning (ML) models are trained and evaluated based on their macro-average precision. The ML models use different datasets composed of parametrized GFWs estimated with the GIF methods from the previous step. Results from these tasks are then combined in order to spot any correlations. The evaluations showed that the different GIF methods produced GFWs of varying accuracy, and the ML models likewise showed varying performance depending on the dataset used. However, when combining the results, no obvious correlation between GFW accuracy and SSD performance was detected. This suggests that the overall accuracy of a GFW is not a substantial factor in the performance of machine-learning-based SSD algorithms.
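Macro-average precision, the metric used to compare the SSD models, is the unweighted mean of per-class precision, so a rare class counts as much as a frequent one. A minimal sketch with invented labels (0 = bona fide speech, 1 = spoofed):

```python
import numpy as np

def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision (macro average)."""
    precisions = []
    for c in np.unique(y_true):
        predicted_c = y_pred == c
        if predicted_c.sum() == 0:
            precisions.append(0.0)          # no predictions for this class
        else:
            precisions.append(float((y_true[predicted_c] == c).mean()))
    return float(np.mean(precisions))

y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 0])
print(macro_precision(y_true, y_pred))  # precision 2/3 for each class → 0.666...
```

With imbalanced bona fide/spoofed sets, this behaves differently from accuracy, which is why it is the fairer comparison metric here.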
397

Identifying Categorical Land Use Transition and Land Degradation in Northwestern Drylands of Ethiopia

Zewdie, Worku, Csaplovics, Elmar 08 June 2016 (has links)
Land use transition in dryland ecosystems is one of the major driving forces of landscape change that directly impacts the welfare of humans. In this study, the support vector machine (SVM) classification algorithm and cross-tabulation matrix analysis are used to identify systematic and random processes of change. The magnitude and prevailing signals of land use transitions are assessed taking into account net change and swap change. Moreover, spatiotemporal patterns and the relationship between precipitation and the Normalized Difference Vegetation Index (NDVI) are explored to evaluate landscape degradation. The assessment showed 44% net change and about 54% total change during the study period, the difference being due to swap change. The conversion of over 39% of woodland to cropland accounts for the greatest loss of valuable ecosystems in the region. The spatial relationship of NDVI and precipitation also showed an R² below 0.5 over 55% of the landscape, with no significant change in the precipitation trend, an indicative symptom of land degradation. This in-depth analysis of random and systematic landscape change is crucial for designing policy interventions to halt woodland degradation in this fragile environment.
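The cross-tabulation matrix analysis with net and swap change can be sketched on a toy pair of land-use maps. The maps and class coding below are invented; per class, net change is |gain − loss| and swap change is twice the smaller of gain and loss, following the usual cross-tabulation decomposition of total change into net and swap components.

```python
import numpy as np

# Hypothetical land-use maps at two dates over 10 pixels.
# Classes: 0 = woodland, 1 = cropland, 2 = other.
t1 = np.array([0, 0, 0, 0, 1, 1, 2, 2, 1, 0])
t2 = np.array([1, 1, 0, 0, 0, 1, 2, 1, 1, 0])

k = 3
cross = np.zeros((k, k), dtype=int)          # cross-tabulation (transition) matrix
for a, b in zip(t1, t2):
    cross[a, b] += 1

gains = cross.sum(axis=0) - np.diag(cross)   # pixels gained per class
losses = cross.sum(axis=1) - np.diag(cross)  # pixels lost per class
net = np.abs(gains - losses)                 # net change per class
swap = 2 * np.minimum(gains, losses)         # swap change per class

print(cross)
print("net:", net.tolist(), "swap:", swap.tolist())
```

For each class, net + swap equals its total change (gain + loss); a large swap share signals location exchanges rather than a one-way conversion.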
398

Počítačová podpora rozpoznávání a klasifikace rodových erbů / Computer Aided Recognition and Classification of Coats of Arms

Vídeňský, František January 2017 (has links)
This master's thesis describes the design and development of a system for the detection and recognition of whole coats of arms as well as of their individual heraldic parts. The thesis surveys computer vision methods for object segmentation and detection and selects those most suitable. Most of the heraldic parts are segmented using convolutional neural networks, and the rest using active contours. The histogram of oriented gradients method was selected for detecting coats of arms in an image. A custom data set is used for training and for verifying functionality. The resulting system can serve as an auxiliary tool in the auxiliary sciences of history.
399

Detekce fibrilace síní v krátkodobých EKG záznamech / Detection of atrial fibrillation in short-term ECG

Ambrožová, Monika January 2019 (has links)
Atrial fibrillation is diagnosed in 1-2% of the population, and a significant increase in the number of patients with this arrhythmia is expected in the coming decades, in connection with the aging of the population and the higher incidence of some diseases considered risk factors for atrial fibrillation. The aim of this work is to describe the problem of atrial fibrillation and the methods that allow its detection in an ECG record. The first part covers cardiac physiology and atrial fibrillation, together with the basics of atrial fibrillation detection. The practical part describes software for the detection of atrial fibrillation provided by the BTL company, and then designs an atrial fibrillation detector. Several parameters were selected to capture the variability of RR intervals: standard deviation, skewness, kurtosis, coefficient of variation, root mean square of successive differences, normalized absolute deviation, normalized absolute difference, median absolute deviation, and entropy. Three classification models were used: support vector machine (SVM), k-nearest neighbors (KNN), and discriminant analysis. The SVM model achieves the best results (sensitivity: 67.1%; specificity: 97.0%; F-measure: 66.8%; accuracy: 92.9%).
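Several of the RR-interval variability parameters listed above (standard deviation, coefficient of variation, RMSSD, median absolute deviation) can be sketched directly. The RR series below are synthetic stand-ins for sinus rhythm and AF-like irregularity, not data from the thesis; a classifier such as an SVM would be trained on feature vectors of this kind.

```python
import numpy as np

def rr_features(rr):
    """Variability features over a series of RR intervals (in seconds)."""
    rr = np.asarray(rr, dtype=float)
    diffs = np.diff(rr)
    return {
        "sd": rr.std(ddof=1),                          # standard deviation
        "cv": rr.std(ddof=1) / rr.mean(),              # coefficient of variation
        "rmssd": np.sqrt(np.mean(diffs**2)),           # root mean square of successive differences
        "mad": np.median(np.abs(rr - np.median(rr))),  # median absolute deviation
    }

rng = np.random.default_rng(4)
regular = 0.8 + rng.normal(0.0, 0.01, 60)   # sinus-like rhythm, ~75 bpm
irregular = rng.uniform(0.4, 1.2, 60)       # AF-like irregular intervals

f_reg, f_af = rr_features(regular), rr_features(irregular)
print(f_af["rmssd"] > f_reg["rmssd"], f_af["cv"] > f_reg["cv"])  # → True True
```

The features are markedly larger for the irregular series, which is exactly the signal an AF detector exploits.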
400

Analýza experimentálních EKG záznamů / Analysis of experimental ECG

Maršánová, Lucie January 2015 (has links)
This diploma thesis deals with the analysis of experimental electrograms (EG) recorded from isolated rabbit hearts. The theoretical part covers the basic principles of electrocardiography, pathological events in ECGs, automatic ECG classification, and experimental cardiological research. The practical part deals with the manual classification of individual pathological events; these results will be presented in the database of EG records currently under development at the Department of Biomedical Engineering at BUT. The manual scoring of the data was discussed with experts. The presence of pathological events within particular experimental periods was then described, and the influence of ischemia on the electrical activity of the heart was reviewed. In the last part, morphological parameters calculated from EG beats were statistically analyzed with Kruskal-Wallis and Tukey-Kramer tests and with principal component analysis (PCA), and used as features to automatically classify four types of beats. Classification was carried out with four approaches: discriminant function analysis, k-nearest neighbors, support vector machines, and a naive Bayes classifier.
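The PCA step on beat morphology features can be sketched with synthetic data. The feature values and class offsets below are invented, only two beat types are simulated, and the classifiers compared in the thesis (discriminant analysis, k-NN, SVM, naive Bayes) are not reproduced here; the sketch shows only the projection that would feed them.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical morphological features for two beat types, offset along two
# of six feature dimensions so that a latent direction separates them.
normal_beats = rng.normal(0.0, 1.0, size=(50, 6))
ischemic_beats = rng.normal(0.0, 1.0, size=(50, 6)) + np.array([4, 4, 0, 0, 0, 0])
X = np.vstack([normal_beats, ischemic_beats])

# PCA via SVD of the centered feature matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                       # projection onto the first two PCs
explained = S**2 / (S**2).sum()              # explained-variance ratios

# The first component should capture the between-type separation.
gap = abs(scores[:50, 0].mean() - scores[50:, 0].mean())
print(round(float(explained[0]), 2), round(float(gap), 1))
```

The low-dimensional `scores` would then serve as the classification features for any of the four classifiers.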
