Global ETD Search

341	Supervised Learning of Piecewise Linear Models Manwani, Naresh January 2012
Supervised learning of piecewise linear models is a well-studied problem in the machine learning community. The key idea in piecewise linear modeling is to properly partition the input space and learn a linear model for every partition. Decision trees and regression trees are classic examples of piecewise linear models for classification and regression problems. The existing approaches for learning decision/regression trees can be broadly classified into two classes, namely, fixed structure approaches and greedy approaches. In the fixed structure approaches, the tree structure is fixed beforehand by fixing the number of non-leaf nodes, the height of the tree, and the paths from the root node to every leaf node of the tree. Mixture of experts and hierarchical mixture of experts are examples of fixed structure approaches for learning piecewise linear models. Parameters of the models are found using, e.g., maximum likelihood estimation, for which the expectation maximization (EM) algorithm can be used. Fixed structure piecewise linear models can also be learnt using risk minimization under an appropriate loss function. Learning an optimal decision tree using a fixed structure approach is a hard problem: constructing an optimal binary decision tree is known to be NP-complete. On the other hand, greedy approaches do not assume any parametric form or any fixed structure for the decision tree classifier. Most of the greedy approaches learn tree structured piecewise linear models in a top-down fashion. These are built by binary or multi-way recursive partitioning of the input space. The main issue in top-down decision tree induction is to choose an appropriate objective function to rate the split rules. The objective function should be easy to optimize. Top-down decision trees are easy to implement and understand, but there are no optimality guarantees due to their greedy nature.
Regression trees are built in a similar way to decision trees. In regression trees, every leaf node is associated with a linear regression function. All piecewise linear modeling techniques deal with two main tasks, namely, partitioning of the input space and learning a linear model for every partition. However, partitioning the input space and learning linear models for different partitions are not independent problems. Simultaneous optimal estimation of the partitions and learning of a linear model for every partition is a combinatorial problem and hence computationally hard. Nevertheless, piecewise linear models provide better insights into the classification or regression problem by giving an explicit representation of the structure in the data. The information captured by piecewise linear models can be summarized in terms of simple rules, so that they can be used to analyze the properties of the domain from which the data originates. These properties make piecewise linear models, like decision trees and regression trees, extremely useful in many data mining applications and place them among the top data mining algorithms. In this thesis, we address the problem of supervised learning of piecewise linear models for classification and regression. We propose novel algorithms for learning piecewise linear classifiers and regression functions. We also address the problem of noise tolerant learning of classifiers in the presence of label noise. We propose a novel algorithm for learning polyhedral classifiers, which are the simplest form of piecewise linear classifiers. Polyhedral classifiers are useful when points of the positive class fall inside a convex region and all the negative class points are distributed outside the convex region. Then the region of the positive class can be well approximated by a simple polyhedral set.
The key challenge in optimally learning a fixed structure polyhedral classifier is to identify subproblems, where each subproblem is a linear classification problem. This is a hard problem, and identifying polyhedral separability is known to be NP-complete. The goal of any polyhedral learning algorithm is to efficiently handle the underlying combinatorial problem while achieving good classification accuracy. Existing methods for learning a fixed structure polyhedral classifier are based on solving non-convex constrained optimization problems. These approaches do not efficiently handle the combinatorial aspect of the problem and are computationally expensive. We propose a method of model based estimation of posterior class probability to learn polyhedral classifiers. We solve an unconstrained optimization problem using a simple two-step algorithm (similar to the EM algorithm) to find the model parameters. To the best of our knowledge, this is the first attempt to form an unconstrained optimization problem for learning polyhedral classifiers. We then modify our algorithm to also find the required number of hyperplanes automatically. We experimentally show that our approach is better than the existing polyhedral learning algorithms in terms of training time, performance, and complexity. Most often, class conditional densities are multimodal. In such cases, each class region may be represented as a union of polyhedral regions, and hence a single polyhedral classifier is not sufficient. To handle such situations, a generic decision tree is required. Learning an optimal fixed structure decision tree is a computationally hard problem. On the other hand, top-down decision trees have no optimality guarantees due to their greedy nature. However, top-down decision tree approaches are widely used as they are versatile and easy to implement. Most of the existing top-down decision tree algorithms (CART, OC1, C4.5, etc.)
use impurity measures to assess the goodness of hyperplanes at each node of the tree. These measures do not properly capture the geometric structures in the data. We propose a novel decision tree algorithm that, at each node, selects hyperplanes based on an objective function which takes into consideration the geometric structure of the class regions. The resulting optimization problem turns out to be a generalized eigenvalue problem and hence is efficiently solved. We show through empirical studies that our approach leads to smaller trees and better performance compared to other top-down decision tree approaches. We also provide some theoretical justification for the proposed method of learning decision trees. Piecewise linear regression is similar to the corresponding classification problem. For example, in regression trees, each leaf node is associated with a linear regression model. Thus the problem is once again that of (simultaneous) estimation of optimal partitions and learning a linear model for each partition. Regression trees, the hinge hyperplane method, and mixture of experts are some of the approaches to learning continuous piecewise linear regression models. Many of these algorithms are computationally intensive. We present a method of learning a piecewise linear regression model which is computationally simple and is capable of learning discontinuous functions as well. The method is based on the idea of K-plane regression, which can identify a set of linear models given the training data. K-plane regression is a simple algorithm motivated by the philosophy of k-means clustering. However, this simple algorithm has several problems. It does not give a model function with which we can predict the target value for any given input. Also, it is very sensitive to noise. We propose a modified K-plane regression algorithm which can learn continuous as well as discontinuous functions.
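The basic K-plane regression loop mentioned above can be sketched as follows. This is only a minimal illustration of the k-means-style alternation for one-dimensional inputs, not the modified algorithm proposed in the thesis; the function names `fit_line` and `k_plane_regression`, the random initial partition, and the stopping rule are all assumptions made for the example.

```python
import random


def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:  # all x equal: fall back to a constant model
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def k_plane_regression(points, k, iters=100, seed=0):
    """Alternate between fitting k lines and reassigning points to the best line."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]  # assumed: random initial partition
    lines = [(0.0, 0.0)] * k
    for _ in range(iters):
        # Fit step: one least-squares line per non-empty cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                lines[j] = fit_line(members)
        # Assignment step: each point moves to the line with the smallest squared residual.
        new_labels = [
            min(range(k), key=lambda j: (y - (lines[j][0] * x + lines[j][1])) ** 2)
            for x, y in points
        ]
        if new_labels == labels:  # partition is stable, so stop
            break
        labels = new_labels
    return lines, labels
```

Note that the sketch returns only the fitted lines and the cluster label of each training point. It gives no rule for choosing a line for a new input, which is the limitation of plain K-plane regression that the abstract points out.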
The proposed algorithm still retains the spirit of the k-means algorithm, and it improves the objective function after every iteration. The proposed method learns a proper piecewise linear model that can be used for prediction. The algorithm is also more robust to additive noise than K-plane regression. While learning classifiers, one normally assumes that the class labels in the training data set are noise free. However, in many applications, such as spam filtering and text classification, the training data can be mislabeled due to subjective errors. In such cases, the standard learning algorithms (SVM, AdaBoost, decision trees, etc.) start overfitting on the noisy points, which leads to poor test accuracy. Thus, analyzing the vulnerabilities of classifiers to label noise has recently attracted growing interest from the machine learning community. The existing noise tolerant learning approaches first try to identify the noisy points and then learn a classifier on the remaining points. In this thesis, we address the issue of developing learning algorithms which are inherently noise tolerant. An algorithm is inherently noise tolerant if the classifier it learns with noisy samples has the same performance on test data as that learnt from noise-free samples. Algorithms having such robustness (under suitable assumptions on the noise) are attractive for learning with noisy samples. Here, we consider non-uniform label noise, which is a generic noise model. In non-uniform label noise, the probability of the class label for an example being incorrect is a function of the feature vector of the example. (We assume that this probability is less than 0.5 for all feature vectors.) This can account for most cases of noisy data sets. There is no provably optimal algorithm for learning noise tolerant classifiers in the presence of non-uniform label noise. We propose a novel characterization of noise tolerance of an algorithm.
We analyze the noise tolerance properties of the risk minimization framework, as risk minimization is a common strategy for classifier learning. We show that risk minimization under 0-1 loss has the best noise tolerance properties. None of the convex loss functions has such noise tolerance properties. Empirical risk minimization under 0-1 loss is a hard problem, as the 0-1 loss function is not differentiable. We propose a gradient-free stochastic optimization technique to minimize risk under the 0-1 loss function for noise tolerant learning of linear classifiers. We show (under some conditions) that the algorithm converges asymptotically to the global minimum of the risk under the 0-1 loss function. We demonstrate the noise tolerance of the algorithm through simulations.
Linear Models; Linear Models (Classification); Linear Models (Regression); Polyhedral Classifiers; Decision Trees; Piecewise Linear Regression; Noise Tolerant Learning; Piecewise Linear Models; Polyhedral Classifier Learning; Geometric Decision Tree; Regression Trees; Nonlinear Models; Supervised Learning; Computer Science
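A short worked equation may help show why 0-1 loss is a natural candidate for noise tolerance in abstract 341. It covers only the simpler uniform-noise case, which is an assumption of this sketch; the thesis itself treats non-uniform noise. The symbols are likewise assumed for illustration: binary labels, a classifier $f$, clean 0-1 risk $R(f) = P(f(x) \neq y)$, and each label flipped independently of $x$ with probability $\eta < 0.5$ to give the noisy label $\tilde{y}$.

```latex
% f disagrees with the noisy label when either
%   (a) f is wrong on the clean label and the label was not flipped, or
%   (b) f is right on the clean label and the label was flipped.
\begin{aligned}
R_\eta(f) &= P\big(f(x) \neq \tilde{y}\big) \\
          &= (1-\eta)\,R(f) + \eta\,\big(1 - R(f)\big) \\
          &= \eta + (1 - 2\eta)\,R(f).
\end{aligned}
```

Because $1 - 2\eta > 0$, the noisy risk is an increasing affine function of the clean risk, so under these assumptions a minimizer of one is also a minimizer of the other.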
342	An authomatic method for construction of multi-classifier systems based on the combination of selection and fusion Lima, Tiago Pessoa Ferreira de 26 February 2013
In this dissertation, we present a methodology for the automatic construction of multi-classifier systems based on the combination of selection and fusion. The presented method initially finds an optimum number of clusters for the training data set and subsequently determines an ensemble for each cluster found. For model evaluation, the testing data are submitted to the clustering technique, and the cluster nearest to the input datum emits a supervised response through its associated ensemble. Self-organizing maps were used in the clustering phase and multilayer perceptrons were used in the classification phase. Adaptive differential evolution has been used in this work to optimize the parameters and performance of the different techniques used in the classification and clustering phases.
The proposed method, called SFJADE - Selection and Fusion (SF) via Adaptive Differential Evolution (JADE), has been tested on data compression of signals generated by artificial nose sensors and on well-known classification problems, including cancer, card, diabetes, glass, heart, horse, soybean and thyroid. The experimental results have shown that the SFJADE method performs better than some methods from the literature while significantly outperforming most of the methods commonly used to construct multi-classifier systems. / In this dissertation, we present a methodology that aims at the automatic construction of multiple classifier systems based on a combination of selection and fusion. The presented method initially finds an optimal number of groups from the training set and subsequently determines a committee for each group found. For model evaluation, the test data are submitted to the clustering technique, and the group nearest to the input datum emits a supervised response through its associated committee. Self-organizing maps were used in the clustering phase and multilayer perceptrons in the classification phase. Adaptive differential evolution was used in this work to optimize the parameters and performance of the different techniques used in the classification and clustering phases. The proposed method, called SFJADE - Selection and Fusion (SF) via Adaptive Differential Evolution (JADE), was tested on data generated for the sensors of an artificial nose and on benchmark pattern classification problems, namely: cancer, card, diabetes, glass, heart, heartc and horse. The experimental results showed that SFJADE performs better than some methods from the literature, in addition to outperforming most of the methods generally used to construct multiple classifier systems.
Sistemas de múltiplos classificadores; Comitês; Seleção e fusão; Mapas auto organizáveis; Perceptron de múltiplas camadas; Evolução diferencial adaptativa; Multi-classifier systems; Ensembles; Selection and fusion; Self-organizing maps; Multilayer perceptron; Adaptive differential evolution
343	Classificadores e aprendizado em processamento de imagens e visão computacional / Classifiers and machine learning techniques for image processing and computer vision Rocha, Anderson de Rezende, 1980- 03 March 2009
Advisor: Siome Klein Goldenstein / Thesis (doctorate) - Universidade Estadual de Campinas, Instituto da Computação / Resumo: In this doctoral work, we propose the use of classifiers and machine learning techniques to extract relevant information from a data set (e.g., images) in order to solve some problems in Image Processing and Computer Vision. The problems of interest to us are: categorization of images into two or more classes, detection of hidden messages, distinguishing digitally tampered images from natural images, authentication, and multi-classification, among others. Initially, we present a comparative and critical review of the state of the art in image forensics and in the detection of hidden messages in images. Our objective is to show the potential of the existing techniques and, more importantly, to point out their limitations. With this study, we show that a good part of the problems in this area point to two issues in common: the selection of features and the learning techniques to be used. In this study, we also discuss legal questions associated with image forensics, such as the use of digital photographs by criminals. Next, we introduce a technique for image forensics, tested in the context of hidden message detection and of general image classification into categories such as indoors, outdoors, computer generated, and works of art.
While studying this multi-classification problem, some questions arise: how can a multi-class problem be solved so as to combine, for example, image classification features based on color, texture, shape, and silhouette, without worrying too much about how to normalize the resulting combined feature vector? How can several different classifiers be used, each one specialized and best configured for a set of features or of classes in confusion? To this end, we present a technique for fusing classifiers and features in the multi-class scenario through the combination of binary classifiers. We validate our approach in a real application for the automatic classification of fruits and vegetables. Finally, we come across one more interesting problem: how can the use of powerful binary classifiers in the multi-class context be made more efficient and effective? We thus introduce a technique for combining binary classifiers (called base classifiers) to solve problems in the general context of multi-classification. / Abstract: In this work, we propose the use of classifiers and machine learning techniques to extract useful information from data sets (e.g., images) to solve important problems in Image Processing and Computer Vision. We are particularly interested in: two-class and multi-class image categorization, hidden message detection, discrimination between natural and forged images, authentication, and multi-classification. To start with, we present a comparative survey of the state of the art in digital image forensics as well as hidden message detection. Our objective is to show the importance of the existing solutions and discuss their limitations. In this study, we show that most of these techniques strive to solve two common problems in Machine Learning: feature selection and the choice of classification techniques.
Furthermore, we discuss the legal and ethical aspects of image forensics analysis, such as the use of digital images by criminals. We introduce a technique for image forensics analysis in the context of hidden message detection and image classification into categories such as indoors, outdoors, computer generated, and art works. From this multi-class classification, some important questions arose: how can a multi-class problem be solved so as to combine, for instance, several different features such as color, texture, shape, and silhouette without worrying about the pre-processing and normalization of the combined feature vector? How can we take advantage of different classifiers, each one custom tailored to a specific set of classes in confusion? To cope with most of these problems, we present a feature and classifier fusion technique based on combinations of binary classifiers. We validate our solution with a real application for automatic produce classification. Finally, we address another interesting problem: how can powerful binary classifiers be combined more effectively in the multi-class scenario? How can their efficiency be boosted? In this context, we present a solution that boosts the efficiency and effectiveness of multi-class-from-binary techniques. / Doctorate / Computer Engineering / Doctor of Computer Science
Aprendizado de máquina - Técnica; Análise forense de imagem; Esteganalise; Fusão de caracteristicas; Fusão de classificadores; Classificação multi-classe; Categorização de imagens; Machine learning - Technique; Forensic image analysis; Steganalysis; Feature fusion; Classifier fusion; Multi-class classification; Image categorization
344	O uso da atenção como classificador diagnóstico em crianças e adolescentes com transtorno do humor bipolar e transtorno de déficit de atenção e hiperatividade / Attention-based classification pattern in youths with bipolar disorder and attention-deficit/hyperactivity disorder Ana Kleinman 14 August 2013
The development of new technologies has been contributing to a deeper understanding of the pathophysiology of psychiatric disorders, but the results are still controversial and do not appear to be specific to each diagnosis. High comorbidity rates also call into question the core features of a specific diagnosis. In 2009, the US National Institute of Mental Health started a project called Research Domain Criteria (RDoC) with the aim of developing new classifications for research based on dimensions of observable behavior associated with neurobiological measures. For studying the pathophysiology of the comorbidity between two mental illnesses, this proposal suggests studying shared symptoms rather than starting from two distinct diagnostic groups. In child psychiatry, the high comorbidity rates between bipolar disorder (BD) and attention-deficit/hyperactivity disorder (ADHD) are a controversial topic. Impaired attention is a strong candidate for a study using the methodology proposed by RDoC, since the few studies that have assessed attention concurrently in youths with BD and ADHD have produced contradictory results. One of the most widely used tests for studying attention in BD and ADHD is the Continuous Performance Test (CPT). Our objectives were: 1. To determine the best grouping of the subjects based on the results of the Conners' Continuous Performance Test (CPT II), regardless of the group of origin (BD, ADHD, BD+ADHD, controls); 2.
To build a classifier based on the CPT II results; 3 [...] with BD+ADHD and 18 controls aged between 12 and 17 years. The best division of the subjects, based on the CPT II results, was into two new subgroups. Group A, with 35 subjects, was composed of 30% BD, 52.2% ADHD, 51.5% BD+ADHD, and 16.7% controls. Group B, with 49 subjects: 70% BD, 47.8% ADHD, 48.5% BD+ADHD, and 83.3% controls. Compared with group B, group A showed greater functional impairment, evidenced by significantly higher means on the CPT II, with a significant difference in eight of the 12 CPT II variables: omission (p=0.0003), commission (p=0.00000002), standard error (SE) of reaction time (RT) (p=1.7×10^-20), variability of SE (p=4.3×10^-22), detectability (p=0.000008), perseveration (p=0.0000001), RT by interstimulus interval (ISI) (p=4.7×10^-10), and RT SE by ISI (p=1.5×10^-13). It was possible to build a classifier based on the twelve CPT II variables, with an accuracy of 98.8% on our sample and 95.2% under cross-validation, confirming the consistency of these new groups. The main CPT II variables used in the discriminant function for these new groupings were: variability of the standard error, standard error of RT, and standard error of RT by interstimulus interval. There was no statistical difference in any of the CPT II variables when we performed the traditional comparison among BD, ADHD, BD+ADHD, and controls; and the accuracy of the classifier for these groups was lower, 40.5% on our sample and 23.8% under cross-validation. Discussion: These results highlight the heterogeneity found in the CPT II responses of the BD, ADHD, BD+ADHD, and control groups. The three measures that most influenced the differentiation between the new groupings A and B were those that measure variation in response time, which is one of the most replicated impairments in ADHD and is also associated with BD.
This increased response variability has been suggested as a nonspecific endophenotypic marker of psychopathology. Conclusion: Our findings reflect the heterogeneity found in patients classified by current diagnostic categories and suggest that the RDoC methodological approach can be of great value for a better understanding of the psychiatric disorders that affect children and adolescents. This methodology can identify subgroups with relevant differences from a neurobiological point of view, contributing to a better understanding of the pathophysiology of the disorders and opening paths by which research can bring benefits to clinical decisions. / Progress in understanding the pathophysiology of psychiatric disorders is undeniable. Yet the results are still replete with controversy and are not diagnosis-specific. Categorical approaches implicitly involve the notion of a unitary entity, not taking into account the acknowledged heterogeneity present in clinical diagnoses. High comorbidity rates also raise questions about the core features of a specific diagnosis. For this purpose, the National Institute of Mental Health has initiated the Research Domain Criteria (RDoC) project. Instead of using disorder categories as the basis for grouping individuals, RDoC suggests finding relevant dimensions that cut across traditional disorders. The suggested starting point for studying comorbid disorders is shared symptoms and behaviors, instead of two distinct diagnostic groups. One of the strongest controversies in child psychiatry is the high comorbidity rate between bipolar disorder (BD) and attention-deficit/hyperactivity disorder (ADHD). Distractibility, one of the most common symptoms in BD and ADHD, could be a good candidate for an RDoC unit of analysis.
Our aim was first to study the patterns of attention based on the Conners' Continuous Performance Test (CPT II) results in youths with BD, ADHD, BD+ADHD and controls, and then to develop a classifier in order to compare the classification accuracy of the newly formed groups with that of the original diagnostic ones. Results: 18 healthy controls, 23 patients with ADHD, 33 with BD+ADHD and 10 with BD were assessed. Using cluster analysis, the entire sample was best clustered into two new groups, A and B, based on performance on the twelve CPT II variables, independently of the original diagnoses. There were 35 subjects in group A: 30% BD, 52.2% ADHD, 51.5% BD+ADHD and 16.7% controls. There were 49 individuals in group B: 70% BD, 47.8% ADHD, 48.5% BD+ADHD and 83.3% controls. Group A presented greater impairment, exhibited by higher means in all CPT II variables, higher SNAP-IV means, and lower CGAS means. When we compared performance on the CPT II variables between the new clustered groups A and B, we found statistically significant differences in eight of the twelve CPT II measures: omission (p=0.0003), commission (p=0.00000002), standard error (SE) of hit reaction time (RT) (p=1.7×10^-20), variability of SE (p=4.3×10^-22), detectability (p=0.000008), perseveration (p=0.0000001), hit RT by interstimulus interval (ISI) (p=4.7×10^-10) and hit RT SE ISI (p=1.5×10^-13). We found high cross-validated classification accuracy for groups A and B: 95.2%. The strongest CPT II variables in the discriminative pattern were variability of standard error, ranking first, followed by hit RT SE and hit RT SE ISI. There were no statistically significant differences in any of the CPT II measures when comparing the four original groups (BD, ADHD, BD+ADHD, controls). The cross-validated accuracy of classifying subjects into the original four groups based on the CPT II measures was much lower (23.8%).
Discussion: These results highlight the heterogeneity of CPT II responses within each of the four original groups: BD, ADHD, BD+ADHD and controls. The three variables that most influenced the new clustered groups were the ones that measure variability in response time [...] and adolescents may share this attentional trait marker. Conclusion: In summary, our findings highlight the heterogeneity of patients clustered by categorical diagnostic classification. In addition, our classificatory exercise supports the concept behind new approaches like the RDoC framework for child and adolescent psychiatry. It can define meaningful clinical subgroups for the purpose of pathophysiological studies and treatment selection, and provide a pathway by which research findings can be translated into changes in clinical decision making.
Adolescente; Atenção; Classificador; Criança; Reconhecimento automatizado de padrão; Research Domain Criteria (RDoC); Testes neuropsicológicos; Transtorno bipolar; Transtorno de atenção e hiperatividade; Adolescents; Attention; Attention-deficit/hyperactivity disorder; Bipolar disorder; Children; Classifier; Neuropsychological tests; Pattern recognition automated; Research Domain Criteria (RDoC)
345	Využití vybraných metod strojového učení pro modelování kreditního rizika / Machine Learning Methods for Credit Risk Modelling Drábek, Matěj January 2017
This master's thesis is divided into three parts. The first part describes P2P lending, its characteristics, basic concepts and practical implications, and compares the P2P markets in the Czech Republic, the UK and the USA. The second part presents the theoretical basis of the chosen machine learning methods, namely the naive Bayes classifier, classification trees, random forests and logistic regression, and describes methods for evaluating the quality of these classification models. The third part is practical and shows the complete workflow of creating a classification model, from data preparation to model evaluation.
346	Definition Extraction From Swedish Technical Documentation : Bridging the gap between industry and academy approaches Helmersson, Benjamin January 2016
Terminology is concerned with the creation and maintenance of concept systems, terms and definitions. Automatic term and definition extraction is used to simplify this otherwise manual and sometimes tedious process. This thesis presents an integrated approach of pattern matching and machine learning, utilising feature vectors in which each feature is a Boolean function of a regular expression. The integrated approach is compared with the two more classic approaches, showing a significant increase in recall while maintaining a comparable precision score. Less promising is the negative correlation between the performance of the integrated approach and training size. Further research is suggested.
definition extraction machine learning pattern matching naive bayes regular expressions rev classifier terminology comparison definitionsextraktion maskininlärning mönstermatchning reguljära uttryck rev klassificerare terminologi jämförelse
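The feature representation described in abstract 346, where each feature is a Boolean function of a regular expression, can be illustrated with a small sketch. The three patterns below are hypothetical English examples chosen only for illustration; the thesis works on Swedish documentation and its actual patterns are not given in the abstract. The name `regex_features` is likewise an assumption.

```python
import re

# Hypothetical definition-like patterns (not taken from the thesis).
PATTERNS = [
    re.compile(r"\bis an?\b", re.IGNORECASE),     # "X is a/an Y"
    re.compile(r"\brefers to\b", re.IGNORECASE),  # "X refers to Y"
    re.compile(r"\([^)]*\)"),                     # parenthetical gloss
]


def regex_features(sentence):
    """Return one Boolean per pattern: True if the pattern occurs in the sentence."""
    return [bool(p.search(sentence)) for p in PATTERNS]


if __name__ == "__main__":
    print(regex_features("A torque wrench is a tool (hand-held)."))
```

A vector like this could then be passed to any classifier that accepts binary features, such as the naive Bayes model named in the keywords.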
347	Algorithms For Geospatial Analysis Using Multi-Resolution Remote Sensing Data Uttam Kumar, * 03 1900
Geospatial analysis involves the application of statistical methods, algorithms and information retrieval techniques to geospatial data. It incorporates time into spatial databases and facilitates the investigation of land cover (LC) dynamics through data, models, and analytics. LC dynamics induced by human and natural processes play a major role in global as well as regional scale patterns, which in turn influence weather and climate. Hence, understanding LC dynamics at the local/regional as well as global levels is essential for developing appropriate management strategies to mitigate the impacts of LC changes. These dynamics can be captured through multi-resolution remote sensing (RS) data. However, with the advancements in sensor technologies, suitable algorithms and techniques are required for the optimal integration of information from multi-resolution sensors; these must be cost effective while overcoming possible data and methodological constraints. In this work, several traditional and advanced per-pixel classification techniques have been evaluated with multi-resolution data, along with the role of ancillary geographical data in the performance of classifiers. Techniques for linear and non-linear un-mixing, endmember variability, and determination of the spatial distribution of class components within a pixel have been applied and validated on multi-resolution data. An endmember estimation method is proposed, and its performance is compared with manual, semi-automatic and fully automatic methods of endmember extraction.
A novel technique, the Hybrid Bayesian Classifier, is developed for per-pixel classification: the class prior probabilities are determined by un-mixing low spatial, high spectral resolution multi-spectral data, while posterior probabilities are determined from training data obtained on the ground and assigned to every pixel of high spatial, low spectral resolution multi-spectral data in Bayesian classification. These techniques have been validated with multi-resolution data for various landscapes with varying altitudes. As a case study, spatial metrics and cellular automata based models have been applied to a rapidly urbanising landscape at moderate altitude. Image Fusion Landscape Dynamics Urban Growth - Modeling and Simulation Pixel Classification Geospatial Analysis - Algorithms Multi-resolution Remote Sensing Data Land Use Pattern Classification Coarse Resolution Pixels Spatial Metrics Hybrid Bayesian Classifier Cellular Automata Applied Optics
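The idea of combining per-pixel priors from un-mixing with information from ground training data can be sketched roughly as follows. This is not the thesis's algorithm: it assumes a single spectral band, Gaussian class-conditional likelihoods estimated from training data, and priors equal to the un-mixed class fractions of the enclosing coarse pixel. The names `classify_pixel` and `class_stats` and all numbers are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def classify_pixel(value, priors, class_stats):
    """Return (best_class, posteriors) for one high-resolution pixel.

    priors      : class -> fraction from un-mixing the coarse pixel (assumed)
    class_stats : class -> (mean, std) estimated from training data (assumed)
    """
    scores = {c: priors[c] * gaussian_pdf(value, *class_stats[c]) for c in priors}
    total = sum(scores.values())
    posteriors = {c: s / total for c, s in scores.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


if __name__ == "__main__":
    stats = {"water": (0.2, 0.1), "urban": (0.6, 0.1)}  # illustrative values
    print(classify_pixel(0.4, {"water": 0.8, "urban": 0.2}, stats))
```

In this example the pixel value is equally likely under both classes, so the decision is driven entirely by the un-mixing prior.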
348	Limit Modes of Particulate Materials Classifiers Adamčík, Martin January 2017 (has links) As materials science demands ever smaller particles, new approaches and methods for classifying them are needed. This dissertation investigates turbulent flow structures and particle trajectories inside a dynamic air classifier. Increasing computational power, new turbulence models, and new approaches to modelling complex, fully turbulent problems by solving the Navier-Stokes equations make it possible to study ever smaller local flow structures and flow properties with greater accuracy. Particles smaller than 10 microns are more easily influenced, and their classification into the coarse or fine fraction depends on small vortex structures. The work focuses on the conditions required to classify particles below 10 microns, which is the current limit of the air separation method. CFD software and recent findings in turbulence modelling are used in a numerical simulation of the flow fields of a dynamic air classifier, and the effects of changing operating parameters on the flow fields and on the classification of the discrete phase are examined. The numerical predictions are verified experimentally using particle image velocimetry (PIV), and the flow through the rotor blades is visualised. The predicted particle trajectories are verified experimentally by classification tests on an air classifier, with the particle size distribution determined by laser diffraction. Tromp curves and classification efficiency are examined.
349	Detekce, sledování a klasifikace automobilů / Detection, Tracking and Classification of Vehicles Vopálenský, Radek January 2018 (has links) The aim of this master's thesis is to design and implement, in C++, a system for the detection, tracking and classification of vehicles from traffic camera streams or recordings. The system runs on the Robot Operating System platform and uses the OpenCV, FFmpeg, TensorFlow and Keras libraries. A cascade classifier is used for detection, a Kalman filter for tracking, and a convolutional neural network for classification. Out of a total of 627 cars, 479 were tracked correctly. Of these, 458 were classified (trucks or lorries not included). The resulting system can be used for traffic analysis.
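To show what Kalman-filter tracking involves, here is a minimal sketch of a constant-velocity filter along one image axis. It is written in Python for brevity and is not the thesis's C++ implementation; the motion model, the noise variances `q` and `r`, and the initial covariance are all assumptions.

```python
def kalman_step(state, cov, z, dt=1.0, q=1e-3, r=1.0):
    """One predict/update cycle of a constant-velocity Kalman filter.

    state = (position, velocity); cov = 2x2 covariance as nested tuples;
    z = measured position; q, r = assumed process and measurement variances.
    """
    p, v = state
    (p00, p01), (p10, p11) = cov

    # Predict: x' = F x and P' = F P F^T + Q, with F = [[1, dt], [0, 1]].
    p_pred = p + dt * v
    a00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
    a01 = p01 + dt * p11
    a10 = p10 + dt * p11
    a11 = p11 + q

    # Update with H = [1, 0]: only the position is observed.
    s = a00 + r                  # innovation variance
    k0, k1 = a00 / s, a10 / s    # Kalman gain
    y = z - p_pred               # innovation
    new_state = (p_pred + k0 * y, v + k1 * y)
    new_cov = (
        ((1 - k0) * a00, (1 - k0) * a01),
        (a10 - k1 * a00, a11 - k1 * a01),
    )
    return new_state, new_cov


def track(measurements, dt=1.0):
    """Filter a sequence of position measurements; return the final state."""
    state, cov = (0.0, 0.0), ((100.0, 0.0), (0.0, 100.0))
    for z in measurements:
        state, cov = kalman_step(state, cov, z, dt)
    return state


if __name__ == "__main__":
    # A detection moving 2 pixels per frame for 20 frames.
    print(track([2.0 * t for t in range(1, 21)]))
```

A vehicle tracker would typically run one such filter per image axis (or a joint 4-state filter) for each tracked detection.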
350	Detekce Akustického Prostředí z Řeči / Acoustic Scene Classification from Speech Dobrotka, Matúš January 2018 (has links) The topic of this thesis is the classification of audio recordings into 15 acoustic scene classes that represent common scenes and places where people spend time on a regular basis. The thesis describes two approaches, based on GMMs and i-vectors, and a fusion of both approaches. The best GMM system scores 60.4% on the evaluation dataset of the DCASE Challenge. The best i-vector system scores 68.4%. The fusion of the GMM system and the best i-vector system achieves a score of 69.3%, which would correspond to 20th place in the overall system ranking of the DCASE 2017 Challenge (among 98 systems submitted from all over the world).
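The abstract does not say how the two systems are fused. One common option is score-level fusion, sketched below as a weighted sum of per-class scores; the function names, the `weight` parameter and the example scores are hypothetical and not taken from the thesis.

```python
def fuse_scores(gmm_scores, ivector_scores, weight=0.5):
    """Weighted sum of per-class scores from two systems (assumed fusion rule)."""
    return {
        c: weight * gmm_scores[c] + (1.0 - weight) * ivector_scores[c]
        for c in gmm_scores
    }


def predict(scores):
    """Return the class with the highest fused score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    gmm = {"park": 0.6, "bus": 0.4}   # illustrative scores only
    ivec = {"park": 0.3, "bus": 0.7}
    print(predict(fuse_scores(gmm, ivec)))
```

In practice the weight would be tuned on held-out data, and the scores of both systems would need to be on comparable scales before being combined.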
