Global ETD Search

1	Primary semantic type labeling in monologue discourse using a hierarchical classification approach
Larson, Erik John, 20 August 2010
The question of whether a machine can reproduce human intelligence is older than modern computation, but it has received a great deal of attention since the first digital computers emerged decades ago. Language understanding, a hallmark of human intelligence, has been the focus of a great deal of work in Artificial Intelligence (AI). In 1950, mathematician Alan Turing proposed a kind of game, or test, to evaluate the intelligence of a machine by assessing its ability to understand written natural language. But nearly sixty years after Turing proposed his test of machine intelligence (pose questions to a machine and a person without seeing either, and try to determine which is the machine), no system has passed the Turing Test, and the question of whether a machine can understand natural language cannot yet be answered. The present investigation is, firstly, an attempt to advance the state of the art in natural language understanding by building a machine whose input is English natural language and whose output is a set of assertions that represent answers to certain questions posed about the content of the input. The machine we explore here, in other words, should pass a simplified version of the Turing Test and, by doing so, help clarify and expand our understanding of machine intelligence. Toward this goal, we explore a constraint framework for partial solutions to the Turing Test, propose a problem whose solution would constitute a significant advance in natural language processing, and design and implement a system adequate for addressing the proposed problem. The fully implemented system finds primary specific events and their locations in monologue discourse using a hierarchical classification approach, and as such provides answers to questions of central importance in the interpretation of discourse.
Machine learning; Hierarchical classification; Natural language processing; Discourse interpretation
2	Induction in Hierarchical Multi-label Domains with Focus on Text Categorization
Dendamrongvit, Sareewan, 02 May 2011
Induction of classifiers from sets of preclassified training examples is one of the most popular machine learning tasks. This dissertation focuses on the techniques needed in the field of automated text categorization. Here, each document can be labeled with more than one class, sometimes with many classes. Moreover, the classes are hierarchically organized, with their mutual relations typically expressed as a generalization tree. Both aspects (multi-label classification and hierarchically organized classes) have so far received inadequate attention. Existing work largely assumes that it is enough to induce a separate binary classifier for each class, and the question of class hierarchy is rarely addressed. This, however, ignores some serious problems. For one thing, inducing thousands of classifiers from hundreds of thousands of examples described by tens of thousands of features (a common case in automated text categorization) incurs prohibitive computational costs: even a single binary classifier in domains of this kind often takes hours, even days, to induce. For another, the fact that the classes are hierarchically organized affects the way we view the classification performance of the induced classifiers. The presented work proposes a technique referred to by the acronym "H-kNN-plus." The technique combines support vector machines and nearest neighbor classifiers with the intention of capitalizing on the strengths of both. As for performance evaluation, a variety of measures have been used to evaluate hierarchical classifiers, including the standard non-hierarchical criteria that assign the same weight to different types of error. The author proposes a performance measure that overcomes some of their weaknesses. The dissertation begins with a study of (non-hierarchical) multi-label classification.
One of the reasons for the poor performance of earlier techniques is the class-imbalance problem, in which a small number of positive examples is outnumbered by a great many negative examples. Another difficulty is that each class tends to be characterized by a different set of features. This means that most of the binary classifiers are induced from examples described by predominantly irrelevant features. By addressing these weaknesses with majority-class undersampling and feature selection, the proposed technique significantly improves the overall classification performance. Even more challenging is the issue of hierarchical classification. Here, the dissertation introduces a new induction mechanism, H-kNN-plus, and subjects it to extensive experiments with two real-world datasets. The results indicate its superiority, in these domains, over earlier work in terms of prediction performance as well as computational costs.
Induction; Text categorization; Hierarchical classification; Multi-label examples; Imbalanced classes
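The abstract above mentions majority-class undersampling when a separate binary classifier is induced for each class. The following is only a minimal sketch of that general idea, not the dissertation's procedure: the function name `undersample_for_class`, the `ratio` parameter, and the toy documents are all assumptions made for illustration.

```python
import random
from typing import List, Set, Tuple

# A multi-label example: (document id, set of class labels). Toy data, hypothetical.
Example = Tuple[str, Set[str]]


def undersample_for_class(
    examples: List[Example], target: str, ratio: float = 1.0, seed: int = 0
) -> List[Tuple[str, int]]:
    """Build a binary training set for `target`.

    All positive examples are kept; negatives (the majority class) are
    randomly sampled down to at most `ratio` times the number of positives.
    """
    positives = [doc for doc, labels in examples if target in labels]
    negatives = [doc for doc, labels in examples if target not in labels]
    rng = random.Random(seed)
    keep = min(len(negatives), int(ratio * len(positives)))
    sampled = rng.sample(negatives, keep)
    return [(doc, 1) for doc in positives] + [(doc, 0) for doc in sampled]


# Two documents carry the label "sports"; eight do not.
toy_examples: List[Example] = [("d0", {"sports"}), ("d1", {"sports", "news"})]
toy_examples += [(f"n{i}", {"news"}) for i in range(8)]

binary_set = undersample_for_class(toy_examples, "sports", ratio=1.0, seed=0)
```

With `ratio=1.0` the resulting binary set is balanced; a larger ratio keeps more negatives at the cost of more imbalance.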
3	Sistema hierárquico de classificação para mapeamento da cobertura da terra nas escalas regional e urbana / Hierarchical classification system for land cover mapping at regional and urban scales
Prado, Fernanda de Almeida [UNESP], 16 February 2009
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Land cover maps play an important role in assessing landscape changes caused by human activity and provide important information for the efficient management of natural resources, which makes them essential tools for regional and urban planning. However, current mappings serve very specific purposes and are consequently limited in their capacity to describe the wide variety of existing land cover types. In this context, the central aim of this research is to develop a broad and comprehensive hierarchical classification system that starts from a generalized level of land cover class definitions, for regional-scale mapping, and specializes those classes for urban environments. For each mapping scale, the nomenclature of the classes and the criteria used to define them are proposed. A case study tests the hierarchical system at two distinct levels of detail, at the regional and urban scales, and different multispectral classification approaches are used to extract the thematic information of interest at each application level... (complete abstract available through the record's electronic access link)
Cartography; Hierarchical classification system; Remote sensing; Thematic mapping; Land cover
4	Continual Object Learning
Erculiani, Luca, 10 June 2021
This work focuses on building frameworks to strengthen the relation between human and machine learning. This is achieved by proposing a new category of algorithms and a new theory to formalize the perception and categorization of objects. On the algorithmic side, we developed a series of procedures to perform Interactive Continuous Open World learning from the point of view of a single user. As for humans, the input to the algorithms is a continuous stream of visual information (sequences of frames), which enables the extraction of richer representations by exploiting the persistence of the same object in the input data. Our approaches are able to incrementally learn and recognize collections of objects, starting from zero knowledge, and to organize them in a hierarchy that follows the will of the user. We then present a novel Knowledge Representation theory that formalizes the properties of our setting and enables learning over it. The theory is based on the notion of separating the visual representation of objects from the semantic meaning associated with them. This distinction makes it possible to treat both instances and classes of objects as elements of the same kind, and to dynamically rearrange objects according to the needs of the user. The whole framework is introduced gradually throughout the thesis and is coupled with an extensive series of experiments that demonstrate its working principles. The experiments also demonstrate the role of a developmental learning policy, in which new objects are regularly introduced, enabling both an increase in recognition performance and a reduction in the amount of supervision provided by the user.
5	Investigação de técnicas de classificação hierárquica para problemas de bioinformática / Investigation of hierarchial classification techniques for bioinformatics problems
Costa, Eduardo de Paula, 25 March 2008
In Machine Learning and Data Mining, most of the classification research reported in the literature involves flat classification, in which each example is assigned to one class out of a finite (and usually small) set of classes, all at the same level. Nevertheless, there are more complex classification problems in which the classes to be predicted can be arranged in a hierarchy. For these problems, hierarchical classification techniques and concepts have been shown to be useful. One research area with great potential for applying such techniques is Bioinformatics. Therefore, this MSc thesis presents a study of hierarchical classification techniques applied to the prediction of functional classes of proteins. Twelve different hierarchical algorithms were investigated, eleven of them based on the Top-Down approach, which was the focus of this study. The other algorithm investigated was HC4.5, which is based on the Big-Bang approach. Some of the algorithms studied are based on a variation of the Top-Down approach, named Top-Down Ensemble, proposed in this study. Some of the algorithms based on this new approach presented promising results, outperforming the other algorithms. A specific evaluation measure for hierarchical problems, named depth-dependent accuracy, was used to evaluate the classification models. In addition, three other evaluation measures were used, in order to compare the results reported by the different measures.
Bioinformatics; Data mining; Hierarchical classification; Machine learning
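The record above contrasts flat classification with the Top-Down approach to hierarchical classification. As a rough illustration of the general top-down idea only (not any of the twelve algorithms studied in the thesis), the sketch below descends a class tree one level at a time, choosing the best-scoring child at each node. The hierarchy, the feature names, and the scoring functions are all hypothetical.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]

# Hypothetical class tree: parent -> list of child classes.
HIERARCHY: Dict[str, List[str]] = {
    "root": ["enzyme", "transporter"],
    "enzyme": ["kinase", "protease"],
}

# Hypothetical local scorers, one per non-root class. In practice each would
# be a trained classifier; here each simply reads one feature value.
SCORERS: Dict[str, Callable[[Features], float]] = {
    "enzyme": lambda x: x.get("catalytic", 0.0),
    "transporter": lambda x: x.get("membrane", 0.0),
    "kinase": lambda x: x.get("atp_binding", 0.0),
    "protease": lambda x: x.get("peptide_cleavage", 0.0),
}


def predict_top_down(
    x: Features,
    hierarchy: Dict[str, List[str]],
    scorers: Dict[str, Callable[[Features], float]],
    root: str = "root",
) -> List[str]:
    """Return the path of classes chosen from the root down to a leaf."""
    path: List[str] = []
    node = root
    while node in hierarchy:  # stop when the current node has no children
        node = max(hierarchy[node], key=lambda child: scorers[child](x))
        path.append(node)
    return path


protein_a = {"catalytic": 0.9, "membrane": 0.1, "atp_binding": 0.7, "peptide_cleavage": 0.2}
protein_b = {"catalytic": 0.2, "membrane": 0.8}

path_a = predict_top_down(protein_a, HIERARCHY, SCORERS)
path_b = predict_top_down(protein_b, HIERARCHY, SCORERS)
```

One known trade-off of this scheme is that an error made near the root cannot be corrected further down, because only the children of the chosen node are ever considered.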
6	Técnicas para o problema de dados desbalanceados em classificação hierárquica / Techniques for the problem of imbalanced data in hierarchical classification
Barella, Victor Hugo, 24 July 2015
Recent advances in science and technology have enabled data to grow in both quantity and availability. Along with this explosion of generated information comes the need to analyze data to discover new and useful knowledge. Thus, areas that aim to extract knowledge and useful information from large datasets, such as Machine Learning (ML) and Data Mining (DM), have become major opportunities for the advancement of research. However, some limitations can reduce the accuracy of traditional algorithms in these areas, for example imbalance among the class samples in a dataset. To mitigate this problem, several alternatives have been the target of research in recent years, such as the development of techniques for artificially balancing data, the modification of algorithms, and new approaches for imbalanced data. One area little explored from the perspective of data imbalance is hierarchical classification, in which the classes are organized into hierarchies, commonly in the form of a tree or a DAG (Directed Acyclic Graph). The goal of this work was to investigate the limitations caused by imbalanced data in hierarchical classification problems and ways to minimize its effects. The experimental results show that the characteristics of the hierarchical classes must be taken into account when deciding whether or not to apply techniques for imbalanced data in hierarchical classification.
Data imbalance; Hierarchical classification; Imbalanced data; Supervised learning
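The record above concerns class imbalance when classes form a tree or DAG. One simple way to inspect imbalance in a tree-shaped hierarchy is to count examples per node and compare sibling classes under each parent. The sketch below assumes a tree given as a child-to-parent map; the function names, the class names, and the counts are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical tree as child -> parent; the root has no entry.
PARENT: Dict[str, str] = {"A": "root", "B": "root", "A1": "A", "A2": "A"}


def node_counts(labels: List[str], parent: Dict[str, str]) -> Counter:
    """Count examples per class, crediting every ancestor of each label."""
    counts: Counter = Counter()
    for label in labels:
        node = label
        while node is not None:
            counts[node] += 1
            node = parent.get(node)  # None once the root has been counted
    return counts


def sibling_imbalance(counts: Counter, parent: Dict[str, str]) -> Dict[str, float]:
    """For each parent, the ratio of its largest to its smallest child."""
    children: Dict[str, List[str]] = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    ratios: Dict[str, float] = {}
    for par, kids in children.items():
        sizes = [counts[k] for k in kids]
        if min(sizes) > 0:  # skip parents that have an empty child class
            ratios[par] = max(sizes) / min(sizes)
    return ratios


toy_labels = ["A1"] * 8 + ["A2"] * 2 + ["B"] * 5
toy_counts = node_counts(toy_labels, PARENT)
toy_ratios = sibling_imbalance(toy_counts, PARENT)
```

In this toy data the imbalance differs by level (2:1 under the root, 4:1 under `A`), which is one illustration of why the abstract stresses the characteristics of the hierarchical classes before any balancing technique is applied.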
7	Prediction of Hierarchical Classification of Transposable Elements Using Machine Learning Techniques
Panta, Manisha, 05 August 2019
Transposable Elements (TEs), or jumping genes, are DNA sequences that have an intrinsic capability to move within a host genome from one genomic location to another. Studies show that the presence of a TE within or adjacent to a functional gene may alter its expression. TEs can also increase the rate of mutation and can even promote gross genetic rearrangements. Thus, the proper classification of identified jumping genes is important for understanding their genetic and evolutionary effects. While computational methods have been developed that perform either binary classification or multi-label classification of TEs, few studies have focused on their hierarchical classification. The existing methods have limited accuracy in classifying TEs. In this study, we examine the performance of a variety of machine learning (ML) methods and propose a robust augmented Stacking-based ML method, ClassifyTE, for the hierarchical classification of TEs with high accuracy.
Transposable Elements; Hierarchical Classification; Supervised Learning; Machine Learning; Computer Sciences; Other Computer Sciences; Physical Sciences and Mathematics
8	Machine Learning Strategies for Large-scale Taxonomies / Strategies d'apprentissage pour la classification dans les grandes taxonomies
Babbar, Rohit, 17 October 2014
In the era of Big Data, we need efficient and scalable machine learning algorithms that can automatically classify terabytes of data. In this thesis, we study the machine learning challenges for classification in large-scale taxonomies. These challenges include the computational complexity of training and prediction and the performance on unseen data. In the first part of the thesis, we study the underlying power-law distribution in large-scale taxonomies. This analysis motivates the derivation of bounds on the space complexity of hierarchical classifiers. Exploiting the study of this distribution further, we design a classification scheme that leads to better accuracy on large-scale power-law distributed categories. We also propose an efficient method for model selection when training multi-class versions of classifiers such as Support Vector Machines and Logistic Regression. Finally, we address another key model selection problem in large-scale classification: the choice between flat and hierarchical classification, from a learning-theoretic perspective. We derive a bound on the generalization error that identifies the cases in which hierarchical classification would be more advantageous than flat classification. The presented generalization error analysis provides an explanation for empirical findings in many recent studies in large-scale hierarchical classification. We further exploit the developed bounds to propose two methods for adapting the given taxonomy of categories to output taxonomies that yield better test accuracy when used in a top-down setup.
Automatic Learning; Large-scale Classification; Hierarchical classification; 004
9	Sistema hierárquico de classificação para mapeamento da cobertura da terra nas escalas regional e urbana / Hierarchical classification system for land cover mapping at regional and urban scales
Prado, Fernanda de Almeida, January 2009
Advisor: Maria de Lourdes Bueno Trindade Galo / Committee: Erivaldo Antonio da Silva / Committee: Edson Eyji Sano
Abstract: Land cover maps play an important role in assessing landscape changes caused by human activity and provide important information for the efficient management of natural resources, which makes them essential tools for regional and urban planning. However, current mappings serve very specific purposes and are consequently limited in their capacity to describe the wide variety of existing land cover types. In this context, the central aim of this research is to develop a broad and comprehensive hierarchical classification system that starts from a generalized level of land cover class definitions, for regional-scale mapping, and specializes those classes for urban environments. For each mapping scale, the nomenclature of the classes and the criteria used to define them are proposed. A case study tests the hierarchical system at two distinct levels of detail, at the regional and urban scales, and different multispectral classification approaches are used to extract the thematic information of interest at each application level... (complete abstract available through the record's electronic access link)
Master's degree
Cartography; Hierarchical classification system; Remote sensing; Thematic mapping; Land cover
10	Investigação de técnicas de classificação hierárquica para problemas de bioinformática / Investigation of hierarchial classification techniques for bioinformatics problems
Eduardo de Paula Costa, 25 March 2008
In Machine Learning and Data Mining, most of the classification research reported in the literature involves flat classification, in which each example is assigned to one class out of a finite (and usually small) set of classes, all at the same level. Nevertheless, there are more complex classification problems in which the classes to be predicted can be arranged in a hierarchy. For these problems, hierarchical classification techniques and concepts have been shown to be useful. One research area with great potential for applying such techniques is Bioinformatics. Therefore, this MSc thesis presents a study of hierarchical classification techniques applied to the prediction of functional classes of proteins. Twelve different hierarchical algorithms were investigated, eleven of them based on the Top-Down approach, which was the focus of this study. The other algorithm investigated was HC4.5, which is based on the Big-Bang approach. Some of the algorithms studied are based on a variation of the Top-Down approach, named Top-Down Ensemble, proposed in this study. Some of the algorithms based on this new approach presented promising results, outperforming the other algorithms. A specific evaluation measure for hierarchical problems, named depth-dependent accuracy, was used to evaluate the classification models. In addition, three other evaluation measures were used, in order to compare the results reported by the different measures.
Bioinformatics; Data mining; Hierarchical classification; Machine learning
