Global ETD Search

11	Ordering Classifier Chains using filter model feature selection techniques Gustafsson, Robin January 2017 (has links) Context: Multi-label classification concerns classification with multi-dimensional output. The Classifier Chain breaks the multi-label problem into multiple binary classification problems, chaining the classifiers to exploit dependencies between labels. Consequently, its performance is influenced by the chain's order. Approaches to finding advantageous chain orders have been proposed, though they are typically costly. Objectives: This study explored the use of filter model feature selection techniques to order Classifier Chains. It examined how feature selection techniques can be adapted to evaluate label dependence, how such information can be used to select a chain order and how this affects the classifier's performance and execution time. Methods: An experiment was performed to evaluate the proposed approach. The two proposed algorithms, Forward-Oriented Chain Selection (FOCS) and Backward-Oriented Chain Selection (BOCS), were tested with three different feature evaluators. 10-fold cross-validation was performed on ten benchmark datasets. Performance was measured in accuracy, 0/1 subset accuracy and Hamming loss. Execution time was measured during chain selection, classifier training and testing. Results: Both proposed algorithms led to improved accuracy and 0/1 subset accuracy (Friedman & Hochberg, p < 0.05). FOCS also improved the Hamming loss while BOCS did not. Measured effect sizes ranged from 0.20 to 1.85 percentage points. Execution time was increased by less than 3 % in most cases. Conclusions: The results showed that the proposed approach can improve the Classifier Chain's performance at a low cost. The improvements appear similar to comparable techniques in magnitude but at a lower cost. It shows that feature selection techniques can be applied to chain ordering, demonstrates the viability of the approach and establishes FOCS and BOCS as alternatives worthy of further consideration. multi-label classification classifier chain label dependence Computer Sciences Datavetenskap (datalogi)
12	Sistemas classificadores evolutivos para problemas multirrótulo / Learning classifier system for multi-label classification Rosane Maria Maffei Vallim 27 July 2009 (has links) Classificação é, provavelmente, a tarefa mais estudada na área de Aprendizado de Máquina, possuindo aplicação em uma grande quantidade de problemas reais, como categorização de textos, diagnóstico médico, problemas de bioinformática, além de aplicações comerciais e industriais. De um modo geral, os problemas de classificação podem ser categorizados quanto ao número de rótulos de classe que podem ser associados à cada exemplo de entrada. A abordagem mais investigada pela comunidade de Aprendizado de Máquina é a de classes mutuamente exclusivas. Entretanto, existe uma grande variedade de problemas importantes em que cada exemplo de entrada pode ser associado a mais de um rótulo ou classe. Esses problemas são denominados problemas de classificação multirrótulo. Os Learning Classifier Systems(LCS) constituem uma técnica de Indução de Regras de Classificação que tem como principal mecanismo de busca um Algoritmo Genético. Essa técnica busca encontrar um conjunto de regras que tenha alta precisão de classificação, que seja compreensível e que possua regras consideradas interessantes sob o ponto de vista de classificação. Apesar de existirem na literatura diversos trabalhos sobre os LCS para problemas de classificação com classes mutuamente exclusivas, pouco se tem conhecimento sobre um LCS que seja capaz de lidar com problemas multirrótulo. Dessa maneira, o objetivo desta monografia é apresentar uma proposta de LCS para problemas multirrótulo, que pretende induzir um conjunto de regras de classificação que produza um resultado eficaz e comparável com outras técnicas de classificação. De acordo com esse objetivo, apresenta-se também uma revisão bibliográfica dos temas envolvidos na proposta, que são: Sistemas Classificadores Evolutivos e Classificação Multirrótulo / Classification is probably the most studied task in the Machine Learning area, with applications in a broad number of real problems like text categorization, medical diagnosis, bioinformatics and even comercial and industrial applications. Generally, classification problems can be categorized considering the number of class labels associated to each input instance. The most studied approach by the community of Machine Learning is the one that considers mutually exclusive classes. However, there is a large variety of important problems in which each instance can be associated to more than one class label. This problems are called multi-label classification problems. Learning Classifier Systems (LCS) are a technique for rule induction which uses a Genetic Algorithm as the primary search mechanism. This technique searchs for sets of rules that have high classification accuracy and that are also understandable and interesting on the classification point of view. Although there are several works on LCS for classification problems with mutually exclusive classes, there is no record of an LCS that can deal with the multi-label classification problem. The objective of this work is to propose an LCS for multi-label classification that builds a set of classification rules which achieves results that are efficient and comparable to other multi-label methods. In accordance with this objective this work also presents a review of the themes involved: Learning Classifier Systems and Multi-label Classification Algoritmos genéticos Classificação multirrótulo Sistemas classificadores evolutivos Genetic algorithms Learning classifier systems Multi-label classification
13	EMERGENCY MEDICAL SERVICE EMR-DRIVEN CONCEPT EXTRACTION FROM NARRATIVE TEXT Susanna S George (10947207) 05 August 2021 (has links) Being in the midst of a pandemic with patients having minor symptoms that quickly become fatal to patients with situations like a stemi heart attack, a fatal accident injury, and so on, the importance of medical research to improve speed and efficiency in patient care, has increased. As researchers in the computer domain work hard to use automation in technology in assisting the first responders in the work they do, decreasing the cognitive load on the field crew, time taken for documentation of each patient case and improving accuracy in details of a report has been a priority. <br>This paper presents an information extraction algorithm that custom engineers certain existing extraction techniques that work on the principles of natural language processing like metamap along with syntactic dependency parser like spacy for analyzing the sentence structure and regular expressions to recurring patterns, to retrieve patient-specific information from medical narratives. These concept value pairs automatically populates the fields of an EMR form which could be reviewed and modified manually if needed. This report can then be reused for various medical and billing purposes related to the patient. Computer Engineering concept extraction multi-label classification algorithms Natural Language Processing Syntactic Dependency
14	Multi-label classification on locally-linear data: Application to chemical toxicity prediction Yap, Xiu Huan 16 August 2021 (has links) No description available. Computer Science Toxicology Predictive Toxicology Multi-label Classification Locally-linear data Locality-sensitive deep learner attention
15	Automatická klasifikace smluv pro portál HlidacSmluv.cz / Automated contract classification for portal HlidacSmluv.cz Maroušek, Jakub January 2020 (has links) The Contracts Register is a public database containing contracts concluded by public institutions. Due to the number of documents in the database, data analysis is proble- matic. The objective of this thesis is to find a machine learning approach for sorting the contracts into categories by their area of interest (real estate services, construction, etc.) and implement the approach for usage on the web portal Hlídač státu. A large number of categories and a lack of a tagged dataset of contracts complicate the solution. 1
16	Practical Web-scale Recommender Systems / 実用的なWebスケール推薦システム / # ja-Kana Tagami, Yukihiro 25 September 2018 (has links) 京都大学 / 0048 / 新制・課程博士 / 博士(情報学) / 甲第21390号 / 情博第676号 / 新制\|\|情\|\|117(附属図書館) / 京都大学大学院情報学研究科知能情報学専攻 / (主査)教授鹿島久嗣, 教授山本章博, 教授下平英寿 / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM Recommender systems Online advertising Extreme multi-label classification Learning-to-rank Approximate nearest neighbor search 007
17	Deep Learning Based Multi-Label Classification of Radiotherapy Target Volumes for Prostate Cancer / Djupinlärningsbaserad ﬂer-etikett klassiﬁcering av målvolymer för prostatacancer inom strålterapi Welander, Lina January 2019 (has links) An initiative to standardize the nomenclature in Sweden started in 2016 along with the creation of the local database Medical Information Quality Archive (MIQA) and a national radiotherapy register on Information Network for CAncercare (INCA). A problem of identifying the clinical tumor volume (CTV) structures and prescribed dose arose when the consecutive number, which is added to the CTV-name, was made inconsistently in MIQA and INCA. Deep neural networks (DNN) were promising tools to solve the multi-label classiﬁcation task of the CTV to enable automatic labeling in the database. Prostate cancer patients that often have more than one type of organ in the same CTV structure were chosen for proof of concept. The DNN used supervised training in a 2D fashion where the radiation therapy (RT) structures along with the CT image were fed, slice by slice, to AlexNet and VGGNet to label the CTV structures in the local database system MIQA and INCA. The study also includes three methods to classify a ﬁnal label for the CTV structure since the model makes the predictions on each slice. The three methods were maximum method by taking the maximum prediction for each class, minimum method by taking the minimum prediction for each class and occurrence method. The occurrence method chooses the maximum prediction if the network has predicted the class over 0.5 at least two times and the minimum prediction if not. The DNN and volume classiﬁcation methods performed well where the maximum and occurrence method performed the best and can be used to interpret RT volumes in MIQA and INCA for prostate cancer patients. This novel study gives promising results for the future development of deep neural networks classifying RT structures for more than one type of cancer patient. / Ett initiativ för att standardisera nomenklaturen i Sverige startade 2016 tillsammans med skapandet av den lokala databasen Medical Information Quality Archive (MIQA) och ett nationellt radioterapikvalitetsregister på plattformen Information Network for CAncercare (INCA). Ett problem med att identiﬁera kliniska tumörvolymstrukturer (CTV-strukturer) och ordinerad dos uppstod när de på varandra följande siﬀrorna, som adderas till CTV-namnet för att skilja de olika CTV:erna från varandra, gjordes inkonsekvent i MIQA och INCA. Djupa neurala nätverk (DNN) är lovande verktyg för att lösa klassiﬁceringen av CTV för att möjliggöra automatisk annotering för multippla etiketter i databasen. Prostatacancerpatienter vars radioterapistrukturer (RT-strukturer) ofta innehåller ﬂer än ett organ användes därför för att bevisa konceptet för ﬂeretikettsklassiﬁcering. DNN:et använde övervakad inlärning av 2D-bilder där RT-strukturerna tillsammans med CT-bilderna matades in, snitt för snitt, till AlexNet och VGGNet för att namnge CTV-strukturerna i det lokala databassystemet MIQA och sedan i INCA. Studien inkluderar även tre metoder för en slutlig strukturetikett eftersom modellen gör sina förutsägelser på varje snitt. Metoderna var maximum där den högsta förutsägelsen noteras för varje klass, minimum där den lägsta förutsägelsen noteras för varje klass och förekomst där den högsta förutsägelsen noteras om klassen har fått minst två förutsägelser över 0.5 annars noteras den lägsta förutsägelsen. DNN:en och volymetikettmetoderna gav bra resultat där maximum- och förekomstmetoden gav bäst resultat och kan användas för att tolka RT-volymer i MIQA och INCA för prostatacancerpatienter. Denna nya studie ger lovande resultat för framtida utveckling av djupa neurala nätverk som klassiﬁcerar strukturer från mer än en typ av cancerpatient. Multi-label classification MIQA INCA Deep neural network Medical Image Processing Medicinsk bildbehandling
18	Natural Language Programming for Controlled Object-Oriented English Zhan, Yue 11 July 2022 (has links) Natural language (NL) is a common medium humans use to express ideas and communicate with others, while programming languages (PL) are the ``language'' humans use to communicate with machines. As NL and PL were designed for different purposes, a considerable difference exists in the structure and capabilities. Programming using PL can take novices months to learn. Meanwhile, users are already familiar with NL. Therefore, natural language programming (NLPr) holds excellent potential by giving non-experts the ability to ``program'' with the language they already know and a Low-Code/No-Code development experience. However, many challenges with developing NLPr systems are yet to be addressed, namely how to disambiguate NL semantics, validate inputs and provide helpful feedback, and generate the executable programs based on semantic meanings effectively. This dissertation addresses these issues by proposing a Controlled Object-Oriented Language (COOL) model to disambiguate and analyze the English inputs' semantic meanings and implement a LEGO robot NLPr platform. Two main approaches that connect the current research in general-purpose NLP to NLPr are taken: (1) A domain-specific lexicon and function library serve as the syntax and semantic space. Even though NL can be complex and expressive, functions for the specific robot domain can be fulfilled with libraries built of a finite set of objects and functions. (2) An error-reporting and feedback mechanism detects erroneous sentences, explains possible reasons, and provides debugging and rewriting suggestions. The error-reporting and feedback systems are developed with a hybrid approach that combines rule-based methods such as FSM and dependency-based structural analysis with the data-based multi-label classification (MLC) method. Experiment results and user studies show that, with the proposed model and approaches reducing the ambiguity within the target domain, the NLPr system can process a relatively expressive controlled NL for robot motion control and generate executable codes based on the English input. When the system is confronted with erroneous sentences, it produces error messages, suggestions, and example sentences for users. NL's structural and semantic information can be transformed into the intermediate representations used for program synthesis with the language model and system proposed to resolve the situation where the considerable amount of data needed for a data-based model is unavailable. / Doctor of Philosophy / Natural language (NL) is one of the most common mediums humans use daily to express and explain ideas and communicate with each other. In contrast, programming languages (PL) are the ``language'' humans use to communicate with machines. Because of the difference in the purpose, media, and audience, there is a considerable difference in their structure and capabilities. NL is more expressive and natural and sometimes can be rather complex, while PL is primarily short, straightforward, and not as expressive as NL. The need for programming has increased in recent years. However, the learning curve of programming languages can easily be months or more for novice users to learn. At the same time, all potential users are familiar with at least one NL. As such, natural language programming (NLPr), a technology that enables people to program with NL, holds excellent potential since it gives non-experts the ability to ``program'' with the language they already know and a Low-Code or even No-Code development experience. However, despite recent research into NLPr, many challenges with developing NLPr systems are yet to be addressed, namely how to disambiguate natural language semantics, how to validate inputs and provide helpful feedback with a limited amount of data, and how to effectively generate the executable programs based on the semantic meanings. This dissertation addresses these issues by proposing a Controlled Object-Oriented Language (COOL) model to disambiguate and analyze the English inputs' semantic meanings and implement a LEGO robot NLPr platform. Two main approaches that connect the current research in general-purpose NLP techniques to NLPr are taken: (1) The first is developing a domain-specific lexicon and function library with the designed COOL model to serve as the syntax and semantic space. Even though natural language can be extremely complex and expressive, the functions for the specific robot domain can be fulfilled with libraries built of a finite set of objects and functions. (2) An error-reporting and feedback mechanism detects erroneous sentences, explains possible reasons, and provides debugging and rewriting suggestions. The error-reporting and feedback systems are developed with a hybrid approach that combines rule-based methods such as FSM and dependency-based structural analysis with the data-based multi-label classification (MLC) method. Experiment results and user studies show that, with the proposed language model and approaches reducing the ambiguity within the target domain, the designed NLPr system can process a relatively expressive controlled natural language designed for robot motion control and generate executable codes based on the semantic information extracted. When the NLPr system is confronted with erroneous sentences, it produces detailed error messages and provides suggestions and sample sentences for possible fixes to users. NL's structural and semantic information can be transformed into the intermediate representations used for program synthesis with the simple language model and system proposed to resolve the situation where the considerable amount of data needed for a data-based model is unavailable. Natural language programming Natural language processing Semantic extraction Multi-label classification LEGO Mindstorm EV3
19	Ensemble multi-label learning in supervised and semi-supervised settings / Apprentissage multi-label ensembliste dans le context supervisé et semi-supervisé Gharroudi, Ouadie 21 December 2017 (has links) L'apprentissage multi-label est un problème d'apprentissage supervisé où chaque instance peut être associée à plusieurs labels cibles simultanément. Il est omniprésent dans l'apprentissage automatique et apparaÃ®t naturellement dans de nombreuses applications du monde réel telles que la classification de documents, l'étiquetage automatique de musique et l'annotation d'images. Nous discutons d'abord pourquoi les algorithmes multi-label de l'etat-de-l'art utilisant un comité de modèle souffrent de certains inconvénients pratiques. Nous proposons ensuite une nouvelle stratégie pour construire et agréger les modèles ensemblistes multi-label basés sur k-labels. Nous analysons ensuite en profondeur l'effet de l'étape d'agrégation au sein des approches ensemblistes multi-label et étudions comment cette agrégation influece les performances de prédictive du modèle enfocntion de la nature de fonction cout à optimiser. Nous abordons ensuite le problème spécifique de la selection de variables dans le contexte multi-label en se basant sur le paradigme ensembliste. Trois méthodes de sélection de caractéristiques multi-label basées sur le paradigme des forêts aléatoires sont proposées. Ces méthodes diffèrent dans la façon dont elles considèrent la dépendance entre les labels dans le processus de sélection des varibales. Enfin, nous étendons les problèmes de classification et de sélection de variables au cadre d'apprentissage semi-supervisé. Nous proposons une nouvelle approche de sélection de variables multi-label semi-supervisée basée sur le paradigme de l'ensemble. Le modèle proposé associe des principes issues de la co-training en conjonction avec une métrique interne d'évaluation d'importnance des varaibles basée sur les out-of-bag. Testés de manière satisfaisante sur plusieurs données de référence, les approches développées dans cette thèse sont prometteuses pour une variété d'ap-plications dans l'apprentissage multi-label supervisé et semi-supervisé. Testés de manière satisfaisante sur plusieurs jeux de données de référence, les approches développées dans cette thèse affichent des résultats prometteurs pour une variété domaine d'applications de l'apprentissage multi-label supervisé et semi-supervisé / Multi-label learning is a specific supervised learning problem where each instance can be associated with multiple target labels simultaneously. Multi-label learning is ubiquitous in machine learning and arises naturally in many real-world applications such as document classification, automatic music tagging and image annotation. In this thesis, we formulate the multi-label learning as an ensemble learning problem in order to provide satisfactory solutions for both the multi-label classification and the feature selection tasks, while being consistent with respect to any type of objective loss function. We first discuss why the state-of-the art single multi-label algorithms using an effective committee of multi-label models suffer from certain practical drawbacks. We then propose a novel strategy to build and aggregate k-labelsets based committee in the context of ensemble multi-label classification. We then analyze the effect of the aggregation step within ensemble multi-label approaches in depth and investigate how this aggregation impacts the prediction performances with respect to the objective multi-label loss metric. We then address the specific problem of identifying relevant subsets of features - among potentially irrelevant and redundant features - in the multi-label context based on the ensemble paradigm. Three wrapper multi-label feature selection methods based on the Random Forest paradigm are proposed. These methods differ in the way they consider label dependence within the feature selection process. Finally, we extend the multi-label classification and feature selection problems to the semi-supervised setting and consider the situation where only few labelled instances are available. We propose a new semi-supervised multi-label feature selection approach based on the ensemble paradigm. The proposed model combines ideas from co-training and multi-label k-labelsets committee construction in tandem with an inner out-of-bag label feature importance evaluation. Satisfactorily tested on several benchmark data, the approaches developed in this thesis show promise for a variety of applications in supervised and semi-supervised multi-label learning Classification multi-label Apprentissage supervisé Apprentissage semi-supervisé Multi-label classification Ensemble models Semi-supervised learning Feature selection 004
20	Abordagens para aprendizado semissupervisionado multirrótulo e hierárquico / Multi-label and hierarchical semi-supervised learning approaches Metz, Jean 25 October 2011 (has links) A tarefa de classificação em Aprendizado de Máquina consiste da criação de modelos computacionais capazes de identificar automaticamente a classe de objetos pertencentes a um domínio pré-definido a partir de um conjunto de exemplos cuja classe é conhecida. Existem alguns cenários de classificação nos quais cada objeto pode estar associado não somente a uma classe, mas a várias classes ao mesmo tempo. Adicionalmente, nesses cenários denominados multirrótulo, as classes podem ser organizadas em uma taxonomia que representa as relações de generalização e especialização entre as diferentes classes, definindo uma hierarquia de classes, o que torna a tarefa de classificação ainda mais específica, denominada classificação hierárquica. Os métodos utilizados para a construção desses modelos de classificação são complexos e dependem fortemente da disponibilidade de uma quantidade expressiva de exemplos previamente classificados. Entretanto, para muitas aplicações é difícil encontrar um número significativo desses exemplos. Além disso, com poucos exemplos, os algoritmos de aprendizado supervisionado não são capazes de construir modelos de classificação eficazes. Nesses casos, é possível utilizar métodos de aprendizado semissupervisionado, cujo objetivo é aprender as classes do domínio utilizando poucos exemplos conhecidos conjuntamente com um número considerável de exemplos sem a classe especificada. Neste trabalho são propostos, entre outros, métodos que fazem uso do aprendizado semissupervisionado baseado em desacordo coperspectiva, tanto para a tarefa de classificação multirrótulo plana quanto para a tarefa de classificação hierárquica. São propostos, também, outros métodos que utilizam o aprendizado ativo com intuito de melhorar a performance de algoritmos de classificação semissupervisionada. Além disso, são propostos dois métodos para avaliação de algoritmos multirrótulo e hierárquico, os quais definem estratégias para identificação dos multirrótulos majoritários, que são utilizados para calcular os valores baseline das medidas de avaliação. Foi desenvolvido um framework para realizar a avaliação experimental da classificação hierárquica, no qual foram implementados os métodos propostos e um módulo completo para realizar a avaliação experimental de algoritmos hierárquicos. Os métodos propostos foram avaliados e comparados empiricamente, considerando conjuntos de dados de diversos domínios. A partir da análise dos resultados observa-se que os métodos baseados em desacordo não são eficazes para tarefas de classificação complexas como multirrótulo e hierárquica. Também é observado que o problema central de degradação do modelo dos algoritmos semissupervisionados agrava-se nos casos de classificação multirrótulo e hierárquica, pois, nesses casos, há um incremento nos fatores responsáveis pela degradação nos modelos construídos utilizando aprendizado semissupervisionado baseado em desacordo coperspectiva / In machine learning, the task of classification consists on creating computational models that are able to automatically identify the class of objects belonging to a predefined domain from a set of examples whose class is known a priori. There are some classification scenarios in which each object can be associated to more than one class at the same time. Moreover, in such multilabeled scenarios, classes can be organized in a taxonomy that represents the generalization and specialization relationships among the different classes, which defines a class hierarchy, making the classification task, known as hierarchical classification, even more specific. The methods used to build such classification models are complex and highly dependent on the availability of an expressive quantity of previously classified examples. However, for a large number of applications, it is difficult to find a significant number of such examples. Moreover, when few examples are available, supervised learning algorithms are not able to build efficient classification models. In such situations it is possible to use semi-supervised learning, whose aim is to learn the classes of the domain using a few classified examples in conjunction to a considerable number of examples with no specified class. In this work, we propose methods that use the co-perspective disagreement based learning approach for both, the flat multilabel classification and the hierarchical classification tasks, among others. We also propose other methods that use active learning, aiming at improving the performance of semi-supervised learning algorithms. Additionally, two methods for the evaluation of multilabel and hierarchical learning algorithms are proposed. These methods define strategies for the identification of the majority multilabels, which are used to estimate the baseline evaluation measures. A framework for the experimental evaluation of the hierarchical classification was developed. This framework includes the implementations of the proposed methods as well as a complete module for the experimental evaluation of the hierarchical algorithms. The proposed methods were empirically evaluated considering datasets from various domains. From the analysis of the results, it can be observed that the methods based on co-perspective disagreement are not effective for complex classification tasks, such as the multilabel and hierarchical classification. It can also be observed that the main degradation problem of the models of the semi-supervised algorithms worsens for the multilabel and hierarchical classification due to the fact that, for these cases, there is an increase in the causes of the degradation of the models built using semi-supervised learning based on co-perspective disagreement Active learning Aprendizado ativo Aprendizado semissupervisionado Classificação hierárquica Classificação multirrótulo Hierarchical classification Multi-label classification Semi-supervised learning

Search results