Spelling suggestions: "subject:"[een] MULTI-LABEL CLASSIFICATION"" "subject:"[enn] MULTI-LABEL CLASSIFICATION""
1 |
Text Document Categorization by Machine LearningSendur, Zeynel 01 January 2008 (has links)
Because of the explosion of digital and online text information, automatic organization of documents has become a very important research area. There are mainly two machine learning approaches to enhance the organization task of the digital documents. One of them is the supervised approach, where pre-defined category labels are assigned to documents based on the likelihood suggested by a training set of labeled documents; and the other one is the unsupervised approach, where there is no need for human intervention or labeled documents at any point in the whole process. In this thesis, we concentrate on the supervised learning task which deals with document classification. One of the most important tasks of information retrieval is to induce classifiers capable of categorizing text documents. The same document can belong to two or more categories and this situation is referred by the term multi-label classification. Multi-label classification domains have been encountered in diverse fields. Most of the existing machine learning techniques which are in multi-label classification domains are extremely expensive since the documents are characterized by an extremely large number of features. In this thesis, we are trying to reduce these computational costs by applying different types of algorithms to the documents which are characterized by large number of features. Another important thing that we deal in this thesis is to have the highest possible accuracy when we have the high computational performance on text document categorization.
|
2 |
A Common Misconception in Multi-Label LearningBrodie, Michael Benjamin 01 November 2016 (has links)
The majority of current multi-label classification research focuses on learning dependency structures among output labels. This paper provides a novel theoretical view on the purported assumption that effective multi-label classification models must exploit output dependencies. We submit that the flurry of recent dependency-exploiting, multi-label algorithms may stem from the deficiencies in existing datasets, rather than an inherent need to better model dependencies. We introduce a novel categorization of multi-label metrics, namely, evenly and unevenly weighted label metrics. We explore specific features that predispose datasets to improved classification by methods that model label dependence. Additionally, we provide an empirical analysis of 15 benchmark datasets, 1 real-life dataset, and a variety of synthetic datasets. We assert that binary relevance (BR) yields similar, if not better, results than dependency-exploiting models for metrics with evenly weighted label contributions. We qualify this claim with discussions on specific characteristics of datasets and models that render negligible the differences between BR and dependency-learning models.
|
3 |
Emergency Medical Service EMR-Driven Concept Extraction From Narrative TextGeorge, Susanna Serene 08 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Being in the midst of a pandemic with patients having minor symptoms that quickly
become fatal to patients with situations like a stemi heart attack, a fatal accident injury,
and so on, the importance of medical research to improve speed and efficiency in patient
care, has increased. As researchers in the computer domain work hard to use automation
in technology in assisting the first responders in the work they do, decreasing the cognitive
load on the field crew, time taken for documentation of each patient case and improving
accuracy in details of a report has been a priority.
This paper presents an information extraction algorithm that custom engineers certain
existing extraction techniques that work on the principles of natural language processing
like metamap along with syntactic dependency parser like spacy for analyzing the sentence structure and regular expressions to recurring patterns, to retrieve patient-specific information from medical narratives. These concept value pairs automatically populates the fields of an EMR form which could be reviewed and modified manually if needed. This report can then be reused for various medical and billing purposes related to the patient.
|
4 |
Improving Multi-label Classification by Avoiding Implicit Negativity with Incomplete DataHeath, Derrall L. 11 October 2011 (has links) (PDF)
Many real world problems require multi-label classification, in which each training instance is associated with a set of labels. There are many existing learning algorithms for multi-label classification; however, these algorithms assume implicit negativity, where missing labels in the training data are automatically assumed to be negative. Additionally, many of the existing algorithms do not handle incremental learning in which new labels could be encountered later in the learning process. A novel multi-label adaptation of the backpropagation algorithm is proposed that does not assume implicit negativity. In addition, this algorithm can, using a naive Bayesian approach, infer missing labels in the training data. This algorithm can also be trained incrementally as it dynamically considers new labels. This solution is compared with existing multi-label algorithms using data sets from multiple domains and the performance is measured with standard multi-label evaluation metrics. It is shown that our algorithm improves classification performance for all metrics by an overall average of 7.4% when at least 40% of the labels are missing from the training data, and improves by 18.4% when at least 90% of the labels are missing.
|
5 |
Multi-label Classification and Sentiment Analysis on Textual RecordsGuo, Xintong January 2019 (has links)
In this thesis we have present effective approaches for two classic Nature Language Processing tasks: Multi-label Text Classification(MLTC) and Sentiment Analysis(SA) based on two datasets.
For MLTC, a robust deep learning approach based on convolution neural network(CNN) has been introduced. We have done this on almost one million records with a related label list consists of 20 labels. We have divided our data set into three parts, training set, validation set and test set. Our CNN based model achieved great result measured in F1 score. For SA, data set was more informative and well-structured compared with MLTC. A traditional word embedding method, Word2Vec was used for generating word vector of each text records. Following that, we employed several classic deep learning models such as Bi-LSTM, RCNN, Attention mechanism and CNN to extract sentiment features. In the next step, a classification frame was designed to graded. At last, the start-of-art language model, BERT which use transfer learning method was employed.
In conclusion, we compared performance of RNN-based model, CNN-based model and pre-trained language model on classification task and discuss their applicability. / Thesis / Master of Science in Electrical and Computer Engineering (MSECE) / This theis purposed two deep learning solution to both multi-label classification problem and sentiment analysis problem.
|
6 |
Investigando a combina??o de t?cnicas de aprendizado semissupervisionado e classifica??o hier?rquica multirr?tuloSantos, Araken de Medeiros 25 May 2012 (has links)
Made available in DSpace on 2015-03-03T15:48:39Z (GMT). No. of bitstreams: 1
ArakenMS_TESE.pdf: 4060697 bytes, checksum: 5efe25ac134a602cc32c96b66e749ea0 (MD5)
Previous issue date: 2012-05-25 / Data classification is a task with high applicability in a lot of areas. Most methods for treating classification problems found in the literature dealing with single-label or traditional
problems. In recent years has been identified a series of classification tasks in which the samples can be labeled at more than one class simultaneously (multi-label classification). Additionally, these classes can be hierarchically organized (hierarchical classification and hierarchical multi-label classification). On the other hand, we have also studied a new category of learning, called semi-supervised learning, combining labeled data (supervised
learning) and non-labeled data (unsupervised learning) during the training phase, thus reducing the need for a large amount of labeled data when only a small set of labeled samples
is available. Thus, since both the techniques of multi-label and hierarchical multi-label classification as semi-supervised learning has shown favorable results with its use, this work
is proposed and used to apply semi-supervised learning in hierarchical multi-label classication tasks, so eciently take advantage of the main advantages of the two areas. An
experimental analysis of the proposed methods found that the use of semi-supervised learning in hierarchical multi-label methods presented satisfactory results, since the two
approaches were statistically similar results / A classifica??o de dados ? uma tarefa com alta aplicabilidade em uma grande quantidade de dom?nios. A maioria dos m?todos para tratar problemas de classifica??o encontrados na literatura, tratam problemas tradicionais ou unirr?tulo. Nos ?ltimos anos vem sendo identificada uma s?rie de tarefas de classifica??o nas quais os exemplos podem ser rotulados a mais de uma classe simultaneamente (classifica??o multirr?tulo). Adicionalmente, tais
classes podem estar hierarquicamente organizadas (classifica??o hier?rquica e classifica??o hier?rquica multirr?tulo). Por outro lado, tem-se estudado tamb?m uma nova categoria de
aprendizado, chamada de aprendizado semissupervisionado, que combina dados rotulados (aprendizado supervisionado) e dados n?o-rotulados (aprendizado n?o-supervisionado), durante
a fase de treinamento, reduzindo, assim, a necessidade de uma grande quantidade de dados rotulados quando somente um pequeno conjunto de exemplos rotulados est? dispon?-
vel. Desse modo, uma vez que tanto as t?cnicas de classifica??o multirr?tulo e hier?rquica multirr?tulo quanto o aprendizado semissupervisionado vem apresentando resultados favor
?veis ? sua utiliza??o, neste trabalho ? proposta e utilizada a aplica??o de aprendizado semissupervisionado em tarefas de classifica??o hier?rquica multirr?tulo, de modo a se atender eficientemente as principais necessidades das duas ?reas. Uma an?lise experimental dos m?todos propostos verificou que a utiliza??o do aprendizado semissupervisionado em m?todos de classifica??o hier?rquica multirr?tulo apresentou resultados satisfat?rios, uma vez que as duas abordagens apresentaram resultados estatisticamente semelhantes
|
7 |
Sistemas classificadores evolutivos para problemas multirrótulo / Learning classifier system for multi-label classificationVallim, Rosane Maria Maffei 27 July 2009 (has links)
Classificação é, provavelmente, a tarefa mais estudada na área de Aprendizado de Máquina, possuindo aplicação em uma grande quantidade de problemas reais, como categorização de textos, diagnóstico médico, problemas de bioinformática, além de aplicações comerciais e industriais. De um modo geral, os problemas de classificação podem ser categorizados quanto ao número de rótulos de classe que podem ser associados à cada exemplo de entrada. A abordagem mais investigada pela comunidade de Aprendizado de Máquina é a de classes mutuamente exclusivas. Entretanto, existe uma grande variedade de problemas importantes em que cada exemplo de entrada pode ser associado a mais de um rótulo ou classe. Esses problemas são denominados problemas de classificação multirrótulo. Os Learning Classifier Systems(LCS) constituem uma técnica de Indução de Regras de Classificação que tem como principal mecanismo de busca um Algoritmo Genético. Essa técnica busca encontrar um conjunto de regras que tenha alta precisão de classificação, que seja compreensível e que possua regras consideradas interessantes sob o ponto de vista de classificação. Apesar de existirem na literatura diversos trabalhos sobre os LCS para problemas de classificação com classes mutuamente exclusivas, pouco se tem conhecimento sobre um LCS que seja capaz de lidar com problemas multirrótulo. Dessa maneira, o objetivo desta monografia é apresentar uma proposta de LCS para problemas multirrótulo, que pretende induzir um conjunto de regras de classificação que produza um resultado eficaz e comparável com outras técnicas de classificação. De acordo com esse objetivo, apresenta-se também uma revisão bibliográfica dos temas envolvidos na proposta, que são: Sistemas Classificadores Evolutivos e Classificação Multirrótulo / Classification is probably the most studied task in the Machine Learning area, with applications in a broad number of real problems like text categorization, medical diagnosis, bioinformatics and even comercial and industrial applications. Generally, classification problems can be categorized considering the number of class labels associated to each input instance. The most studied approach by the community of Machine Learning is the one that considers mutually exclusive classes. However, there is a large variety of important problems in which each instance can be associated to more than one class label. This problems are called multi-label classification problems. Learning Classifier Systems (LCS) are a technique for rule induction which uses a Genetic Algorithm as the primary search mechanism. This technique searchs for sets of rules that have high classification accuracy and that are also understandable and interesting on the classification point of view. Although there are several works on LCS for classification problems with mutually exclusive classes, there is no record of an LCS that can deal with the multi-label classification problem. The objective of this work is to propose an LCS for multi-label classification that builds a set of classification rules which achieves results that are efficient and comparable to other multi-label methods. In accordance with this objective this work also presents a review of the themes involved: Learning Classifier Systems and Multi-label Classification
|
8 |
Sistemas classificadores evolutivos para problemas multirrótulo / Learning classifier system for multi-label classificationRosane Maria Maffei Vallim 27 July 2009 (has links)
Classificação é, provavelmente, a tarefa mais estudada na área de Aprendizado de Máquina, possuindo aplicação em uma grande quantidade de problemas reais, como categorização de textos, diagnóstico médico, problemas de bioinformática, além de aplicações comerciais e industriais. De um modo geral, os problemas de classificação podem ser categorizados quanto ao número de rótulos de classe que podem ser associados à cada exemplo de entrada. A abordagem mais investigada pela comunidade de Aprendizado de Máquina é a de classes mutuamente exclusivas. Entretanto, existe uma grande variedade de problemas importantes em que cada exemplo de entrada pode ser associado a mais de um rótulo ou classe. Esses problemas são denominados problemas de classificação multirrótulo. Os Learning Classifier Systems(LCS) constituem uma técnica de Indução de Regras de Classificação que tem como principal mecanismo de busca um Algoritmo Genético. Essa técnica busca encontrar um conjunto de regras que tenha alta precisão de classificação, que seja compreensível e que possua regras consideradas interessantes sob o ponto de vista de classificação. Apesar de existirem na literatura diversos trabalhos sobre os LCS para problemas de classificação com classes mutuamente exclusivas, pouco se tem conhecimento sobre um LCS que seja capaz de lidar com problemas multirrótulo. Dessa maneira, o objetivo desta monografia é apresentar uma proposta de LCS para problemas multirrótulo, que pretende induzir um conjunto de regras de classificação que produza um resultado eficaz e comparável com outras técnicas de classificação. De acordo com esse objetivo, apresenta-se também uma revisão bibliográfica dos temas envolvidos na proposta, que são: Sistemas Classificadores Evolutivos e Classificação Multirrótulo / Classification is probably the most studied task in the Machine Learning area, with applications in a broad number of real problems like text categorization, medical diagnosis, bioinformatics and even comercial and industrial applications. Generally, classification problems can be categorized considering the number of class labels associated to each input instance. The most studied approach by the community of Machine Learning is the one that considers mutually exclusive classes. However, there is a large variety of important problems in which each instance can be associated to more than one class label. This problems are called multi-label classification problems. Learning Classifier Systems (LCS) are a technique for rule induction which uses a Genetic Algorithm as the primary search mechanism. This technique searchs for sets of rules that have high classification accuracy and that are also understandable and interesting on the classification point of view. Although there are several works on LCS for classification problems with mutually exclusive classes, there is no record of an LCS that can deal with the multi-label classification problem. The objective of this work is to propose an LCS for multi-label classification that builds a set of classification rules which achieves results that are efficient and comparable to other multi-label methods. In accordance with this objective this work also presents a review of the themes involved: Learning Classifier Systems and Multi-label Classification
|
9 |
EMERGENCY MEDICAL SERVICE EMR-DRIVEN CONCEPT EXTRACTION FROM NARRATIVE TEXTSusanna S George (10947207) 05 August 2021 (has links)
Being in the midst of a pandemic with patients having minor symptoms that quickly
become fatal to patients with situations like a stemi heart attack, a fatal accident injury,
and so on, the importance of medical research to improve speed and efficiency in patient
care, has increased. As researchers in the computer domain work hard to use automation
in technology in assisting the first responders in the work they do, decreasing the cognitive
load on the field crew, time taken for documentation of each patient case and improving
accuracy in details of a report has been a priority.
<br>This paper presents an information extraction algorithm that custom engineers certain
existing extraction techniques that work on the principles of natural language processing
like metamap along with syntactic dependency parser like spacy for analyzing the sentence
structure and regular expressions to recurring patterns, to retrieve patient-specific information from medical narratives. These concept value pairs automatically populates the fields
of an EMR form which could be reviewed and modified manually if needed. This report can
then be reused for various medical and billing purposes related to the patient.
|
10 |
Multi-label classification on locally-linear data: Application to chemical toxicity predictionYap, Xiu Huan 16 August 2021 (has links)
No description available.
|
Page generated in 0.0365 seconds