Global ETD Search

21	EMERGENCY MEDICAL SERVICE EMR-DRIVEN CONCEPT EXTRACTION FROM NARRATIVE TEXT Susanna S George (10947207) 05 August 2021 (has links) In the midst of a pandemic, with cases ranging from patients whose minor symptoms quickly become fatal to emergencies such as a STEMI heart attack or a fatal accident injury, medical research that improves the speed and efficiency of patient care has become more important. As computing researchers work to automate tasks that assist first responders, the priorities have been to reduce the cognitive load on the field crew, shorten the time taken to document each patient case, and improve the accuracy of report details. This paper presents an information extraction algorithm that customizes existing extraction techniques based on natural language processing, such as MetaMap, together with a syntactic dependency parser such as spaCy for analyzing sentence structure and regular expressions for recurring patterns, to retrieve patient-specific information from medical narratives. The resulting concept-value pairs automatically populate the fields of an EMR form, which can then be reviewed and modified manually if needed. The report can be reused for various medical and billing purposes related to the patient. Computer Engineering concept extraction multi-label classification algorithms Natural Language Processing Syntactic Dependency
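As a rough illustration of the regular-expression part of such a pipeline, the sketch below pulls a few concept-value pairs out of a narrative sentence. The patterns, field names, and sample narrative are assumptions made for this example; the abstract does not give the thesis's actual rules, and MetaMap and spaCy are not used here.

```python
import re

# Hypothetical patterns for illustration only; they are not taken from the thesis.
PATTERNS = {
    "age": re.compile(r"\b(\d{1,3})[- ]year[- ]old\b", re.IGNORECASE),
    "blood_pressure": re.compile(r"\bBP\s*(?:of\s*)?(\d{2,3}/\d{2,3})\b", re.IGNORECASE),
    "heart_rate": re.compile(r"\b(?:HR|heart rate)\s*(?:of\s*)?(\d{2,3})\b", re.IGNORECASE),
}


def extract_concepts(narrative):
    """Return the first match of each pattern as a concept-value pair."""
    found = {}
    for concept, pattern in PATTERNS.items():
        match = pattern.search(narrative)
        if match:
            found[concept] = match.group(1)
    return found


# Made-up narrative used only to exercise the patterns.
narrative = "Patient is a 54-year-old male, BP 142/90, heart rate of 110, complaining of chest pain."
form_fields = extract_concepts(narrative)
print(form_fields)
```

A dictionary keyed by field name maps directly onto form fields, which is one plausible way extracted pairs could pre-fill a report before manual review.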
22	Multi-label classification on locally-linear data: Application to chemical toxicity prediction Yap, Xiu Huan 16 August 2021 (has links) No description available. Computer Science Toxicology Predictive Toxicology Multi-label Classification Locally-linear data Locality-sensitive deep learner attention
23	Automatická klasifikace smluv pro portál HlidacSmluv.cz / Automated contract classification for portal HlidacSmluv.cz Maroušek, Jakub January 2020 (has links) The Contracts Register is a public database containing contracts concluded by public institutions. Due to the number of documents in the database, data analysis is problematic. The objective of this thesis is to find a machine learning approach for sorting the contracts into categories by their area of interest (real estate services, construction, etc.) and to implement the approach for use on the web portal Hlídač státu. The large number of categories and the lack of a tagged dataset of contracts complicate the solution.
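24	Practical Web-scale Recommender Systems / 実用的なWebスケール推薦システム Tagami, Yukihiro 25 September 2018 (has links) Kyoto University / 0048 / New-system course doctorate / Doctor of Informatics / 甲第21390号 / 情博第676号 / 新制\|\|情\|\|117 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Hisashi Kashima, Professor Akihiro Yamamoto, Professor Hidetoshi Shimodaira / Pursuant to Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM Recommender systems Online advertising Extreme multi-label classification Learning-to-rank Approximate nearest neighbor search 007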
25	Natural Language Programming for Controlled Object-Oriented English Zhan, Yue 11 July 2022 (has links) Natural language (NL) is a common medium humans use to express ideas and communicate with others, while programming languages (PL) are the "language" humans use to communicate with machines. As NL and PL were designed for different purposes, a considerable difference exists in their structure and capabilities. Programming using PL can take novices months to learn. Meanwhile, users are already familiar with NL. Therefore, natural language programming (NLPr) holds excellent potential by giving non-experts the ability to "program" with the language they already know and a Low-Code/No-Code development experience. However, many challenges with developing NLPr systems are yet to be addressed, namely how to disambiguate NL semantics, validate inputs and provide helpful feedback, and effectively generate executable programs based on semantic meanings. This dissertation addresses these issues by proposing a Controlled Object-Oriented Language (COOL) model to disambiguate and analyze the semantic meanings of English inputs and by implementing a LEGO robot NLPr platform. Two main approaches that connect the current research in general-purpose NLP to NLPr are taken: (1) A domain-specific lexicon and function library serve as the syntax and semantic space. Even though NL can be complex and expressive, functions for the specific robot domain can be fulfilled with libraries built of a finite set of objects and functions. (2) An error-reporting and feedback mechanism detects erroneous sentences, explains possible reasons, and provides debugging and rewriting suggestions. The error-reporting and feedback systems are developed with a hybrid approach that combines rule-based methods such as finite-state machines (FSM) and dependency-based structural analysis with the data-based multi-label classification (MLC) method. 
Experiment results and user studies show that, with the proposed model and approaches reducing the ambiguity within the target domain, the NLPr system can process a relatively expressive controlled NL for robot motion control and generate executable code based on the English input. When the system is confronted with erroneous sentences, it produces error messages, suggestions, and example sentences for users. With the proposed language model and system, NL's structural and semantic information can be transformed into the intermediate representations used for program synthesis, which addresses the situation where the considerable amount of data needed for a data-based model is unavailable. / Doctor of Philosophy / Natural language (NL) is one of the most common mediums humans use daily to express and explain ideas and communicate with each other. In contrast, programming languages (PL) are the "language" humans use to communicate with machines. Because of the difference in purpose, media, and audience, there is a considerable difference in their structure and capabilities. NL is more expressive and natural and sometimes can be rather complex, while PL is primarily short, straightforward, and not as expressive as NL. The need for programming has increased in recent years. However, programming languages can easily take novice users months or more to learn. At the same time, all potential users are familiar with at least one NL. As such, natural language programming (NLPr), a technology that enables people to program with NL, holds excellent potential since it gives non-experts the ability to "program" with the language they already know and a Low-Code or even No-Code development experience. 
However, despite recent research into NLPr, many challenges with developing NLPr systems are yet to be addressed, namely how to disambiguate natural language semantics, how to validate inputs and provide helpful feedback with a limited amount of data, and how to effectively generate the executable programs based on the semantic meanings. This dissertation addresses these issues by proposing a Controlled Object-Oriented Language (COOL) model to disambiguate and analyze the English inputs' semantic meanings and implement a LEGO robot NLPr platform. Two main approaches that connect the current research in general-purpose NLP techniques to NLPr are taken: (1) The first is developing a domain-specific lexicon and function library with the designed COOL model to serve as the syntax and semantic space. Even though natural language can be extremely complex and expressive, the functions for the specific robot domain can be fulfilled with libraries built of a finite set of objects and functions. (2) An error-reporting and feedback mechanism detects erroneous sentences, explains possible reasons, and provides debugging and rewriting suggestions. The error-reporting and feedback systems are developed with a hybrid approach that combines rule-based methods such as FSM and dependency-based structural analysis with the data-based multi-label classification (MLC) method. Experiment results and user studies show that, with the proposed language model and approaches reducing the ambiguity within the target domain, the designed NLPr system can process a relatively expressive controlled natural language designed for robot motion control and generate executable codes based on the semantic information extracted. When the NLPr system is confronted with erroneous sentences, it produces detailed error messages and provides suggestions and sample sentences for possible fixes to users. 
With the simple language model and system proposed, NL's structural and semantic information can be transformed into the intermediate representations used for program synthesis, which addresses the situation where the considerable amount of data needed for a data-based model is unavailable. Natural language programming Natural language processing Semantic extraction Multi-label classification LEGO Mindstorm EV3
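To make the rule-based side of the error-reporting idea concrete, here is a minimal finite-state-machine sketch that validates a tiny controlled command language and reports where a sentence goes wrong. The grammar, lexicon, and message wording are hypothetical and chosen only for illustration; they are not the COOL model or the dissertation's actual rules.

```python
# Hypothetical grammar: <verb> <direction> <number> <unit>
TRANSITIONS = {
    ("start", "VERB"): "verb",
    ("verb", "DIRECTION"): "direction",
    ("direction", "NUMBER"): "number",
    ("number", "UNIT"): "done",
}

# Hypothetical domain lexicon mapping words to categories.
LEXICON = {
    "move": "VERB", "turn": "VERB",
    "forward": "DIRECTION", "backward": "DIRECTION",
    "left": "DIRECTION", "right": "DIRECTION",
    "steps": "UNIT", "degrees": "UNIT",
}


def categorize(token):
    """Map a word to its lexical category, or UNKNOWN if it is outside the lexicon."""
    if token.isdigit():
        return "NUMBER"
    return LEXICON.get(token.lower(), "UNKNOWN")


def validate(sentence):
    """Run the FSM over the sentence and return (is_valid, message)."""
    state = "start"
    for position, token in enumerate(sentence.split(), start=1):
        next_state = TRANSITIONS.get((state, categorize(token)))
        if next_state is None:
            expected = [cat for (src, cat) in TRANSITIONS if src == state]
            hint = " or ".join(expected) if expected else "end of sentence"
            return False, "Unexpected '%s' at word %d; expected %s." % (token, position, hint)
        state = next_state
    if state != "done":
        return False, "Sentence is incomplete."
    return True, "OK"


print(validate("move forward 10 steps"))
print(validate("move quickly 10 steps"))
```

Because each state knows which categories it accepts, the same table that validates input can also produce the "expected ..." hint, which is the kind of feedback a rule-based component can offer without training data.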
26	Multi-label Classification with Multiple Label Correlation Orders And Structures Posinasetty, Anusha January 2016 (has links) (PDF) Multilabel classification has attracted much interest in recent times due to the wide applicability of the problem and the challenges involved in learning a classifier for multilabeled data. A crucial aspect of multilabel classification is to discover the structure and order of correlations among labels and their effect on the quality of the classifier. In this work, we propose a structural Support Vector Machine (structural SVM) based framework which enables us to systematically investigate the importance of label correlations in multi-label classification. The proposed framework is very flexible and provides a unified approach to handle multiple correlation orders and structures in an adaptive manner and helps to effectively assess the importance of label correlations in improving the generalization performance. We perform extensive empirical evaluation on several datasets from different domains and present results on various performance metrics. Our experiments provide for the first time, interesting insights into the following questions: a) Are label correlations always beneficial in multilabel classification? b) What effect do label correlations have on multiple performance metrics typically used in multilabel classification? c) Is label correlation order significant and if so, what would be the favorable correlation order for a given dataset and a given performance metric? and d) Can we make useful suggestions on the label correlation structure? Multi Label Classification Structural Support Vector Machine Machine Learning Multiclass Classification Multi-Label Classification Algorithms Structural SVM Computer Science
27	Sparse Multiclass And Multi-Label Classifier Design For Faster Inference Bapat, Tanuja 12 1900 (has links) (PDF) Many real-world problems like hand-written digit recognition or semantic scene classification are treated as multiclass or multi-label classification problems. Solutions to these problems using support vector machines (SVMs) are well studied in literature. In this work, we focus on building sparse max-margin classifiers for multiclass and multi-label classification. Sparse representation of the resulting classifier is important both from efficient training and fast inference viewpoints. This is true especially when the training and test set sizes are large. Very few of the existing multiclass and multi-label classification algorithms have given importance to controlling the sparsity of the designed classifiers directly. Further, these algorithms were not found to be scalable. Motivated by this, we propose new formulations for sparse multiclass and multi-label classifier design and also give efficient algorithms to solve them. The formulation for sparse multi-label classification also incorporates the prior knowledge of label correlations. In both the cases, the classification model is designed using a common set of basis vectors across all the classes. These basis vectors are greedily added to an initially empty model, to approximate the target function. The sparsity of the classifier can be controlled by a user defined parameter, $d_{\max}$, which indicates the maximum number of common basis vectors. The computational complexity of these algorithms for multiclass and multi-label classifier design is $O(l k^2 d_{\max}^2)$, where $l$ is the number of training set examples and $k$ is the number of classes. The inference time for the proposed multiclass and multi-label classifiers is $O(k d_{\max})$. 
Numerical experiments on various real-world benchmark datasets demonstrate that the proposed algorithms result in sparse classifiers that require fewer basis vectors than state-of-the-art algorithms to attain the same generalization performance. A very small value of $d_{\max}$ results in a significant reduction in inference time. Thus, the proposed algorithms provide useful alternatives to the existing algorithms for sparse multiclass and multi-label classifier design. Artificial Intelligence Machine Learning Multiclass Classification Multi-label Classification Sparse Max-Margin Classifiers Support Vector Machine (SVM) Sparse Classifiers Computer Science
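The inference cost quoted in this abstract can be read off a small sketch: when all classes share the same basis vectors, the kernel values are computed once per basis vector and each class then needs one weighted sum over them. The linear kernel, the weights, and the zero threshold for multi-label output below are assumptions for illustration, not the thesis's trained model or decision rule.

```python
def linear_kernel(u, v):
    """Dot product, standing in for whatever kernel the model uses (assumed)."""
    return sum(a * b for a, b in zip(u, v))


def predict_scores(x, basis, weights, kernel=linear_kernel):
    """Score every class using a basis shared across classes."""
    # One kernel evaluation per shared basis vector, reused by all classes.
    kernel_values = [kernel(b, x) for b in basis]
    # k classes, each a weighted sum over the shared kernel values.
    return [sum(w * kv for w, kv in zip(class_weights, kernel_values))
            for class_weights in weights]


# Made-up model: 2 shared basis vectors and 3 classes.
basis = [[1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]

scores = predict_scores([2.0, 1.0], basis, weights)
multiclass_prediction = max(range(len(scores)), key=scores.__getitem__)
multilabel_prediction = [j for j, s in enumerate(scores) if s > 0]  # assumed threshold of 0
print(scores, multiclass_prediction, multilabel_prediction)
```

Capping the number of basis vectors therefore bounds both the kernel evaluations and the per-class sums, which is why a small cap translates directly into faster inference.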
28	[pt] APRENDIZADO SEMI E AUTO-SUPERVISIONADO APLICADO À CLASSIFICAÇÃO MULTI-LABEL DE IMAGENS DE INSPEÇÕES SUBMARINAS / [en] SEMI AND SELF-SUPERVISED LEARNING APPLIED TO THE MULTI-LABEL CLASSIFICATION OF UNDERWATER INSPECTION IMAGE AMANDA LUCAS PEREIRA 11 July 2023 (has links) [pt] O segmento offshore de produção de petróleo é o principal produtor nacional desse insumo. Nesse contexto, inspeções submarinas são cruciais para a manutenção preventiva dos equipamentos, que permanecem toda a vida útil em ambiente oceânico. A partir dos dados de imagem e sensor coletados nessas inspeções, especialistas são capazes de prevenir e reparar eventuais danos. Tal processo é profundamente complexo, demorado e custoso, já que profissionais especializados têm que assistir a horas de vídeos atentos a detalhes. Neste cenário, o presente trabalho explora o uso de modelos de classificação de imagens projetados para auxiliar os especialistas a encontrarem o(s) evento(s) de interesse nos vídeos de inspeções submarinas. Esses modelos podem ser embarcados no ROV ou na plataforma para realizar inferência em tempo real, o que pode acelerar o ROV, diminuindo o tempo de inspeção e gerando uma grande redução nos custos de inspeção. No entanto, existem alguns desafios inerentes ao problema de classificação de imagens de inspeção submarina, tais como: dados rotulados balanceados são caros e escassos; presença de ruído entre os dados; alta variância intraclasse; e características físicas da água que geram certas especificidades nas imagens capturadas. Portanto, modelos supervisionados tradicionais podem não ser capazes de cumprir a tarefa. Motivado por esses desafios, busca-se solucionar o problema de classificação de imagens submarinas a partir da utilização de modelos que requerem menos supervisão durante o seu treinamento. 
Neste trabalho, são explorados os métodos DINO (Self-DIstillation with NO labels, auto-supervisionado) e uma nova versão multi-label proposta para o PAWS (Predicting View Assignments With Support Samples, semi-supervisionado), que chamamos de mPAWS (multi-label PAWS). Os modelos são avaliados com base em sua performance como extratores de features para o treinamento de um classificador simples, formado por uma camada densa. Nos experimentos realizados, para uma mesma arquitetura, se obteve uma performance que supera em 2.7 por cento o f1-score do equivalente supervisionado. / [en] The offshore oil production segment is the main national producer of this resource. In this context, underwater inspections are crucial for the preventive maintenance of equipment, which remains in the ocean environment for its entire useful life. From the image and sensor data collected in these inspections, experts are able to prevent and repair damage. Such a process is deeply complex, time-consuming and costly, as specialized professionals have to watch hours of video, attentive to details. In this scenario, the present work explores the use of image classification models designed to help experts find the event(s) of interest in underwater inspection videos. These models can be embedded in the ROV or on the platform to perform real-time inference, which can speed up the ROV, shortening inspection time and greatly reducing inspection costs. However, there are some challenges inherent to the problem of classifying underwater inspection images, such as: balanced labeled data are expensive and scarce; the presence of noise in the data; high intraclass variance; and physical characteristics of the water that produce certain specificities in the captured images. Therefore, traditional supervised models may not be able to fulfill the task. 
Motivated by these challenges, we seek to solve the underwater image classification problem using models that require less supervision during their training. In this work, we explore the DINO method (Self-DIstillation with NO labels, self-supervised) and a new multi-label version proposed for PAWS (Predicting View Assignments With Support Samples, semi-supervised), which we call mPAWS (multi-label PAWS). The models are evaluated based on their performance as feature extractors for training a simple classifier formed by a dense layer. In the experiments carried out, for the same architecture, the performance obtained exceeds the F1-score of the supervised equivalent by 2.7 percent. [pt] CLASSIFICACAO DE IMAGEM [pt] APRENDIZADO AUTO-SUPERVISIONADO [pt] CLASSIFICACAO MULTI-LABEL [pt] INSPECOES SUBMARINAS [pt] APRENDIZADO SEMI-SUPERVISIONADO [en] IMAGE CLASSIFICATION [en] SELF-SUPERVISED LEARNING [en] MULTI-LABEL CLASSIFICATION [en] UNDERWATER INSPECTIONS [en] SEMI-SUPERVISED LEARNING
29	Abordagens para aprendizado semissupervisionado multirrótulo e hierárquico / Multi-label and hierarchical semi-supervised learning approaches Metz, Jean 25 October 2011 (has links) A tarefa de classificação em Aprendizado de Máquina consiste da criação de modelos computacionais capazes de identificar automaticamente a classe de objetos pertencentes a um domínio pré-definido a partir de um conjunto de exemplos cuja classe é conhecida. Existem alguns cenários de classificação nos quais cada objeto pode estar associado não somente a uma classe, mas a várias classes ao mesmo tempo. Adicionalmente, nesses cenários denominados multirrótulo, as classes podem ser organizadas em uma taxonomia que representa as relações de generalização e especialização entre as diferentes classes, definindo uma hierarquia de classes, o que torna a tarefa de classificação ainda mais específica, denominada classificação hierárquica. Os métodos utilizados para a construção desses modelos de classificação são complexos e dependem fortemente da disponibilidade de uma quantidade expressiva de exemplos previamente classificados. Entretanto, para muitas aplicações é difícil encontrar um número significativo desses exemplos. Além disso, com poucos exemplos, os algoritmos de aprendizado supervisionado não são capazes de construir modelos de classificação eficazes. Nesses casos, é possível utilizar métodos de aprendizado semissupervisionado, cujo objetivo é aprender as classes do domínio utilizando poucos exemplos conhecidos conjuntamente com um número considerável de exemplos sem a classe especificada. Neste trabalho são propostos, entre outros, métodos que fazem uso do aprendizado semissupervisionado baseado em desacordo coperspectiva, tanto para a tarefa de classificação multirrótulo plana quanto para a tarefa de classificação hierárquica. 
São propostos, também, outros métodos que utilizam o aprendizado ativo com intuito de melhorar a performance de algoritmos de classificação semissupervisionada. Além disso, são propostos dois métodos para avaliação de algoritmos multirrótulo e hierárquico, os quais definem estratégias para identificação dos multirrótulos majoritários, que são utilizados para calcular os valores baseline das medidas de avaliação. Foi desenvolvido um framework para realizar a avaliação experimental da classificação hierárquica, no qual foram implementados os métodos propostos e um módulo completo para realizar a avaliação experimental de algoritmos hierárquicos. Os métodos propostos foram avaliados e comparados empiricamente, considerando conjuntos de dados de diversos domínios. A partir da análise dos resultados observa-se que os métodos baseados em desacordo não são eficazes para tarefas de classificação complexas como multirrótulo e hierárquica. Também é observado que o problema central de degradação do modelo dos algoritmos semissupervisionados agrava-se nos casos de classificação multirrótulo e hierárquica, pois, nesses casos, há um incremento nos fatores responsáveis pela degradação nos modelos construídos utilizando aprendizado semissupervisionado baseado em desacordo coperspectiva / In machine learning, the task of classification consists on creating computational models that are able to automatically identify the class of objects belonging to a predefined domain from a set of examples whose class is known a priori. There are some classification scenarios in which each object can be associated to more than one class at the same time. Moreover, in such multilabeled scenarios, classes can be organized in a taxonomy that represents the generalization and specialization relationships among the different classes, which defines a class hierarchy, making the classification task, known as hierarchical classification, even more specific. 
The methods used to build such classification models are complex and highly dependent on the availability of a substantial quantity of previously classified examples. However, for a large number of applications, it is difficult to find a significant number of such examples. Moreover, when few examples are available, supervised learning algorithms are not able to build effective classification models. In such situations it is possible to use semi-supervised learning, whose aim is to learn the classes of the domain using a few classified examples in conjunction with a considerable number of examples with no specified class. In this work, we propose, among others, methods that use the co-perspective disagreement-based learning approach for both the flat multilabel classification task and the hierarchical classification task. We also propose other methods that use active learning, aiming at improving the performance of semi-supervised learning algorithms. Additionally, two methods for the evaluation of multilabel and hierarchical learning algorithms are proposed. These methods define strategies for the identification of the majority multilabels, which are used to estimate the baseline evaluation measures. A framework for the experimental evaluation of the hierarchical classification was developed. This framework includes the implementations of the proposed methods as well as a complete module for the experimental evaluation of the hierarchical algorithms. The proposed methods were empirically evaluated considering datasets from various domains. From the analysis of the results, it can be observed that the methods based on co-perspective disagreement are not effective for complex classification tasks, such as the multilabel and hierarchical classification. 
It can also be observed that the central problem of model degradation in semi-supervised algorithms worsens for multilabel and hierarchical classification because, in these cases, the factors responsible for degrading models built using semi-supervised learning based on co-perspective disagreement increase. Active learning Aprendizado ativo Aprendizado semissupervisionado Classificação hierárquica Classificação multirrótulo Hierarchical classification Multi-label classification Semi-supervised learning
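One simple way to picture a majority-multilabel baseline is sketched below: take the most frequent label set in the training data, predict it for every example, and score that constant prediction. This is only the most obvious strategy, assumed here for illustration; the abstract does not describe the two strategies the thesis actually proposes, and the Jaccard-style accuracy is likewise an assumed choice of measure.

```python
from collections import Counter


def majority_multilabel(label_sets):
    """Return the label set that occurs most often in the training data."""
    counts = Counter(frozenset(labels) for labels in label_sets)
    return counts.most_common(1)[0][0]


def jaccard_accuracy(true_sets, predicted):
    """Average |true & predicted| / |true | predicted| for a constant prediction."""
    total = 0.0
    for labels in true_sets:
        labels = set(labels)
        union = labels | predicted
        total += len(labels & predicted) / len(union) if union else 1.0
    return total / len(true_sets)


# Made-up label sets for four training examples.
train_labels = [{"a", "b"}, {"a"}, {"a", "b"}, {"c"}]

baseline = majority_multilabel(train_labels)
baseline_score = jaccard_accuracy(train_labels, baseline)
print(sorted(baseline), baseline_score)
```

A learned classifier that cannot beat this constant prediction on the chosen measure has not extracted useful signal from the features, which is what makes such a baseline a useful reference point.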
30	Uma adaptação do método Binary Relevance utilizando árvores de decisão para problemas de classificação multirrótulo aplicado à genômica funcional / An Adaptation of Binary Relevance for Multi-Label Classification applied to Functional Genomics Tanaka, Erica Akemi 30 August 2013 (has links) Muitos problemas de classificação descritos na literatura de aprendizado de máquina e mineração de dados dizem respeito à classificação em que cada exemplo pertence a um único rótulo. Porém, vários problemas de classificação, principalmente no campo de Bioinformática são associados a mais de um rótulo; esses problemas são conhecidos como problemas de classificação multirrótulo. O princípio básico da classificação multirrótulo é similar ao da classificação tradicional (que possui um único rótulo), sendo diferenciada no número de rótulos a serem preditos, na qual há dois ou mais rótulos. Na área da Bioinformática muitos problemas são compostos por uma grande quantidade de rótulos em que cada exemplo pode estar associado. Porém, algoritmos de classificação tradicionais são incapazes de lidar com um conjunto de exemplos mutirrótulo, uma vez que esses algoritmos foram projetados para predizer um único rótulo. Uma solução mais simples é utilizar o método conhecido como método Binary Relevance. Porém, estudos mostraram que tal abordagem não constitui uma boa solução para o problema da classificação multirrótulo, pois cada classe é tratada individualmente, ignorando as possíveis relações entre elas. Dessa maneira, o objetivo dessa pesquisa foi propor uma nova adaptação do método Binary Relevance que leva em consideração relações entre os rótulos para tentar minimizar sua desvantagem, além de também considerar a capacidade de interpretabilidade do modelo gerado, não só o desempenho. 
Os resultados experimentais mostraram que esse novo método é capaz de gerar árvores que relacionam os rótulos correlacionados e também possui um desempenho comparável ao de outros métodos, obtendo bons resultados usando a medida-F. / Many classification problems described in the literature on Machine Learning and Data Mining concern classification in which each example belongs to a single class. However, many classification problems, especially in the field of Bioinformatics, are associated with more than one class; these problems are known as multi-label classification problems. The basic principle of multi-label classification is similar to that of traditional (single-label) classification; it differs in the number of labels to be predicted, which is two or more. In Bioinformatics, many problems involve a large number of labels that can be associated with each example. However, traditional classification algorithms are unable to cope with a set of multi-label examples, since these algorithms are designed to predict a single label. A simpler solution is to use the method known as Binary Relevance. However, studies have shown that this approach is not a good solution to the problem of multi-label classification because each class is treated individually, ignoring possible relations between them. Thus, the objective of this research was to propose a new adaptation of the Binary Relevance method that takes into account relations between labels to try to minimize this disadvantage, and that also considers the interpretability of the generated model, not just its performance. The experimental results show that this new method is capable of generating trees that relate correlated labels and also has a performance comparable to that of other methods, obtaining good results on the F-measure. 
Aprendizado de Maquina Árvores de Decisão Classificação Multirrótulo Decision Tree Functional Genomics Genômica Funcional Machine Learning Multi-Label Classification
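For readers unfamiliar with Binary Relevance, the sketch below shows the plain method the abstract starts from: one independent binary classifier per label. A one-level decision stump stands in for a decision tree, and the toy data is made up; this is not the adaptation proposed in the thesis, which additionally models relations between labels.

```python
def train_stump(X, y):
    """Pick the (feature, threshold) with the fewest training errors.

    The stump predicts 1 when x[feature] >= threshold.
    """
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            errors = sum((row[feature] >= threshold) != bool(label)
                         for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]


def train_binary_relevance(X, Y):
    """Train one stump per label column, ignoring all other labels."""
    n_labels = len(Y[0])
    return [train_stump(X, [row[j] for row in Y]) for j in range(n_labels)]


def predict(models, x):
    """Predict each label independently with its own stump."""
    return [int(x[feature] >= threshold) for feature, threshold in models]


# Toy data: label 0 follows feature 0, label 1 follows feature 1.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]

models = train_binary_relevance(X, Y)
print(models, predict(models, [1, 0]))
```

Each label's model is trained without seeing any other label column, which is exactly the independence assumption the abstract identifies as the method's weakness.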

Search results