11. Applying Coreference Resolution for Usage in Dialog Systems / Rolih, Gabi, January 2018
Using references in language is a major part of communication, and understanding them is not a challenge for humans. Recent years have seen increased use of dialog systems that interact with humans in natural language to assist them with various tasks, but even the most sophisticated systems still struggle to understand references. In this thesis, we adapt a coreference resolution system for use in dialog systems and investigate what is needed to resolve references efficiently in this setting. We annotate a portion of logs from a customer service system and analyze the most common coreferring expressions appearing in this type of data. The analysis shows that most coreferring expressions are nominal or pronominal, and that they usually appear within two sentences of each other. We implement Stanford's Multi-Pass Sieve with some adaptations and dialog-specific changes and integrate it into a dialog system framework. The preprocessing pipeline makes use of existing NLP tools, and several new components are added, such as a chunker, a head-finding algorithm and an NER-like system. To analyze both the user input and the system output, we deploy two separate coreference resolution systems that interact with each other. The system and its separate parts are evaluated using the five most common evaluation metrics. The system does not achieve state-of-the-art numbers, but this is expected given its domain-specific nature. Some parts of the system have no effect on performance, while the dialog-specific changes contribute to it greatly. An error analysis is conducted and reveals some problems with the implementation; more importantly, it shows how the system could be further improved by using other types of knowledge and dialog-specific features.
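To make the sieve architecture mentioned above concrete, the following Python sketch applies two ordered passes over a toy list of mentions: an exact-match pass and a pronoun pass restricted to a two-sentence window, mirroring the distance observed in the annotated logs. The Mention fields, sieve rules and example dialog are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of a multi-pass sieve in the spirit of Stanford's system.
# The Mention fields, the two sieves and the toy dialog below are illustrative
# assumptions, not the implementation described in the thesis.
from dataclasses import dataclass

@dataclass
class Mention:
    id: int
    text: str
    sentence: int            # index of the sentence the mention appears in
    is_pronoun: bool = False
    cluster: int = -1

def exact_match_sieve(mentions):
    """Pass 1 (high precision): merge nominal mentions with identical strings."""
    seen = {}
    for m in mentions:
        if m.is_pronoun:
            continue
        key = m.text.lower()
        if key in seen:
            m.cluster = seen[key].cluster
        else:
            seen[key] = m

def pronoun_sieve(mentions, max_sentence_distance=2):
    """Pass 2 (lower precision): link a pronoun to the closest nominal
    antecedent within two sentences, the distance observed in the logs."""
    for i, m in enumerate(mentions):
        if not m.is_pronoun:
            continue
        for ante in reversed(mentions[:i]):
            if not ante.is_pronoun and m.sentence - ante.sentence <= max_sentence_distance:
                m.cluster = ante.cluster
                break

def resolve(mentions):
    for i, m in enumerate(mentions):
        m.cluster = i            # every mention starts in its own cluster
    exact_match_sieve(mentions)  # sieves are ordered from precise to permissive
    pronoun_sieve(mentions)
    return {m.id: m.cluster for m in mentions}

if __name__ == "__main__":
    dialog = [
        Mention(0, "my order", sentence=0),
        Mention(1, "the delivery", sentence=1),
        Mention(2, "it", sentence=1, is_pronoun=True),
        Mention(3, "my order", sentence=2),
    ]
    print(resolve(dialog))       # {0: 0, 1: 1, 2: 1, 3: 0}
```

A production sieve stack would add passes for head matching, appositions and speaker information, and would merge entire clusters with compatible attributes rather than individual mentions.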
12. Event Centric Approaches in Natural Language Processing / 自然言語処理におけるイベント中心のアプローチ / Huang, Yin Jou, 26 July 2021
Kyoto University / Doctoral degree by coursework (new system) / Doctor of Informatics / Degree No. Kou 23438 / Johohaku No. 768 / 新制||情||131 (Main Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Examiners) Prof. Sadao Kurohashi, Prof. Tatsuya Kawahara, Prof. Takayuki Ito / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
13. Extracting Clinical Event Timelines: Temporal Information Extraction and Coreference Resolution in Electronic Health Records / Création de Chronologies d'Événements Médicaux : Extraction d'Informations Temporelles et Résolution de la Coréférence dans les Dossiers Patients Électroniques / Tourille, Julien, 18 December 2018
Les dossiers patients électroniques contiennent des informations importantes pour la santé publique. La majeure partie de ces informations est contenue dans des documents rédigés en langue naturelle. Bien que le texte soit pertinent pour décrire des concepts médicaux complexes, il est difficile d'utiliser cette source de données pour l'aide à la décision, la recherche clinique ou l'analyse statistique. Parmi toutes les informations cliniques intéressantes présentes dans ces dossiers, la chronologie médicale du patient est l'une des plus importantes. Être capable d'extraire automatiquement cette chronologie permettrait d'acquérir une meilleure connaissance de certains phénomènes cliniques tels que la progression des maladies et les effets à long terme des médicaments. De plus, cela permettrait d'améliorer la qualité des systèmes de question-réponse et de prédiction de résultats cliniques. Par ailleurs, accéder aux chronologies médicales est nécessaire pour évaluer la qualité du parcours de soins en le comparant aux recommandations officielles et pour mettre en lumière les étapes de ce parcours auxquelles une attention particulière doit être portée. Dans notre thèse, nous nous concentrons sur la création de ces chronologies médicales en abordant deux questions connexes en traitement automatique des langues : l'extraction d'informations temporelles et la résolution de la coréférence dans des documents cliniques. Concernant l'extraction d'informations temporelles, nous présentons une approche générique pour l'extraction de relations temporelles basée sur des traits catégoriels. Cette approche peut être appliquée sur des documents écrits en anglais ou en français. Puis, nous décrivons une approche neuronale pour l'extraction d'informations temporelles qui inclut des traits catégoriels. La deuxième partie de notre thèse porte sur la résolution de la coréférence. Nous décrivons une approche neuronale pour la résolution de la coréférence dans les documents cliniques. Nous menons une étude empirique visant à mesurer l'effet de différents composants neuronaux, tels que les mécanismes d'attention ou les représentations au niveau des caractères, sur la performance de notre approche. / Important information for public health is contained within Electronic Health Records (EHRs). The vast majority of clinical data available in these records takes the form of narratives written in natural language. Although free text is convenient for describing complex medical concepts, it is difficult to use for medical decision support, clinical research or statistical analysis. Among all the clinical aspects that are of interest in these records, the patient timeline is one of the most important. Being able to retrieve clinical timelines would allow for a better understanding of some clinical phenomena such as disease progression and longitudinal effects of medications. It would also help improve medical question answering and clinical outcome prediction systems. Accessing the clinical timeline is needed to evaluate the quality of the healthcare pathway by comparing it to clinical guidelines, and to highlight the steps of the pathway where specific care should be provided. In this thesis, we focus on building such timelines by addressing two related natural language processing topics: temporal information extraction and clinical event coreference resolution. Our main contributions include a generic feature-based approach for temporal relation extraction that can be applied to documents written in English and in French. We also devise a neural approach to temporal information extraction that incorporates categorical features. Finally, we present a neural entity-based approach for coreference resolution in clinical narratives, and we perform an empirical study to evaluate how categorical features and neural network components such as attention mechanisms and token character-level representations influence the performance of our coreference resolution approach.
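As a rough illustration of how a feature-based temporal relation extractor of this kind can be assembled, the sketch below trains a pairwise classifier over categorical features with scikit-learn. The feature names, relation labels and toy data are assumptions made for the example, not the feature set used in the thesis.

```python
# Sketch of a pairwise temporal relation classifier over categorical features,
# in the spirit of the generic feature-based approach above. The feature names,
# relation labels and toy data are illustrative assumptions, not the actual
# feature set used in the thesis.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def pair_features(event, timex):
    """Categorical features describing an (event, time expression) pair."""
    return {
        "event_pos": event["pos"],            # part-of-speech tag of the event head
        "event_tense": event["tense"],        # tense attribute from a tagger
        "timex_type": timex["type"],          # DATE, TIME, DURATION, ...
        "same_sentence": event["sent"] == timex["sent"],
        "token_distance": min(abs(event["tok"] - timex["tok"]), 10),  # capped
    }

# Toy training pairs with gold temporal relations.
train = [
    ({"pos": "VBD", "tense": "PAST", "sent": 0, "tok": 3},
     {"type": "DATE", "sent": 0, "tok": 7}, "BEFORE"),
    ({"pos": "VBZ", "tense": "PRESENT", "sent": 1, "tok": 2},
     {"type": "DATE", "sent": 1, "tok": 5}, "OVERLAP"),
    ({"pos": "VBD", "tense": "PAST", "sent": 2, "tok": 4},
     {"type": "DURATION", "sent": 3, "tok": 1}, "AFTER"),
]

X = [pair_features(event, timex) for event, timex, _ in train]
y = [label for _, _, label in train]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

new_event = {"pos": "VBD", "tense": "PAST", "sent": 5, "tok": 2}
new_timex = {"type": "DATE", "sent": 5, "tok": 6}
print(model.predict([pair_features(new_event, new_timex)]))
```

A neural variant along the lines described in the abstract would keep the same categorical features but feed their embeddings into a neural scorer instead of a linear classifier.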
14. Coreference Resolution for Swedish / Koreferenslösning för svenska / Vällfors, Lisa, January 2022
This report explores possible avenues for developing coreference resolution methods for Swedish. Coreference resolution is an important topic within natural language processing, as it is used as a preprocessing step in various information extraction tasks. The topic has been studied extensively for English, but much less so for smaller languages such as Swedish. In this report we adapt two coreference resolution algorithms, originally developed for English, for use on Swedish texts. One algorithm is entirely rule-based, while the other uses machine learning. We have also annotated a Swedish dataset to be used for training and evaluation. Both algorithms showed promising results, and as neither clearly outperformed the other, we conclude that both are good candidates for further development. For the rule-based algorithm, more advanced rules, especially ones that could incorporate semantic knowledge, were identified as the most important avenue of improvement. For the machine learning algorithm, more training data would likely be the most beneficial. For both algorithms, improved detection of mention spans would also help, as this was identified as one of the most error-prone components. / I denna rapport undersöks möjliga metoder för koreferenslösning för svenska. Koreferenslösning är en viktig uppgift inom språkteknologi, eftersom det utgör ett första steg i många typer av informationsextraktion. Uppgiften har studerats utförligt för flera större språk, framförallt engelska, men är ännu relativt outforskad för svenska och andra mindre språk. I denna rapport har vi anpassat två algoritmer som ursprungligen utvecklades för engelska för användning på svensk text. Den ena algoritmen bygger på maskininlärning och den andra är helt regelbaserad. Vi har också annoterat delar av Talbankens korpus med koreferensrelationer, för att användas för träning och utvärdering av koreferenslösningsalgoritmer. Båda algoritmerna visade lovande resultat, och ingen var tydligt bättre än den andra. Bägge vore därför lämpliga alternativ för vidareutveckling. För ML-algoritmen vore mer träningsdata den viktigaste punkten för förbättring, medan den regelbaserade algoritmen skulle kunna förbättras med mer komplexa regler, för att inkorporera exempelvis semantisk information i besluten. Ett annat viktigt utvecklingsområde är identifieringen av de fraser som utvärderas för möjlig koreferens, eftersom detta steg introducerade många fel i bägge algoritmerna.
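Since mention-span detection was identified as one of the most error-prone components for both algorithms, the sketch below illustrates a simple rule-based mention detector: it extracts pronouns and maximal determiner/adjective/noun sequences from POS-tagged tokens. The tag set and the example sentence are illustrative assumptions, not the components used in the report.

```python
# Rough sketch of rule-based mention detection over POS-tagged tokens: pronouns
# and maximal determiner/adjective/noun sequences become candidate mention
# spans. The tag set and the Swedish example are illustrative assumptions, not
# the mention detection components used in the report.
NP_TAGS = {"DET", "ADJ", "NOUN", "PROPN"}

def detect_mentions(tagged_tokens):
    """Return (start, end) token spans for candidate mentions."""
    spans, start = [], None
    for i, (_, tag) in enumerate(tagged_tokens):
        if tag == "PRON":
            if start is not None:
                spans.append((start, i))   # close any open noun-phrase span
                start = None
            spans.append((i, i + 1))       # a pronoun is a mention on its own
        elif tag in NP_TAGS:
            if start is None:
                start = i                  # open a new noun-phrase span
        elif start is not None:
            spans.append((start, i))       # a non-NP tag closes the span
            start = None
    if start is not None:
        spans.append((start, len(tagged_tokens)))
    return spans

if __name__ == "__main__":
    # "Lisa läste rapporten och hon gillade den" (Lisa read the report and she liked it)
    sentence = [("Lisa", "PROPN"), ("läste", "VERB"), ("rapporten", "NOUN"),
                ("och", "CCONJ"), ("hon", "PRON"), ("gillade", "VERB"), ("den", "PRON")]
    for start, end in detect_mentions(sentence):
        print(" ".join(token for token, _ in sentence[start:end]))
```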
15. [en] COREFERENCE RESOLUTION USING LATENT TREES WITH CONTEXTUAL EMBEDDING / [pt] RESOLUÇÃO DE CORREFERÊNCIA UTILIZANDO ÁRVORES LATENTES COM REPRESENTAÇÃO CONTEXTUAL / LEONARDO BARBOSA DE OLIVEIRA, 19 January 2021
[pt] A tarefa de resolução de correferência consiste em identificar e agrupar trechos de um texto de acordo com as entidades do mundo real a que se referem. Apesar de já ter sido abordada em outras conferências, a CoNLL de 2012 é um marco pela qualidade das bases de dados, das métricas e das soluções apresentadas. Naquela edição, o modelo vencedor utilizou um perceptron estruturado para otimizar uma árvore latente de antecedentes, atingindo a pontuação de 63.4 na métrica oficial para o dataset de teste em inglês. Nos anos seguintes, as bases e métricas apresentadas na conferência se tornaram o benchmark para a tarefa de correferência. Com novas técnicas de aprendizado de máquina desenvolvidas, soluções mais elaboradas foram apresentadas. A utilização de redes neurais rasas atingiu a pontuação de 68.8; a adição de representação contextual elevou o estado da arte para 73.0; redes neurais profundas melhoraram o baseline para 76.9 e o estado da arte atual, que é uma combinação de várias dessas técnicas, está em 79.6. Neste trabalho é apresentada uma análise de como as técnicas de representação de palavras Bag of Words, GloVe, BERT e SpanBERT utilizadas com árvores latentes de antecedentes se comparam com o modelo original de 2012. O melhor modelo encontrado foi o que utiliza SpanBERT, com uma margem muito larga, o qual atingiu pontuação de 61.3 na métrica da CoNLL 2012, utilizando o dataset de teste. Com estes resultados, mostramos que é possível utilizar técnicas avançadas em estruturas mais simples e ainda obter resultados competitivos na tarefa de correferência. Além disso, melhoramos a performance de um framework de código aberto para correferência, a fim de contemplar soluções com maior demanda de memória e processamento. / [en] The coreference resolution task consists of identifying and grouping spans of text related to the same real-world entity. Although it has been approached in other conferences, the 2012 CoNLL is a milestone due to the improvement in the quality of its dataset, metrics, and the presented solutions. In that edition, the winning model used a structured perceptron to optimize an antecedent latent tree, achieving 63.4 on the official metric for the English test dataset. During the following years, the metrics and dataset presented at that conference became the benchmark for the coreference task. With new machine learning techniques, more elaborate solutions were presented. The use of shallow neural networks achieved 68.8; adding contextual representation raised the state of the art to 73.0; deep neural networks improved the baseline to 76.9; and the current state of the art, which is a combination of many of these techniques, stands at 79.6. This work presents an analysis of how the word embedding mechanisms Bag of Words, GloVe, BERT and SpanBERT, used with antecedent latent trees, compare to the original 2012 model. The best model found used SpanBERT, by a very large margin, achieving 61.3 on the CoNLL 2012 metric using the test dataset. With these results, we show that it is possible to use advanced techniques in simpler structures and still achieve competitive results on the coreference task. Besides that, we improved the performance of an open-source coreference framework so that it can handle solutions that demand more memory and processing.
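The antecedent trees discussed in this abstract can be illustrated with a greedy decoding step: each mention links to its highest-scoring predecessor or to a dummy root, and following the links yields coreference clusters. In the sketch below, cosine similarity over random vectors stands in for a learned scorer over Bag of Words, GloVe, BERT or SpanBERT span representations; everything in it is an illustrative assumption rather than the evaluated model.

```python
# Sketch of greedy antecedent selection over span representations: each mention
# links to its highest-scoring predecessor or to a dummy root (-1), and
# following the links yields coreference clusters, much like an antecedent
# tree. Cosine similarity over random vectors stands in for a learned scorer
# over span embeddings; everything here is an illustrative assumption.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def antecedent_links(span_embeddings, threshold=0.5):
    """Entry i is the chosen antecedent index of mention i, or -1 for the dummy."""
    links = []
    for i, emb in enumerate(span_embeddings):
        best_j, best_score = -1, threshold   # the dummy wins unless a span beats it
        for j in range(i):
            score = cosine(emb, span_embeddings[j])
            if score > best_score:
                best_j, best_score = j, score
        links.append(best_j)
    return links

def clusters_from_links(links):
    """Follow antecedent links back to their roots to form clusters."""
    root_of = {}
    for i, j in enumerate(links):
        root_of[i] = i if j == -1 else root_of[j]
    clusters = {}
    for mention, root in root_of.items():
        clusters.setdefault(root, []).append(mention)
    return list(clusters.values())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(6, 8))                       # 6 mentions, 8 dims
    embeddings[3] = embeddings[0] + 0.05 * rng.normal(size=8)  # mention 3 ~ mention 0
    links = antecedent_links(embeddings)
    print(links)
    print(clusters_from_links(links))
```

Because antecedents must precede the mentions they license, choosing each mention's best-scoring antecedent independently already yields the highest-scoring antecedent tree under a given scorer.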
16. [en] COREFERENCE RESOLUTION FOR THE ENGLISH LANGUAGE / [pt] RESOLUÇÃO DE CO-REFERÊNCIA PARA A LÍNGUA INGLESA / ADRIEL GARCIA HERNANDEZ, 28 July 2017
[pt] Um dos problemas encontrados nos sistemas de processamento de linguagem natural é a dificuldade em identificar elementos textuais que se referem à mesma entidade. Este fenômeno é chamado de correferência. Resolver esse problema é parte integrante da compreensão do discurso, permitindo que os usuários da linguagem conectem as partes da informação de fala relativas à mesma entidade. Por conseguinte, a resolução de correferência é um importante foco de atenção no processamento da linguagem natural. Apesar da riqueza das pesquisas existentes, o desempenho atual dos sistemas de resolução de correferência ainda não atingiu um nível satisfatório. Neste trabalho, descrevemos um sistema de aprendizado estruturado para resolução de correferência sem restrições que explora duas técnicas: árvores de correferência latentes e indução automática de atributos guiada por entropia. A modelagem de árvore latente torna o problema de aprendizagem computacionalmente viável porque incorpora uma estrutura escondida relevante. Além disso, utilizando um método automático de indução de recursos, podemos construir eficientemente modelos não-lineares, usando algoritmos de aprendizado de modelo linear como, por exemplo, o algoritmo de perceptron estruturado e esparso. Nós avaliamos o sistema para textos em inglês, utilizando o conjunto de dados da CoNLL-2012 Shared Task. Para a língua inglesa, nosso sistema obteve um valor de 62.24 por cento no score oficial dessa competição. Este resultado está abaixo do desempenho no estado da arte para esta tarefa, que é de 65.73 por cento. No entanto, nossa solução reduz significativamente o tempo de obtenção dos clusters dos documentos: nosso sistema leva 0.35 segundos por documento no conjunto de testes, enquanto o estado da arte leva 5 segundos para cada um. / [en] One of the problems found in natural language processing systems is the difficulty of identifying textual elements that refer to the same entity; this task is called coreference resolution. Solving this problem is an integral part of discourse comprehension, since it allows language users to connect the pieces of speech information concerning the same entity. Consequently, coreference resolution is a key task in natural language processing. Despite the large efforts of existing research, the performance of current coreference resolution systems has not yet reached a satisfactory level. In this work, we describe a structured learning system for unrestricted coreference resolution that explores two techniques: latent coreference trees and automatic entropy-guided feature induction. The latent tree modeling makes the learning problem computationally feasible, since it incorporates a relevant hidden structure. Additionally, using an automatic feature induction method, we can efficiently build enhanced non-linear models using linear model learning algorithms, namely the structured and sparse perceptron algorithm. We evaluate the system on the English portion of the CoNLL-2012 Shared Task closed track dataset. The proposed system obtains 62.24 per cent on the competition's official score. This result is below the 65.73 per cent state-of-the-art performance for this task. Nevertheless, our solution significantly reduces the time needed to obtain the clusters of a document: our system takes 0.35 seconds per document on the test set, while the state of the art takes 5 seconds for each one.
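The combination of a structured perceptron with latent coreference trees described above can be sketched as follows: for each mention the model predicts its best antecedent freely and also under the constraint that the antecedent be gold-coreferent (the latent gold link), then moves the weights toward the constrained choice whenever the two disagree. The feature vectors and gold clusters in the sketch are toy assumptions, not the system evaluated on CoNLL-2012.

```python
# Sketch of one structured perceptron update over a latent antecedent tree.
# Each mention picks its best-scoring antecedent twice: freely (the predicted
# link) and constrained to gold-coreferent antecedents (the latent gold link);
# the weights move toward the gold link when the two disagree. The feature
# vectors and gold clusters below are toy assumptions, not the system above.
import numpy as np

def best_antecedent(i, pair_feats, w, allowed, allow_dummy=True):
    """Highest-scoring antecedent of mention i among `allowed`; -1 is the dummy."""
    if allow_dummy:
        best_j, best_score = -1, 0.0              # the dummy antecedent scores 0
    else:
        best_j = allowed[0]
        best_score = float(w @ pair_feats[(i, best_j)])
    for j in allowed:
        score = float(w @ pair_feats[(i, j)])
        if score > best_score:
            best_j, best_score = j, score
    return best_j

def perceptron_update(w, pair_feats, n_mentions, gold_clusters, lr=1.0):
    gold_of = {m: c for c, cluster in enumerate(gold_clusters) for m in cluster}
    for i in range(n_mentions):
        candidates = list(range(i))
        pred = best_antecedent(i, pair_feats, w, candidates)
        gold_candidates = [j for j in candidates if gold_of[j] == gold_of[i]]
        gold = (-1 if not gold_candidates
                else best_antecedent(i, pair_feats, w, gold_candidates, allow_dummy=False))
        if pred != gold:                          # move toward the latent gold link
            if gold != -1:
                w += lr * pair_feats[(i, gold)]
            if pred != -1:
                w -= lr * pair_feats[(i, pred)]
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_mentions, dim = 4, 6
    pair_feats = {(i, j): rng.normal(size=dim)
                  for i in range(n_mentions) for j in range(i)}
    w = np.zeros(dim)
    w = perceptron_update(w, pair_feats, n_mentions, gold_clusters=[[0, 2], [1, 3]])
    print(w)
```

Entropy-guided feature induction, the second technique named in the abstract, would replace the raw pair features above with automatically induced feature conjunctions; it is omitted here for brevity.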