Spelling suggestions: "subject:"coreference desolution"" "subject:"coreference cesolution""
1 |
Hybrid Methods for Coreference Resolution in SwedishNilsson, Kristina January 2010 (has links)
The aim of this thesis is to improve coreference resolution in Swedish by providing a hybrid approach based on combining data-driven methods and linguistic knowledge. Coreference resolution here consists in identifying all expressions in a text that have the same referent, for example, a person or an object. The linguistic knowledge is based on Accessibility Theory (Ariel 1990). This is used for guiding the selection of likely anaphor-antecedent pairs from the set of all possible such pairs in a text. The data-driven method adopted is Memory-Based Learning (MBL), a supervised method based on the idea that learning means storing experiences in memory, and that new problems are solved by reusing solutions from similar experiences (Daelemans and Van den Bosch 2005). The referring expressions covered by the system are names, definite descriptions, and pronouns. In order to maximize performance, we use different classifiers with a specific set of linguistically motivated features for each type of expression. The great majority of features used for classification are domain- and language-independent. We demonstrate two ways of using this method of linguistically motivated selection of anaphor-antecedent pairs. First, the amount of training examples stored in memory is reduced. We find that for coreference resolution of definite descriptions and names, the amount of training data can thereby be reduced with only a minor loss in performance, but for pronoun resolution there is a negative effect. Second, selection can be used for improving on coreference resolution results. This is the first step in our hybrid approach to coreference resolution, where the second step is the application of an MBL classifier for determining coreference between the selected pairs. Results indicate that this hybrid approach is advantageous for coreference resolution of definite descriptions and names. For pronoun resolution, there is a negative effect on recall along with a positive effect on precision. / För att köpa boken skicka en beställning till exp@ling.su.se/ To order the book send an e-mail to exp@ling.su.se
|
2 |
Resolu??o de correfer?ncia nominal usando sem?ntica em l?ngua portuguesaFonseca, Evandro Brasil 19 March 2018 (has links)
Submitted by PPG Ci?ncia da Computa??o (ppgcc@pucrs.br) on 2018-06-19T11:37:24Z
No. of bitstreams: 1
EVANDRO BRASIL FONSECA_TES.pdf: 1972824 bytes, checksum: 9fca0c499753cd9d2822c59040e826bf (MD5) / Approved for entry into archive by Sheila Dias (sheila.dias@pucrs.br) on 2018-06-26T14:40:39Z (GMT) No. of bitstreams: 1
EVANDRO BRASIL FONSECA_TES.pdf: 1972824 bytes, checksum: 9fca0c499753cd9d2822c59040e826bf (MD5) / Made available in DSpace on 2018-06-26T14:48:46Z (GMT). No. of bitstreams: 1
EVANDRO BRASIL FONSECA_TES.pdf: 1972824 bytes, checksum: 9fca0c499753cd9d2822c59040e826bf (MD5)
Previous issue date: 2018-03-19 / Coreference Resolution task is challenging for Natural Language Processing, considering the required linguistic knowledge and the sophistication of language processing techniques involved. Even though it is a demanding task, a motivating factor in the study of this phenomenon is its usefulness. Basically, several Natural Language Processing tasks may benefit from their results, such as named entities recognition, relation extraction between named entities, summarization, sentiment analysis, among others. Coreference Resolution is a process that consists on identifying certain terms and expressions that refer to the same entity. For example, in the sentence ? France is refusing. The country is one of the first in the ranking... ? we can say that [the country] is a coreference of [France]. By grouping these referential terms, we form coreference groups, more commonly known as coreference chains. This thesis proposes a process for coreference resolution between noun phrases for Portuguese, focusing on the use of semantic knowledge. Our proposed approach is based on syntactic-semantic linguistic rules. That is, we combine different levels of linguistic processing, using semantic relations as support, in order to infer referential relations between mentions. Models based on linguistic rules have been efficiently applied in other languages, such as: English, Spanish and Galician. In few words, these models are more efficient than machine learning approaches when we deal with less resourceful languages, since the lack of sample-rich corpora may produce a poor training. The proposed approach is the first model for Portuguese coreference resolution which uses semantic knowledge. Thus, we consider it as the main contribution of this thesis. / A tarefa de Resolu??o de Correfer?ncia ? um grande desafio para a ?rea de Processamento da Linguagem Natural, tendo em vista o conhecimento lingu?stico exigido e a sofistica??o das t?cnicas de processamento da l?ngua empregados. Mesmo sendo uma tarefa desafiadora, um fator motivador do estudo deste fen?meno se d? pela sua utilidade. Basicamente, v?rias tarefas de Processamento da Linguagem Natural podem se beneficiar de seus resultados, como, por exemplo, o reconhecimento de entidades nomeadas, extra??o de rela??o entre entidades nomeadas, sumariza??o, an?lise de sentimentos, entre outras. A Resolu??o de Correfer?ncia ? um processo que consiste em identificar determinados termos e express?es que remetem a uma mesma entidade. Por exemplo, na senten?a ?A Fran?a est? resistindo. O pa?s ? um dos primeiros no ranking...? podemos dizer que [o pa?s] ? uma correfer?ncia de [A Fran?a]. Realizando o agrupamento desses termos referenciais, formamos grupos de men??es correferentes, mais conhecidos como cadeias de correfer?ncia. Esta tese prop?e um processo para a resolu??o de correfer?ncia entre sintagmas nominais para a l?ngua portuguesa, tendo como foco a utiliza??o do conhecimento sem?ntico. Nossa abordagem proposta ? baseada em regras lingu?sticas sint?tico-sem?nticas. Ou seja, combinamos diferentes n?veis de processamento lingu?stico utilizando rela??es sem?nticas como apoio, de forma a inferir rela??es referenciais entre men??es. Modelos baseados em regras lingu?sticas t?m sido aplicados eficientemente em outros idiomas como o ingl?s, o espanhol e o galego. Esses modelos mostram-se mais eficientes que os baseados em aprendizado de m?quina quando lidamos com idiomas menos providos de recursos, dado que a aus?ncia de corpora ricos em amostras pode prejudicar o treino desses modelos. O modelo proposto nesta tese ? o primeiro voltado para a resolu??o de correfer?ncia em portugu?s que faz uso de conhecimento sem?ntico. Dessa forma, tomamos este fator como a principal contribui??o deste trabalho.
|
3 |
'Healthy' Coreference: Applying Coreference Resolution to the Health Education DomainHirtle, David Z. January 2008 (has links)
This thesis investigates coreference and its resolution within the domain of health education. Coreference is the relationship between two linguistic expressions that refer to the same real-world entity, and resolution involves identifying this relationship among sets of referring expressions. The coreference resolution task is considered among the most difficult of problems in Artificial Intelligence; in some cases, resolution is impossible even for humans. For example, "she" in the sentence "Lynn called Jennifer while she was on vacation" is genuinely ambiguous: the vacationer could be either Lynn or Jennifer.
<br/><br/>
There are three primary motivations for this thesis. The first is that health education has never before been studied in this context. So far, the vast majority of coreference research has focused on news. Secondly, achieving domain-independent resolution is unlikely without understanding the extent to which coreference varies across different genres. Finally, coreference pervades language and is an essential part of coherent discourse. Its effective use is a key component of easy-to-understand health education materials, where readability is paramount.
<br/><br/>
No suitable corpus of health education materials existed, so our first step was to create one. The comprehensive analysis of this corpus, which required manual annotation of coreference, confirmed our hypothesis that the coreference used in health education differs substantially from that in previously studied domains. This analysis was then used to shape the design of a knowledge-lean algorithm for resolving coreference. This algorithm performed surprisingly well on this corpus, e.g., successfully resolving over 85% of all pronouns when evaluated on unseen data.
<br/><br/>
Despite the importance of coreferentially annotated corpora, only a handful are known to exist, likely because of the difficulty and cost of reliably annotating coreference. The paucity of genres represented in these existing annotated corpora creates an implicit bias in domain-independent coreference resolution. In an effort to address these issues, we plan to make our health education corpus available to the wider research community, hopefully encouraging a broader focus in the future.
|
4 |
'Healthy' Coreference: Applying Coreference Resolution to the Health Education DomainHirtle, David Z. January 2008 (has links)
This thesis investigates coreference and its resolution within the domain of health education. Coreference is the relationship between two linguistic expressions that refer to the same real-world entity, and resolution involves identifying this relationship among sets of referring expressions. The coreference resolution task is considered among the most difficult of problems in Artificial Intelligence; in some cases, resolution is impossible even for humans. For example, "she" in the sentence "Lynn called Jennifer while she was on vacation" is genuinely ambiguous: the vacationer could be either Lynn or Jennifer.
<br/><br/>
There are three primary motivations for this thesis. The first is that health education has never before been studied in this context. So far, the vast majority of coreference research has focused on news. Secondly, achieving domain-independent resolution is unlikely without understanding the extent to which coreference varies across different genres. Finally, coreference pervades language and is an essential part of coherent discourse. Its effective use is a key component of easy-to-understand health education materials, where readability is paramount.
<br/><br/>
No suitable corpus of health education materials existed, so our first step was to create one. The comprehensive analysis of this corpus, which required manual annotation of coreference, confirmed our hypothesis that the coreference used in health education differs substantially from that in previously studied domains. This analysis was then used to shape the design of a knowledge-lean algorithm for resolving coreference. This algorithm performed surprisingly well on this corpus, e.g., successfully resolving over 85% of all pronouns when evaluated on unseen data.
<br/><br/>
Despite the importance of coreferentially annotated corpora, only a handful are known to exist, likely because of the difficulty and cost of reliably annotating coreference. The paucity of genres represented in these existing annotated corpora creates an implicit bias in domain-independent coreference resolution. In an effort to address these issues, we plan to make our health education corpus available to the wider research community, hopefully encouraging a broader focus in the future.
|
5 |
Information extraction from pharmaceutical literatureBatista-Navarro, Riza Theresa Bautista January 2014 (has links)
With the constantly growing amount of biomedical literature, methods for automatically distilling information from unstructured data, collectively known as information extraction, have become indispensable. Whilst most biomedical information extraction efforts in the last decade have focussed on the identification of gene products and interactions between them, the biomedical text mining community has recently extended their scope to capture associations between biomedical and chemical entities with the aim of supporting applications in drug discovery. This thesis is the first comprehensive study focussing on information extraction from pharmaceutical chemistry literature. In this research, we describe our work on (1) recognising names of chemical compounds and drugs, facilitated by the incorporation of domain knowledge; (2) exploring different coreference resolution paradigms in order to recognise co-referring expressions given a full-text article; and (3) defining drug-target interactions as events and distilling them from pharmaceutical chemistry literature using event extraction methods.
|
6 |
MEDICAL EVENT TIMELINE GENERATION FROM CLINICAL NARRATIVESRaghavan, Preethi 05 September 2014 (has links)
No description available.
|
7 |
Koreference z mezijazykové perspektivy / Coreference from the Cross-lingual PerspectiveNovák, Michal January 2018 (has links)
Coreference from the Cross-lingual Perspective Michal Nov'ak The subject of this thesis is to study properties of coreference using cross- lingual approaches. The work is motivated by the research on coreference-related linguistic typology. Another motivation is to explore whether differences in the ways how languages express coreference can be exploited to build better models for coreference resolution. We design two cross-lingual methods: the bilingually informed coreference resolution and the coreference projection. The results of our experiments with the methods carried out on Czech-English data suggest that with respect to coreference English is more informative for Czech than vice versa. Furthermore, the bilingually informed resolution applied on parallel texts has managed to outperform the monolingual resolver on both languages. In the experiments, we employ the monolingual coreference resolver and an improved method for alignment of coreferential expressions, both of which we also designed within the thesis. 1
|
8 |
Prerequisites for Extracting Entity Relations from Swedish TextsLenas, Erik January 2020 (has links)
Natural language processing (NLP) is a vibrant area of research with many practical applications today like sentiment analyses, text labeling, questioning an- swering, machine translation and automatic text summarizing. At the moment, research is mainly focused on the English language, although many other lan- guages are trying to catch up. This work focuses on an area within NLP called information extraction, and more specifically on relation extraction, that is, to ex- tract relations between entities in a text. What this work aims at is to use machine learning techniques to build a Swedish language processing pipeline with part-of- speech tagging, dependency parsing, named entity recognition and coreference resolution to use as a base for later relation extraction from archival texts. The obvious difficulty lies in the scarcity of Swedish annotated datasets. For exam- ple, no large enough Swedish dataset for coreference resolution exists today. An important part of this work, therefore, is to create a Swedish coreference solver using distantly supervised machine learning, which means creating a Swedish dataset by applying an English coreference solver on an unannotated bilingual corpus, and then using a word-aligner to translate this machine-annotated En- glish dataset to a Swedish dataset, and then training a Swedish model on this dataset. Using Allen NLP:s end-to-end coreference resolution model, both for creating the Swedish dataset and training the Swedish model, this work achieves an F1-score of 0.5. For named entity recognition this work uses the Swedish BERT models released by the Royal Library of Sweden in February 2020 and achieves an overall F1-score of 0.95. To put all of these NLP-models within a single Lan- guage Processing Pipeline, Spacy is used as a unifying framework. / Natural Language Processing (NLP) är ett stort och aktuellt forskningsområde idag med många praktiska tillämpningar som sentimentanalys, textkategoriser- ing, maskinöversättning och automatisk textsummering. Forskningen är för när- varande mest inriktad på det engelska språket, men många andra språkområ- den försöker komma ikapp. Det här arbetet fokuserar på ett område inom NLP som kallas informationsextraktion, och mer specifikt relationsextrahering, det vill säga att extrahera relationer mellan namngivna entiteter i en text. Vad det här ar- betet försöker göra är att använda olika maskininlärningstekniker för att skapa en svensk Language Processing Pipeline bestående av part-of-speech tagging, de- pendency parsing, named entity recognition och coreference resolution. Denna pipeline är sedan tänkt att användas som en bas for senare relationsextrahering från svenskt arkivmaterial. Den uppenbara svårigheten med detta ligger i att det är ont om stora, annoterade svenska dataset. Till exempel så finns det inget till- räckligt stort svenskt dataset för coreference resolution. En stor del av detta arbete går därför ut på att skapa en svensk coreference solver genom att implementera distantly supervised machine learning, med vilket menas att använda en engelsk coreference solver på ett oannoterat engelskt-svenskt corpus, och sen använda en word-aligner för att översätta detta maskinannoterade engelska dataset till ett svenskt, och sen träna en svensk coreference solver på detta dataset. Det här arbetet använder Allen NLP:s end-to-end coreference solver, både för att skapa det svenska datasetet, och för att träna den svenska modellen, och uppnår en F1-score på 0.5. Vad gäller named entity recognition så använder det här arbetet Kungliga Bibliotekets BERT-modeller som bas, och uppnår genom detta en F1- score på 0.95. Spacy används som ett enande ramverk för att samla alla dessa NLP-komponenter inom en enda pipeline.
|
9 |
Coreference resolution with and for WikipediaGhaddar, Abbas 06 1900 (has links)
Wikipédia est une ressource embarquée dans de nombreuses applications du traite-
ment des langues naturelles. Pourtant, aucune étude à notre connaissance n’a tenté de
mesurer la qualité de résolution de coréférence dans les textes de Wikipédia, une étape
préliminaire à la compréhension de textes. La première partie de ce mémoire consiste à
construire un corpus de coréférence en anglais, construit uniquement à partir des articles
de Wikipédia. Les mentions sont étiquetées par des informations syntaxiques et séman-
tiques, avec lorsque cela est possible un lien vers les entités FreeBase équivalentes. Le
but est de créer un corpus équilibré regroupant des articles de divers sujets et tailles.
Notre schéma d’annotation est similaire à celui suivi dans le projet OntoNotes. Dans la
deuxième partie, nous allons mesurer la qualité des systèmes de détection de coréférence
à l’état de l’art sur une tâche simple consistant à mesurer les mentions du concept décrit
dans une page Wikipédia (p. ex : les mentions du président Obama dans la page Wiki-
pédia dédiée à cette personne). Nous tenterons d’améliorer ces performances en faisant
usage le plus possible des informations disponibles dans Wikipédia (catégories, redi-
rects, infoboxes, etc.) et Freebase (information du genre, du nombre, type de relations
avec autres entités, etc.). / Wikipedia is a resource of choice exploited in many NLP applications, yet we are
not aware of recent attempts to adapt coreference resolution to this resource, a prelim-
inary step to understand Wikipedia texts. The first part of this master thesis is to build
an English coreference corpus, where all documents are from the English version of
Wikipedia. We annotated each markable with coreference type, mention type and the
equivalent Freebase topic. Our corpus has no restriction on the topics of the documents
being annotated, and documents of various sizes have been considered for annotation.
Our annotation scheme follows the one of OntoNotes with a few disparities. In part two,
we propose a testbed for evaluating coreference systems in a simple task of measuring
the particulars of the concept described in a Wikipedia page (eg. The statements of Pres-
ident Obama the Wikipedia page dedicated to that person). We show that by exploiting
the Wikipedia markup (categories, redirects, infoboxes, etc.) of a document, as well
as links to external knowledge bases such as Freebase (information of the type, num-
ber, type of relationship with other entities, etc.), we can acquire useful information on
entities that helps to classify mentions as coreferent or not.
|
10 |
A constraint-based hypergraph partitioning approach to coreference resolutionSapena Masip, Emili 16 May 2012 (has links)
The objectives of this thesis are focused on research in machine learning for
coreference resolution. Coreference resolution is a natural language processing
task that consists of determining the expressions in a discourse that mention or
refer to the same entity.
The main contributions of this thesis are (i) a new approach to coreference
resolution based on constraint satisfaction, using a hypergraph to represent
the problem and solving it by relaxation labeling; and (ii) research towards
improving coreference resolution performance using world knowledge extracted
from Wikipedia.
The developed approach is able to use entity-mention classi cation model
with more expressiveness than the pair-based ones, and overcome the weaknesses
of previous approaches in the state of the art such as linking contradictions,
classi cations without context and lack of information evaluating pairs. Furthermore,
the approach allows the incorporation of new information by adding
constraints, and a research has been done in order to use world knowledge to
improve performances.
RelaxCor, the implementation of the approach, achieved results in the
state of the art, and participated in international competitions: SemEval-2010
and CoNLL-2011. RelaxCor achieved second position in CoNLL-2011. / La resolució de correferències és una tasca de processament del llenguatge natural que consisteix en determinar les expressions
d'un discurs que es refereixen a la mateixa entitat del mon real. La tasca té un efecte directe en la minería de textos així com en
moltes tasques de llenguatge natural que requereixin interpretació del discurs com resumidors, responedors de preguntes o
traducció automàtica. Resoldre les correferències és essencial si es vol poder “entendre” un text o un discurs.
Els objectius d'aquesta tesi es centren en la recerca en resolució de correferències amb aprenentatge automàtic. Concretament,
els objectius de la recerca es centren en els següents camps:
+ Models de classificació: Els models de classificació més comuns a l'estat de l'art estan basats en la classificació independent de
parelles de mencions. Més recentment han aparegut models que classifiquen grups de mencions. Un dels objectius de la tesi és
incorporar el model entity-mention a l'aproximació desenvolupada.
+ Representació del problema: Encara no hi ha una representació definitiva del problema. En aquesta tesi es presenta una
representació en hypergraf.
+ Algorismes de resolució. Depenent de la representació del problema i del model de classificació, els algorismes de ressolució
poden ser molt diversos. Un dels objectius d'aquesta tesi és trobar un algorisme de resolució capaç d'utilitzar els models de
classificació en la representació d'hypergraf.
+ Representació del coneixement: Per poder administrar coneixement de diverses fonts, cal una representació simbòlica i
expressiva d'aquest coneixement. En aquesta tesi es proposa l'ús de restriccions.
+ Incorporació de coneixement del mon: Algunes correferències no es poden resoldre només amb informació lingüística. Sovint
cal sentit comú i coneixement del mon per poder resoldre coreferències. En aquesta tesi es proposa un mètode per extreure
coneixement del mon de Wikipedia i incorporar-lo al sistem de resolució.
Les contribucions principals d'aquesta tesi son (i) una nova aproximació al problema de resolució de correferències basada en
satisfacció de restriccions, fent servir un hypergraf per representar el problema, i resolent-ho amb l'algorisme relaxation labeling; i
(ii) una recerca per millorar els resultats afegint informació del mon extreta de la Wikipedia.
L'aproximació presentada pot fer servir els models mention-pair i entity-mention de forma combinada evitant així els problemes
que es troben moltes altres aproximacions de l'estat de l'art com per exemple: contradiccions de classificacions independents,
falta de context i falta d'informació. A més a més, l'aproximació presentada permet incorporar informació afegint restriccions i s'ha
fet recerca per aconseguir afegir informació del mon que millori els resultats.
RelaxCor, el sistema que ha estat implementat durant la tesi per experimentar amb l'aproximació proposada, ha aconseguit uns
resultats comparables als millors que hi ha a l'estat de l'art. S'ha participat a les competicions internacionals SemEval-2010 i
CoNLL-2011. RelaxCor va obtenir la segona posició al CoNLL-2010.
|
Page generated in 0.0805 seconds