51 |
Relation Classification using Semantically-Enhanced Syntactic Dependency Paths: Combining Semantic and Syntactic Dependencies for Relation Classification using Long Short-Term Memory Networks
Capshaw, Riley (January 2018)
Many approaches to solving tasks in the field of Natural Language Processing (NLP) use syntactic dependency trees (SDTs) as a feature to represent the latent nonlinear structure within sentences. Recently, work in parsing sentences to graph-based structures which encode semantic relationships between words—called semantic dependency graphs (SDGs)—has gained interest. This thesis seeks to explore the use of SDGs in place of and alongside SDTs within a relation classification system based on long short-term memory (LSTM) neural networks. Two methods for handling the information in these graphs are presented and compared between two SDG formalisms. Three new relation extraction system architectures have been created based on these methods and are compared to a recent state-of-the-art LSTM-based system, showing comparable results when semantic dependencies are used to enhance syntactic dependencies, but with significantly fewer training parameters.
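The abstract does not say how dependency paths are extracted. As a minimal sketch, assuming the common approach of taking the shortest path between the two entity tokens in the dependency tree, the path can be found with a breadth-first search. The function name, the head-array input format, and the toy parse are all hypothetical illustrations and are not taken from the thesis.

```python
from collections import deque


def shortest_dependency_path(heads, source, target):
    """Return the token indices on the shortest path from source to target.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    (This input format is an assumption made for the sketch.)
    """
    # Build an undirected adjacency list from the head array.
    neighbors = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            neighbors[child].add(head)
            neighbors[head].add(child)

    # Breadth-first search from source, remembering each node's predecessor.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in sorted(neighbors[node]):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)

    if target not in previous:
        return []  # the two tokens are not connected

    # Walk the predecessor links back from target to source.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Toy sentence with a hand-written (hypothetical) parse.
tokens = ["Anna", "wrote", "a", "thesis", "in", "Sweden"]
heads = [1, -1, 3, 1, 5, 1]
path = shortest_dependency_path(heads, 0, 5)
print([tokens[i] for i in path])  # ['Anna', 'wrote', 'Sweden']
```

In an LSTM-based classifier, the tokens (and possibly the dependency labels) along such a path would be fed to the network in order. How semantic dependencies are merged into that path is specific to the thesis and is not shown here.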
|
52 |
OntoILPER: an ontology- and inductive logic programming-based method to extract instances of entities and relations from texts
Lima, Rinaldo José de; Freitas, Frederico Luiz Gonçalves de (31 January 2014)
CNPq, CAPES. / Information Extraction (IE) is the task of discovering and structuring information found
in a semi-structured or unstructured textual corpus. Named Entity Recognition (NER) and Relation
Extraction (RE) are two important subtasks of IE. The former aims at finding named entities,
such as the names of people and locations, whereas the latter consists of detecting
and characterizing relations involving such named entities in text. Since manually
creating extraction rules for NER and RE is a labor-intensive and time-consuming task,
researchers have turned their attention to how machine learning techniques can be applied to
IE in order to make IE systems more adaptive to domain changes. As a result, a myriad of
state-of-the-art methods for NER and RE relying on statistical machine learning techniques
have been proposed in the literature. Such systems typically use a propositional hypothesis
space for representing examples, i.e., an attribute-value representation. In machine learning, the
propositional representation of examples presents some limitations, particularly in the extraction
of binary relations, which demands not only contextual and relational information about
the instances involved, but also more expressive semantic resources as background knowledge.
This thesis attempts to mitigate the aforementioned limitations based on the hypothesis that, to
be efficient and more adaptable to domain changes, an IE system should exploit ontologies and
semantic resources in a framework for IE that enables the automatic induction of extraction rules
by employing machine learning techniques. In this context, this thesis proposes a supervised
method to extract both entity and relation instances from textual corpora based on Inductive
Logic Programming, a symbolic machine learning technique. The proposed method, called
OntoILPER, benefits not only from ontologies and semantic resources, but also relies on a highly
expressive relational hypothesis space, in the form of logical predicates, for representing examples
whose structure is relevant to the information extraction task. OntoILPER automatically
induces symbolic extraction rules that subsume examples of entity and relation instances from
a tailored graph-based model of sentence representation, another contribution of this thesis.
Moreover, this graph-based model for representing sentences also enables the exploitation of
domain ontologies and additional background knowledge in the form of a condensed set of
features including lexical, syntactic, semantic, and relational ones. Unlike most
IE methods (a comprehensive survey is presented in this thesis, including the ones that also
apply ILP), OntoILPER takes advantage of a rich text preprocessing stage which encompasses
various shallow and deep natural language processing subtasks, including dependency parsing,
coreference resolution, word sense disambiguation, and semantic role labeling. Further mappings
of nouns and verbs to (formal) semantic resources are also considered. The OntoILPER Framework,
the implementation of OntoILPER, was experimentally evaluated on both NER and RE tasks.
This thesis reports the results of several assessments conducted using six standard evaluation
corpora from two distinct domains: news and biomedical. The results demonstrate
the effectiveness of OntoILPER on both NER and RE tasks. In fact, the proposed framework
outperforms some of the state-of-the-art IE systems compared in this thesis. / The field of Information Extraction (IE) aims to discover and structure information found in
semi-structured or unstructured documents. Named Entity Recognition
(NER) and Relation Extraction (RE) are two important subtasks of IE. The former aims to
find named entities, including the names of people and places, among others, whereas
the latter consists of detecting and characterizing the relations that involve the named
entities present in the text. Since manually creating extraction rules for
NER and RE is laborious and costly, researchers have turned their attention to
investigating how machine learning techniques can be applied to IE in order to
make RE systems more adaptable to domain changes. As a result, many
state-of-the-art methods for NER and RE, based on statistical machine learning
techniques, have been proposed in the literature. Such systems typically employ a hypothesis
space with propositional expressiveness to represent examples, that is, they are
based on the traditional attribute-value representation. In machine learning, the propositional
representation has some limiting factors, mainly in the extraction of binary relations,
which require not only contextual and structural (relational) information about the instances,
but also other ways of adding prior knowledge of the problem during the
learning process. This thesis aims to mitigate the limitations mentioned above, with the
working hypothesis that, to be efficient and more easily adaptable to domain changes,
IE systems should exploit ontologies and semantic resources within a framework
for IE that enables the automatic induction of information extraction rules through
machine learning techniques. In this context, this thesis proposes a
supervised method capable of extracting instances of entities (or ontology classes) and of
relations from texts, relying on Inductive Logic Programming (ILP), a supervised
machine learning technique capable of inducing symbolic classification rules. The
proposed method, called OntoILPER, not only benefits from ontologies and semantic resources,
but also relies on an expressive hypothesis space, in the form of logical
predicates, capable of representing examples whose structure is relevant to the IE tasks considered
in this thesis. OntoILPER automatically induces symbolic rules for classifying examples of
entity and relation instances from a graph-based model of sentence
representation. This representation model is one of the contributions of this thesis. Moreover, the
graph-based model for representing sentences and examples (instances of classes and
relations) favors the integration of prior knowledge of the problem in the form of a reduced
set of lexical, syntactic, semantic, and structural features. Unlike most
IE methods (a comprehensive survey is presented in this thesis, including those that also
apply ILP), OntoILPER makes use of several subtasks of Language Processing
|
53 |
Prerequisites for Extracting Entity Relations from Swedish Texts
Lenas, Erik (January 2020)
Natural language processing (NLP) is a vibrant area of research with many practical applications today, such as sentiment analysis, text labeling, question answering, machine translation, and automatic text summarization. At the moment, research is mainly focused on the English language, although many other languages are trying to catch up. This work focuses on an area within NLP called information extraction, and more specifically on relation extraction, that is, extracting relations between entities in a text. The aim of this work is to use machine learning techniques to build a Swedish language processing pipeline with part-of-speech tagging, dependency parsing, named entity recognition, and coreference resolution to use as a base for later relation extraction from archival texts. The obvious difficulty lies in the scarcity of Swedish annotated datasets. For example, no large enough Swedish dataset for coreference resolution exists today. An important part of this work, therefore, is to create a Swedish coreference solver using distantly supervised machine learning, which means creating a Swedish dataset by applying an English coreference solver to an unannotated bilingual corpus, then using a word aligner to translate this machine-annotated English dataset into a Swedish dataset, and finally training a Swedish model on this dataset. Using AllenNLP's end-to-end coreference resolution model, both for creating the Swedish dataset and for training the Swedish model, this work achieves an F1-score of 0.5. For named entity recognition, this work uses the Swedish BERT models released by the Royal Library of Sweden in February 2020 and achieves an overall F1-score of 0.95. To put all of these NLP models within a single language processing pipeline, spaCy is used as a unifying framework.
/ Natural Language Processing (NLP) is a large and active research area today with many practical applications such as sentiment analysis, text categorization, machine translation, and automatic text summarization. Research is currently mostly focused on the English language, but many other language areas are trying to catch up. This work focuses on an area within NLP called information extraction, and more specifically relation extraction, that is, extracting relations between named entities in a text. This work tries to use various machine learning techniques to create a Swedish language processing pipeline consisting of part-of-speech tagging, dependency parsing, named entity recognition, and coreference resolution. This pipeline is then intended to be used as a base for later relation extraction from Swedish archival material. The obvious difficulty lies in the scarcity of large annotated Swedish datasets. For example, there is no sufficiently large Swedish dataset for coreference resolution. A large part of this work therefore consists of creating a Swedish coreference solver by implementing distantly supervised machine learning, which means applying an English coreference solver to an unannotated English-Swedish corpus, then using a word aligner to translate this machine-annotated English dataset into a Swedish one, and then training a Swedish coreference solver on this dataset. This work uses AllenNLP's end-to-end coreference solver, both to create the Swedish dataset and to train the Swedish model, and achieves an F1-score of 0.5. For named entity recognition, this work uses the BERT models from Kungliga Biblioteket (the Royal Library of Sweden) as a base and thereby achieves an F1-score of 0.95. spaCy is used as a unifying framework to gather all of these NLP components within a single pipeline.
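The distant-supervision step in this abstract (annotate the English side, then carry the annotations across a word alignment) can be sketched roughly as follows. The abstract does not give the projection rule, so this sketch assumes one: each English mention span maps to the smallest Swedish span covering all of its aligned tokens. The function names, data layout, and example sentence pair are hypothetical.

```python
def project_span(span, alignment):
    """Map an inclusive English token span to a Swedish token span.

    alignment is a list of (english_index, swedish_index) pairs, as a word
    aligner might produce (this format is assumed for the sketch).
    """
    start, end = span
    targets = [sv for en, sv in alignment if start <= en <= end]
    if not targets:
        return None  # no aligned tokens, so the mention cannot be projected
    return (min(targets), max(targets))


def project_clusters(clusters, alignment):
    """Project coreference clusters (lists of spans) onto the Swedish side."""
    projected = []
    for cluster in clusters:
        spans = [project_span(span, alignment) for span in cluster]
        spans = [span for span in spans if span is not None]
        # A coreference cluster is only useful with at least two mentions.
        if len(spans) >= 2:
            projected.append(spans)
    return projected


# Hand-made example; the alignment and the cluster are illustrative only.
english = ["The", "king", "said", "he", "was", "tired"]
swedish = ["Kungen", "sa", "att", "han", "var", "trött"]
alignment = [(0, 0), (1, 0), (2, 1), (3, 3), (4, 4), (5, 5)]
clusters = [[(0, 1), (3, 3)]]  # "The king" and "he" corefer

print(project_clusters(clusters, alignment))  # [[(0, 0), (3, 3)]]
```

Here "The king" collapses onto the single Swedish token "Kungen", which shows why projection works on spans rather than individual tokens. A real pipeline would also need to handle noisy or discontinuous alignments, which this sketch ignores.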
|