21 |
Concept Based Knowledge Discovery From Biomedical Literature
Radovanovic, Aleksandar, January 2009
Philosophiae Doctor - PhD / Advances in biomedical research and the continuous growth of scientific literature available in electronic form call for innovative methods and tools for information management, knowledge discovery, and data integration. Many biomedical fields, such as genomics, proteomics, metabolomics, and genetics, along with emerging disciplines like systems biology and conceptual biology, require synergy between experimental, computational, data mining, and text mining technologies. The large amount of biomedical information available in various repositories, such as the US National Library of Medicine Bibliographic Database, is emerging as a potential source of textual data for knowledge discovery. Text mining, which applies natural language processing and machine learning technologies to problems of knowledge discovery, is one of the most challenging fields in bioinformatics. This thesis introduces novel methods for knowledge discovery and presents a software system that extracts information from biomedical literature, reviews interesting connections between various biomedical concepts, and in so doing generates new hypotheses. The experimental results obtained with the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. The thesis shows how the technology presented can be integrated with researchers' own knowledge, experimentation, and observations for optimal progression of scientific research.
|
22 |
Extracting Causal Relations between News Topics from Distributed Sources
Miranda Ackerman, Eduardo Jacobo, 08 November 2013
The overwhelming amount of online news presents a challenge called news information overload. To mitigate this challenge, we propose a system that generates a causal network of news topics. To extract this information from distributed news sources, a system called Forest was developed. Forest retrieves documents that potentially contain causal information regarding a news topic. The documents are processed at the sentence level to extract causal relations and news topic references, that is, the phrases used to refer to a news topic, such as “The World Cup” or “The Financial Meltdown”. Forest uses a machine learning approach to classify causal sentences and then extracts the potential cause and effect from each sentence. The potential cause and effect are then classified as news topic references. Both classifiers use an algorithm developed within our working group, which performs better than several well-known classification algorithms on these tasks.
In our evaluations we found that participants consider causal information useful for understanding the news, and that while we cannot extract causal information for all news topics, it is highly likely that we can extract causal relations for the most popular ones. To evaluate the accuracy of the extractions made by Forest, we conducted a user survey. We found that by providing only the top-ranked results, we obtained high accuracy in extracting causal relations between news topics.
|
23 |
Mobility Knowledge Graph and its Application in Public Transport
Zhang, Qi, January 2023
Efficient public transport planning, operations, and control rely on a deep understanding of human mobility in urban areas. The availability of extensive and diverse mobility data sources, such as smart card data and GPS data, provides opportunities to quantitatively study individual behavior and collective mobility patterns. However, analyzing and organizing these vast amounts of data is a challenging task. The Knowledge Graph (KG) is a graph-based method for knowledge representation and organization that has been successfully applied in various applications, yet the applications of KG in urban mobility are still limited. To further utilize the mobility data and explore human mobility patterns, the included papers constructed the Mobility Knowledge Graph (MKG), a general learning framework, and demonstrated its potential applications in public transport. Paper I introduces the concept of MKG and proposes a learning framework to construct MKG from smart card data in public transport networks. The framework captures the spatiotemporal travel pattern correlations between stations using both rule-based linear decomposition and neural network-based nonlinear decomposition methods. The paper validates the MKG construction framework and explores the value of MKG in predicting individual trip destinations using only tap-in records. Paper II proposes an application of user-station attention estimation to understand human mobility in urban areas, which facilitates downstream applications such as individual mobility prediction and location recommendation. To estimate the 'real' user-station attention from station visit counts data, the paper proposes a matrix decomposition method that captures both user similarity and station-station relations using the mobility knowledge graph (MKG). A neural network-based nonlinear decomposition approach was used to extract MKG relations capturing the latent spatiotemporal travel dependencies. 
The proposed framework is validated using synthetic and real-world data, demonstrating its significant value in contributing to user-station attention inference.
|
24 |
Formalizing biomedical concepts from textual definitions
Petrova, Alina; Ma, Yue; Tsatsaronis, George; Kissa, Maria; Distel, Felix; Baader, Franz; Schroeder, Michael, 07 January 2016
BACKGROUND:
Ontologies play a major role in the life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined, so that it ensures global consistency and supports complex reasoning tasks. Most biomedical ontologies and taxonomies, on the other hand, define concepts only textually, without the use of logic. Here, we investigate how to automatically generate formal concept definitions from textual ones. We develop a method that uses machine learning in combination with several types of lexical and semantic features and outputs formal definitions that follow the structure of SNOMED CT concept definitions.
RESULTS:
We evaluate our method on three benchmarks and test both the underlying relation extraction component and the overall quality of the output concept definitions. In addition, we provide an analysis of the following aspects: (1) How do definitions mined from the Web and literature differ from the ones mined from manually created definitions, e.g., MeSH? (2) How do different feature representations, e.g., the restrictions of relations' domain and range, affect the quality of the generated definitions? (3) How do different machine learning algorithms compare to each other for the task of formal definition generation? (4) What is the influence of the learning data size on the task? We discuss all of these settings in detail and show that the suggested approach can achieve success rates of over 90%. In addition, the results show that the choice of corpora, lexical features, learning algorithm, and data size do not affect performance as strongly as semantic types do. Semantic types limit the domain and range of a predicted relation, and as long as relations' domain and range pairs do not overlap, this information is the most valuable in formalizing textual definitions.
CONCLUSIONS:
The analysis presented in this manuscript implies that automated methods can provide a valuable contribution to the formalization of biomedical knowledge, thus paving the way for future applications that go beyond retrieval and into complex reasoning. The method is implemented and accessible to the public from: https://github.com/alifahsyamsiyah/learningDL.
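The role of semantic types described in the Results can be illustrated with a minimal sketch. It assumes a table of relation signatures (domain type, range type) and a dictionary of classifier scores; the relation names, type names, and both helper functions are hypothetical and are not taken from the paper or its repository.

```python
# Hypothetical relation signatures: relation -> (domain type, range type).
# The relation and type names are illustrative only, not taken from the paper.
SIGNATURES = {
    "finding_site": ("Disorder", "BodyStructure"),
    "causative_agent": ("Disorder", "Organism"),
    "method": ("Procedure", "Action"),
}


def admissible_relations(subject_type, object_type, signatures=SIGNATURES):
    """Return the relations whose domain and range match the given semantic types."""
    return sorted(
        rel
        for rel, (domain, rng) in signatures.items()
        if domain == subject_type and rng == object_type
    )


def pick_relation(scores, subject_type, object_type, signatures=SIGNATURES):
    """Pick the best-scoring relation among those the semantic types allow.

    `scores` maps relation name -> classifier score (any float).
    Returns None when no relation is admissible for the type pair.
    """
    allowed = admissible_relations(subject_type, object_type, signatures)
    if not allowed:
        return None
    return max(allowed, key=lambda rel: scores.get(rel, float("-inf")))
```

When the (domain, range) pairs of the relations do not overlap, as in this toy table, the type pair alone already determines the relation, which is consistent with the abstract's observation that semantic types are the most informative feature in that case.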
|
25 |
[en] DISTANT SUPERVISION FOR RELATION EXTRACTION USING ONTOLOGY CLASS HIERARCHY-BASED FEATURES / [pt] SUPERVISÃO À DISTÂNCIA EM EXTRAÇÃO DE RELACIONAMENTOS USANDO CARACTERÍSTICAS BASEADAS EM HIERARQUIA DE CLASSES EM ONTOLOGIAS
PEDRO HENRIQUE RIBEIRO DE ASSIS, 18 March 2015
Relation extraction is a key step in recovering structure from natural language text. In general, structures are composed of entities and the relationships among them. The most successful approaches to relation extraction apply supervised machine learning to hand-labeled corpora to create highly accurate classifiers. Although they achieve good robustness, hand-labeled corpora do not scale because they are expensive to create. In this work, we apply an alternative paradigm, called distant supervision, to create a considerable number of example instances for classification. Along with this alternative approach, we adopt Semantic Web ontologies to propose and use new features for training classifiers. These features are based on the structure and semantics described by the ontologies in which Semantic Web resources are defined. The use of such features has a great impact on the precision and recall of our final classifiers. We apply our approach to a corpus extracted from Wikipedia and achieve high precision and recall for a considerable number of relations.
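The two ideas in this abstract, distant supervision and features drawn from an ontology class hierarchy, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the thesis's implementation: the triples, the class hierarchy, the substring-based entity matching, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical knowledge-base triples (subject, relation, object).
KB_TRIPLES = [
    ("Barack Obama", "birthPlace", "Honolulu"),
    ("Rio de Janeiro", "country", "Brazil"),
]

# Hypothetical class hierarchy: class -> parent class (None at the root).
CLASS_PARENT = {
    "Politician": "Person",
    "Person": "Agent",
    "Agent": None,
    "City": "Place",
    "Place": None,
}


def label_sentences(sentences, triples):
    """Distant supervision: a sentence that mentions both entities of a KB
    triple is taken as a (noisy) positive example of that triple's relation."""
    examples = defaultdict(list)
    for sentence in sentences:
        for subj, rel, obj in triples:
            # Naive substring matching stands in for real entity linking.
            if subj in sentence and obj in sentence:
                examples[rel].append((sentence, subj, obj))
    return dict(examples)


def class_hierarchy_features(entity_class, parents=CLASS_PARENT):
    """List the class and all of its ancestors, most specific first."""
    features = []
    current = entity_class
    while current is not None:
        features.append(current)
        current = parents.get(current)
    return features
```

Labels produced this way are noisy, since a sentence can mention both entities without expressing the relation. The ancestor list gives a classifier progressively more general type information about each entity.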
|
26 |
Extraction de relations en domaine de spécialité / Relation extraction in specialized domains
Minard, Anne-Lyse, 07 December 2012
The amount of available scientific literature is constantly growing. For the experts of a domain to access this information easily, it must be extracted and structured. To obtain structured data, both the entities and the relations between them must be detected in the texts. Our research addresses the extraction of complex relations that represent experimental results, and the detection and classification of binary relations between biomedical entities. We are interested in experimental results presented in scientific papers. An experimental result is a quantitative result obtained from an experiment and linked with the information that describes that experiment. These results are important for biology experts, for example for modeling. In the domain of renal physiology, a database was created to centralize these experimental results, but it is populated manually, which takes a long time. We propose a solution to automatically extract from scientific papers the knowledge relevant to the database, that is, experimental results, which we represent as an n-ary relation. The method proceeds in two steps: automatic extraction from documents, and presentation of the extracted information to the expert for approval or modification via an interface. We also propose a machine learning method for the extraction and classification of binary relations in specialized domains. We focus on the characteristics and variety of the ways relations are expressed, and on how to account for them in a machine learning system. We study how to take the syntactic structure of the sentence into account, and sentence simplification guided by the relation extraction task. In particular, we developed a simplification method based on machine learning that uses a cascade of classifiers.
|
27 |
Construção automática de redes bayesianas para extração de interações proteína-proteína a partir de textos biomédicos / Learning Bayesian networks for extraction of protein-protein interaction from biomedical articles
Juárez, Pedro Nelson Shiguihara, 20 June 2013
Extracting Protein-Protein Interactions (PPIs) from text is a relevant problem in the biomedical field and a challenge in machine learning. In the biomedical field, PPIs are fundamental to understanding how living organisms function. However, the number of articles related to PPIs is increasing rapidly, so it is impractical to identify and catalog them manually. For example, only 10% of human PPIs have been cataloged. In machine learning, kernel-based methods are often employed to extract PPIs automatically, achieving state-of-the-art results. These methods use lexical, syntactic, or semantic information as features. However, the results are still insufficient, reaching a relatively low F-measure due to the complexity of the problem. Despite efforts to produce increasingly sophisticated kernels that use syntactic trees such as constituent or dependency trees, little is known about the performance of other machine learning approaches, e.g., Bayesian networks. Constituent trees are graph structures that contain important information about the underlying grammar of sentences containing PPIs. A Bayesian network, in turn, allows some grammar rules to be modeled and assigned a probability distribution according to the training sentences. In this master's thesis we propose a method for the automatic construction of Bayesian networks from constituent trees for extracting PPIs. The method was tested on five standard PPI extraction corpora, achieving competitive results, and in some cases better results, compared to state-of-the-art methods.
|
28 |
Desenvolvimento de método para consulta em linguagem natural de componentes de software / Development of method for natural language research of software components
Domingues, Paulo Eduardo, 28 June 2007
Component-based development makes it possible to create interoperable components with well-defined interfaces, reducing the complexity of software development. In this setting, a software component library plays an important role in a corporate environment, supporting the documentation, specification, storage, and retrieval of components. Within organizations, a component library provides infrastructure for managing the component lifecycle. This work proposes the storage and retrieval of software components through a natural language interface. It describes a method for generating a representation, to be stored in the library, of the texts that describe the characteristics of the components in the library. The text of the query entered by the user is represented in a similar form, so that the descriptions of the library's components can be compared with the user's question. In addition, a method is presented for determining the similarity between parts of the representations of the characteristics text and of the query text, so that the components that best satisfy the user's query are returned in decreasing order of priority.
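The retrieval step described above, comparing a representation of the query with representations of the component descriptions and returning the best matches first, can be illustrated with a generic sketch. The abstract does not specify the representation or the similarity measure, so the bag-of-words vectors and cosine similarity used here are stand-in assumptions, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_components(query, descriptions):
    """Return (name, score) pairs sorted by decreasing similarity to the query.

    `descriptions` maps a component name to its free-text description.
    """
    q = bag_of_words(query)
    scored = [(name, cosine(q, bag_of_words(text))) for name, text in descriptions.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A richer representation, for instance one built from parsed sentences, could replace `bag_of_words` without changing the ranking step.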
|
29 |
Formalizing biomedical concepts from textual definitions
Tsatsaronis, George; Ma, Yue; Petrova, Alina; Kissa, Maria; Distel, Felix; Baader, Franz; Schroeder, Michael, 04 January 2016
Background
Ontologies play a major role in the life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined, so that it ensures global consistency and supports complex reasoning tasks. Most biomedical ontologies and taxonomies, on the other hand, define concepts only textually, without the use of logic. Here, we investigate how to automatically generate formal concept definitions from textual ones. We develop a method that uses machine learning in combination with several types of lexical and semantic features and outputs formal definitions that follow the structure of SNOMED CT concept definitions.
Results
We evaluate our method on three benchmarks and test both the underlying relation extraction component and the overall quality of the output concept definitions. In addition, we provide an analysis of the following aspects: (1) How do definitions mined from the Web and literature differ from the ones mined from manually created definitions, e.g., MeSH? (2) How do different feature representations, e.g., the restrictions of relations’ domain and range, affect the quality of the generated definitions? (3) How do different machine learning algorithms compare to each other for the task of formal definition generation? (4) What is the influence of the learning data size on the task? We discuss all of these settings in detail and show that the suggested approach can achieve success rates of over 90%. In addition, the results show that the choice of corpora, lexical features, learning algorithm, and data size do not affect performance as strongly as semantic types do. Semantic types limit the domain and range of a predicted relation, and as long as relations’ domain and range pairs do not overlap, this information is the most valuable in formalizing textual definitions.
Conclusions
The analysis presented in this manuscript implies that automated methods can provide a valuable contribution to the formalization of biomedical knowledge, thus paving the way for future applications that go beyond retrieval and into complex reasoning. The method is implemented and accessible to the public from: https://github.com/alifahsyamsiyah/learningDL.
|
30 |
Apprentissage non supervisé de dépendances à partir de textes / Unsupervised dependency parsing from texts
Arcadias, Marie, 02 October 2015
Dependency grammars allow the construction of a hierarchical syntactic organization of the words of a sentence. Building dependency trees by hand takes a long time and requires expert knowledge, so many studies seek to automate it. Aiming for a lightweight and easily adaptable process, we are interested in unsupervised dependency learning, which avoids the need for costly expertise. Currently, DMV gives the state-of-the-art results in unsupervised dependency parsing. However, DMV is known to be highly sensitive to its initial parameters, and training a DMV model is heavy and slow. We present in this thesis a new model that solves this problem in a simpler, faster, and more adaptable way. We learn a family of PCFGs with fewer than 6 nonterminal symbols and fewer than 15 combination rules from part-of-speech tags. The PCFGs of this family, which we call DGdg (for DROITE GAUCHE droite gauche, "RIGHT LEFT right left"), need very little tuning, so they adapt easily to the 12 languages we tested. Training and parsing run at least twice as fast as with DMV on the same data, and for some languages the quality of the DGdg parses is close to that of DMV. We also propose a first application of our dependency parsing method to information extraction: using conditional random fields (CRFs), we learn a labeling of "subject", "object", and "predicate" functions based on features extracted from the constructed trees, and we show how this can improve a relation extraction task.
|