21 |
Predicative Analysis for Information Extraction : application to the biology domain / Analyse prédicative pour l'extraction d'information : application au domaine de la biologie. Ratkovic, Zorana, 11 December 2014 (has links)
L’abondance de textes dans le domaine biomédical nécessite le recours à des méthodes de traitement automatique pour améliorer la recherche d’informations précises. L’extraction d’information (EI) vise précisément à extraire de l’information pertinente à partir de données non-structurées. Une grande partie des méthodes dans ce domaine se concentre sur les approches d’apprentissage automatique, en ayant recours à des traitements linguistiques profonds. L’analyse syntaxique joue notamment un rôle important, en fournissant une analyse précise des relations entre les éléments de la phrase. Cette thèse étudie le rôle de l’analyse syntaxique en dépendances dans le cadre d’applications d’EI dans le domaine biomédical. Elle comprend l’évaluation de différents analyseurs ainsi qu’une analyse détaillée des erreurs. Une fois l’analyseur le plus adapté sélectionné, les différentes étapes de traitement linguistique pour atteindre une EI de haute qualité, fondée sur la syntaxe, sont abordées : ces traitements incluent des étapes de pré-traitement (segmentation en mots) et des traitements linguistiques de plus haut niveau (liés à la sémantique et à l’analyse de la coréférence). Cette thèse explore également la manière dont les différents niveaux de traitement linguistique peuvent être représentés puis exploités par l’algorithme d’apprentissage. Enfin, partant du constat que le domaine biomédical est en fait extrêmement diversifié, cette thèse explore l’adaptation des techniques à différents sous-domaines, en utilisant des connaissances et des ressources déjà existantes. Les méthodes et les approches décrites sont explorées en utilisant deux corpus biomédicaux différents, montrant comment les résultats d’EI sont utilisés dans des tâches concrètes. / The abundance of biomedical information expressed in natural language has resulted in the need for methods to process this information automatically. In the field of Natural Language Processing (NLP), Information Extraction (IE) focuses on the extraction of relevant information from unstructured data in natural language. Many IE methods today focus on Machine Learning (ML) approaches that rely on deep linguistic processing in order to capture the complex information contained in biomedical texts. In particular, syntactic analysis and parsing have played an important role in IE, by helping capture how words in a sentence are related. This thesis examines how dependency parsing can be used to facilitate IE. It focuses on a task-based approach to dependency parsing evaluation and parser selection, including a detailed error analysis. In order to achieve high-quality syntax-based IE, different stages of linguistic processing are addressed, including both pre-processing steps (such as tokenization) and the use of complementary linguistic processing (such as semantics and coreference analysis). This thesis also explores how the different levels of linguistic processing can be represented for use within an ML-based IE algorithm, and how the interface between the two is of great importance. Finally, biomedical data is very heterogeneous, encompassing different subdomains and genres. This thesis explores how subdomain adaptation can be achieved by using already existing subdomain knowledge and resources. The methods and approaches described are explored using two different biomedical corpora, demonstrating how the IE results are used in real-life tasks.
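For readers unfamiliar with dependency-based IE, a minimal sketch of the idea follows: a dependency parse exposes grammatical links such as subject and object, from which candidate relation triples can be read off. The sketch uses spaCy and its general-purpose English model as stand-ins for the biomedical parsers evaluated in the thesis, and the example sentence is invented.

```python
# Minimal sketch: read subject-verb-object candidates off a dependency parse.
# spaCy and its small English model are stand-ins, not the parsers evaluated
# in the thesis; the biomedical example sentence is invented.
import spacy

nlp = spacy.load("en_core_web_sm")

def svo_candidates(text):
    """Yield (subject, verb lemma, object) triples found in the dependency parse."""
    doc = nlp(text)
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [t for t in token.children if t.dep_ in ("nsubj", "nsubjpass")]
            objects = [t for t in token.children if t.dep_ in ("dobj", "obj", "attr")]
            for subj in subjects:
                for obj in objects:
                    yield (subj.text, token.lemma_, obj.text)

# Prints the subject and object heads linked by each verb in the parse.
for triple in svo_candidates("The kinase MK2 phosphorylates the heat shock protein HSP27."):
    print(triple)
```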
|
22 |
Concept Based Knowledge Discovery from Biomedical Literature. Radovanovic, Aleksandar. January 2009 (has links)
Philosophiae Doctor - PhD / This thesis describes and introduces novel methods for knowledge discovery and presents a software system that is able to extract information from biomedical literature, review interesting connections between various biomedical concepts and, in so doing, generate new hypotheses. The experimental results obtained by using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. This thesis shows how the technology presented can be integrated with the researchers' own knowledge, experimentation and observations for optimal progression of scientific research. / South Africa
|
23 |
Concept Based Knowledge Discovery From Biomedical Literature. Radovanovic, Aleksandar, January 2009 (has links)
Philosophiae Doctor - PhD / Advancement in biomedical research and the continuous growth of scientific literature available in electronic form call for innovative methods and tools for information management, knowledge discovery, and data integration. Many biomedical fields such as genomics, proteomics, metabolomics, genetics, and emerging disciplines like systems biology and conceptual biology require synergy between experimental, computational, data mining and text mining technologies. A large amount of biomedical information available in various repositories, such as the US National Library of Medicine Bibliographic Database, emerges as a potential source of textual data for knowledge discovery. Text mining, the application of natural language processing and machine learning technologies to problems of knowledge discovery, is one of the most challenging fields in bioinformatics. This thesis describes and introduces novel methods for knowledge discovery and presents a software system that is able to extract information from biomedical literature, review interesting connections between various biomedical concepts and, in so doing, generate new hypotheses. The experimental results obtained by using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. This thesis shows how the technology presented can be integrated with the researchers' own knowledge, experimentation and observations for optimal progression of scientific research.
|
24 |
Extracting Causal Relations between News Topics from Distributed Sources. Miranda Ackerman, Eduardo Jacobo, 08 November 2013 (has links)
The overwhelming amount of online news presents a challenge called news information overload. To mitigate this challenge we propose a system to generate a causal network of news topics. To extract this information from distributed news sources, a system called Forest was developed. Forest retrieves documents that potentially contain causal information regarding a news topic. The documents are processed at the sentence level to extract causal relations and news topic references. Forest uses a machine learning approach to classify causal sentences, and then renders the potential cause and effect of each sentence. The potential cause and effect are then classified as news topic references, that is, the phrases used to refer to news topics, such as “The World Cup” or “The Financial Meltdown”. Both classifiers use an algorithm developed within our working group, which performs better than several well-known classification algorithms for these tasks.
In our evaluations we found that participants consider causal information useful for understanding the news, and that while we cannot extract causal information for all news topics, it is highly likely that we can extract causal relations for the most popular news topics. To evaluate the accuracy of the extractions made by Forest, we completed a user survey. We found that by providing the top-ranked results, we obtained a high accuracy in extracting causal relations between news topics.
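To make the causal-sentence classification step concrete, here is a small sketch that substitutes a standard TF-IDF plus logistic regression pipeline for the working group's own algorithm, which is not reproduced here; the training sentences and labels are hypothetical.

```python
# Sketch of causal-sentence classification with off-the-shelf components.
# This stands in for the working group's algorithm mentioned above; the tiny
# training set is hypothetical and far smaller than a realistic one.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_sentences = [
    "The financial meltdown caused a surge in unemployment.",
    "Heavy rainfall led to severe flooding across the region.",
    "The stadium hosted the opening match of the World Cup.",
    "The minister gave a speech at the annual summit.",
]
train_labels = [1, 1, 0, 0]  # 1 = causal sentence, 0 = non-causal

classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
classifier.fit(train_sentences, train_labels)

print(classifier.predict(["Rising oil prices resulted in higher airfares."]))
```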
|
25 |
Mobility Knowledge Graph and its Application in Public Transport. Zhang, Qi, January 2023 (has links)
Efficient public transport planning, operations, and control rely on a deep understanding of human mobility in urban areas. The availability of extensive and diverse mobility data sources, such as smart card data and GPS data, provides opportunities to quantitatively study individual behavior and collective mobility patterns. However, analyzing and organizing these vast amounts of data is a challenging task. The Knowledge Graph (KG) is a graph-based method for knowledge representation and organization that has been successfully applied in various domains, yet the applications of KG in urban mobility are still limited. To further utilize the mobility data and explore human mobility patterns, the included papers constructed the Mobility Knowledge Graph (MKG), a general learning framework, and demonstrated its potential applications in public transport. Paper I introduces the concept of MKG and proposes a learning framework to construct MKG from smart card data in public transport networks. The framework captures the spatiotemporal travel pattern correlations between stations using both rule-based linear decomposition and neural network-based nonlinear decomposition methods. The paper validates the MKG construction framework and explores the value of MKG in predicting individual trip destinations using only tap-in records. Paper II proposes an application of user-station attention estimation to understand human mobility in urban areas, which facilitates downstream applications such as individual mobility prediction and location recommendation. To estimate the 'real' user-station attention from station visit count data, the paper proposes a matrix decomposition method that captures both user similarity and station-station relations using the mobility knowledge graph (MKG). A neural network-based nonlinear decomposition approach was used to extract MKG relations capturing the latent spatiotemporal travel dependencies. The proposed framework is validated using synthetic and real-world data, demonstrating its significant value in contributing to user-station attention inference. / Effektiv planering, drift och kontroll av kollektivtrafik är beroende av en djup förståelse för mänsklig rörlighet i stadsområden. Tillgången till omfattande och varierande källor av rörlighetsdata, såsom data från smarta kort och GPS-data, ger möjligheter att kvantitativt studera individuellt beteende och kollektiva rörlighetsmönster. Att analysera och organisera dessa stora mängder data är dock en utmanande uppgift. Kunskapsgrafen (KG) är en grafbaserad metod för kunskapsrepresentation och organisering som har tillämpats framgångsrikt inom olika områden, men användningen av KG inom urban rörlighet är fortfarande begränsad. För att ytterligare utnyttja rörlighetsdata och utforska mänskliga rörlighetsmönster har de inkluderade artiklarna konstruerat Mobility Knowledge Graph (MKG), en allmän inlärningsram, och visat dess potentiella tillämpningar inom kollektivtrafiken. Artikel I introducerar begreppet MKG och föreslår en inlärningsram för att konstruera MKG från data från smarta kort i kollektivtrafiknätverk. Ramverket fångar de rumsligt-temporala resmönstersambanden mellan stationer genom att använda både regelbaserade linjära dekomponeringsmetoder och neurala nätverksbaserade icke-linjära dekomponeringsmetoder. Artikeln validerar MKG-konstruktionsramverket och utforskar värdet av MKG för att förutsäga enskilda resmål med endast tap-in-register.
Artikel II föreslår en tillämpning av uppskattning av användar-stationsuppmärksamhet för att förstå mänsklig rörlighet i stadsområden, vilket underlättar efterföljande tillämpningar såsom individuell rörlighetsförutsägelse och platsrekommendationer. För att uppskatta den ’verkliga’ användar-stationsuppmärksamheten från data om besöksantal på stationer föreslår artikeln en matrisdekomponeringsmetod som fångar både användarlikhet och station-stationsrelationer med hjälp av Mobility Knowledge Graph (MKG). En neural nätverksbaserad icke-linjär dekomponeringsmetod användes för att extrahera MKG-relationer som fångar de latenta rumsligt-temporala resberoendena. Det föreslagna ramverket valideras med hjälp av syntetiska och verkliga data och visar på dess betydande värde för att bidra till inferens av användar-stationsuppmärksamhet.
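To make the decomposition idea behind the user-station attention estimation concrete, the sketch below factorizes a synthetic user-by-station visit-count matrix with plain non-negative matrix factorization; this is only an illustrative stand-in for the rule-based and neural decompositions used in the papers, and all numbers are synthetic.

```python
# Illustrative stand-in: factorize a user-by-station visit-count matrix into
# latent user and station factors, whose product approximates the attention
# each user pays to each station. Plain NMF replaces the rule-based and neural
# decompositions of the papers; the data is synthetic.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
visits = rng.poisson(lam=2.0, size=(100, 20)).astype(float)  # 100 users x 20 stations

model = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
user_factors = model.fit_transform(visits)   # (100, 5) latent user profiles
station_factors = model.components_          # (5, 20) latent station profiles

attention = user_factors @ station_factors   # smoothed estimate of user-station attention
print(attention.shape)
```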
|
26 |
Formalizing biomedical concepts from textual definitions. Petrova, Alina, Ma, Yue, Tsatsaronis, George, Kissa, Maria, Distel, Felix, Baader, Franz, Schroeder, Michael, 07 January 2016 (has links) (PDF)
BACKGROUND:
Ontologies play a major role in life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined so that it ensures global consistency and support of complex reasoning tasks. Most biomedical ontologies and taxonomies on the other hand define concepts only textually, without the use of logic. Here, we investigate how to automatically generate formal concept definitions from textual ones. We develop a method that uses machine learning in combination with several types of lexical and semantic features and outputs formal definitions that follow the structure of SNOMED CT concept definitions.
RESULTS:
We evaluate our method on three benchmarks and test both the underlying relation extraction component and the overall quality of the output concept definitions. In addition, we provide an analysis of the following aspects: (1) How do definitions mined from the Web and literature differ from the ones mined from manually created definitions, e.g., MeSH? (2) How do different feature representations, e.g., the restrictions of relations' domain and range, impact the generated definition quality? (3) How do different machine learning algorithms compare to each other for the task of formal definition generation? (4) What is the influence of the learning data size on the task? We discuss all of these settings in detail and show that the suggested approach can achieve success rates of over 90%. In addition, the results show that the choice of corpora, lexical features, learning algorithm and data size does not impact the performance as strongly as semantic types do. Semantic types limit the domain and range of a predicted relation, and as long as relations' domain and range pairs do not overlap, this information is most valuable in formalizing textual definitions.
CONCLUSIONS:
The analysis presented in this manuscript implies that automated methods can provide a valuable contribution to the formalization of biomedical knowledge, thus paving the way for future applications that go beyond retrieval and into complex reasoning. The method is implemented and accessible to the public from: https://github.com/alifahsyamsiyah/learningDL.
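The finding about semantic types can be illustrated with a short sketch: a candidate relation is kept only when the semantic types of its arguments fit the relation's declared domain and range. The type table below is an invented toy example, not SNOMED CT content or the feature set used in the paper.

```python
# Toy sketch of domain/range filtering with semantic types. The relation names
# echo SNOMED CT attribute names, but the type table is invented for illustration.
ALLOWED = {
    "finding_site": ({"disorder"}, {"body structure"}),
    "causative_agent": ({"disorder"}, {"organism", "substance"}),
}

def keep_relation(relation, subject_type, object_type):
    """Return True if the argument types are compatible with the relation."""
    if relation not in ALLOWED:
        return False
    domain, range_ = ALLOWED[relation]
    return subject_type in domain and object_type in range_

print(keep_relation("finding_site", "disorder", "body structure"))   # True
print(keep_relation("finding_site", "substance", "body structure"))  # False
```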
|
27 |
[en] DISTANT SUPERVISION FOR RELATION EXTRACTION USING ONTOLOGY CLASS HIERARCHY-BASED FEATURES / [pt] SUPERVISÃO À DISTÂNCIA EM EXTRAÇÃO DE RELACIONAMENTOS USANDO CARACTERÍSTICAS BASEADAS EM HIERARQUIA DE CLASSES EM ONTOLOGIAS. PEDRO HENRIQUE RIBEIRO DE ASSIS, 18 March 2015 (has links)
[pt] Extração de relacionamentos é uma etapa chave para o problema de identificação de uma estrutura em um texto em formato de linguagem natural. Em geral, estruturas são compostas por entidades e relacionamentos entre elas. As propostas de solução com maior sucesso aplicam aprendizado de máquina supervisionado a corpus anotados à mão para a criação de classificadores de alta precisão. Embora alcancem boa robustez, corpus criados à mão não são escaláveis por serem uma alternativa de grande custo. Neste trabalho, nós aplicamos um paradigma alternativo para a criação de um número considerável de exemplos de instâncias para classificação. Tal método é chamado de supervisão à distância. Em conjunto com essa alternativa, usamos ontologias da Web semântica para propor e usar novas características para treinar classificadores. Elas são baseadas na estrutura e semântica descritas pelas ontologias onde recursos da Web semântica são definidos. O uso de tais características teve grande impacto na precisão e no recall dos nossos classificadores finais. Neste trabalho, aplicamos nossa teoria em um corpus extraído da Wikipedia. Alcançamos alta precisão e recall para um número considerável de relacionamentos. / [en] Relation extraction is a key step for the problem of rendering a structure from text in natural language format. In general, structures are composed of entities and relationships among them. The most successful approaches to relation extraction apply supervised machine learning on hand-labeled corpora to create highly accurate classifiers. Although good robustness is achieved, hand-labeled corpora are not scalable due to the high cost of their creation. In this work we apply an alternative paradigm for creating a considerable number of example instances for classification. This method is called distant supervision. Along with this alternative approach we adopt Semantic Web ontologies to propose and use new features for training classifiers. Those features are based on the structure and semantics described by the ontologies in which Semantic Web resources are defined. The use of such features has a great impact on the precision and recall of our final classifiers. In this work, we apply our theory to a corpus extracted from Wikipedia. We achieve high precision and recall for a considerable number of relations.
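For readers unfamiliar with distant supervision, the sketch below shows the labelling idea in its simplest form: a sentence mentioning an entity pair that is already related in a knowledge base is taken as a positive training example for that relation. The tiny knowledge base and sentences are hypothetical, and the ontology class hierarchy-based features of the thesis are not shown.

```python
# Minimal sketch of distant-supervision labelling: align knowledge-base facts
# with sentences that mention both entities. The KB and sentences are invented.
knowledge_base = {
    ("Rio de Janeiro", "Brazil"): "locatedIn",
    ("Machado de Assis", "Rio de Janeiro"): "bornIn",
}

sentences = [
    "Machado de Assis was born in Rio de Janeiro in 1839.",
    "Rio de Janeiro hosted the Olympic Games.",
]

def label_sentences(sentences, kb):
    """Return (sentence, entity1, entity2, relation) tuples as weakly labelled examples."""
    examples = []
    for sentence in sentences:
        for (e1, e2), relation in kb.items():
            if e1 in sentence and e2 in sentence:
                examples.append((sentence, e1, e2, relation))
    return examples

for example in label_sentences(sentences, knowledge_base):
    print(example)
```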
|
28 |
Extraction de relations en domaine de spécialité / Relation extraction in specialized domains. Minard, Anne-Lyse, 07 December 2012 (links)
La quantité d'information disponible dans le domaine biomédical ne cesse d'augmenter. Pour que cette information soit facilement utilisable par les experts d'un domaine, il est nécessaire de l'extraire et de la structurer. Pour avoir des données structurées, il convient de détecter les relations existantes entre les entités dans les textes. Nos recherches se sont focalisées sur la question de l'extraction de relations complexes représentant des résultats expérimentaux, et sur la détection et la catégorisation de relations binaires entre des entités biomédicales. Nous nous sommes intéressée aux résultats expérimentaux présentés dans les articles scientifiques. Nous appelons résultat expérimental, un résultat quantitatif obtenu suite à une expérience et mis en relation avec les informations permettant de décrire cette expérience. Ces résultats sont importants pour les experts en biologie, par exemple pour faire de la modélisation. Dans le domaine de la physiologie rénale, une base de données a été créée pour centraliser ces résultats d'expérimentation, mais l'alimentation de la base est manuelle et de ce fait longue. Nous proposons une solution pour extraire automatiquement des articles scientifiques les connaissances pertinentes pour la base de données, c'est-à-dire des résultats expérimentaux que nous représentons par une relation n-aire. La méthode procède en deux étapes : extraction automatique des documents et proposition de celles-ci pour validation ou modification par l'expert via une interface. Nous avons également proposé une méthode à base d'apprentissage automatique pour l'extraction et la classification de relations binaires en domaine de spécialité. Nous nous sommes intéressée aux caractéristiques et variétés d'expressions des relations, et à la prise en compte de ces caractéristiques dans un système à base d'apprentissage. Nous avons étudié la prise en compte de la structure syntaxique de la phrase et la simplification de phrases dirigée pour la tâche d'extraction de relations. Nous avons en particulier développé une méthode de simplification à base d'apprentissage automatique, qui utilise en cascade plusieurs classifieurs. / The amount of available scientific literature is constantly growing. If the experts of a domain want to easily access this information, it must be extracted and structured. To obtain structured data, both entities and relations of the texts must be detected. Our research is about the problem of complex relation extraction which represent experimental results, and detection and classification of binary relations between biomedical entities. We are interested in experimental results presented in scientific papers. An experimental result is a quantitative result obtained by an experimentation and linked with information that describes this experimentation. These results are important for biology experts, for example for doing modelization. In the domain of renal physiology, a database was created to centralize these experimental results, but the base is manually populated, therefore the population takes a long time. We propose a solution to automatically extract relevant knowledge for the database from the scientific papers, that is experimental results which are represented by a n-ary relation. The method proceeds in two steps: automatic extraction from documents and proposal of information extracted for approval or modification by the experts via an interface. 
We also proposed a method based on machine learning for extraction and classification of binary relations in specialized domains. We focused on the variations of the expression of relations, and how to represent them in a machine learning system. We studied the way to take into account syntactic structure of the sentence and the sentence simplification guided by the task of relation extraction. In particular, we developed a simplification method based on machine learning, which uses a series of classifiers.
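As an illustration of what an n-ary experimental-result relation could hold, the sketch below defines a simple record linking a measured quantity to the descriptors of its experiment and to its source sentence for expert validation; the field names and values are assumptions for illustration, not the schema of the renal physiology database mentioned above.

```python
# Illustrative record for an n-ary experimental-result relation. Field names
# and the example values are assumptions, not the actual database schema.
from dataclasses import dataclass

@dataclass
class ExperimentalResult:
    quantity: str          # what was measured
    value: float
    unit: str
    species: str           # descriptors of the experiment
    organ_segment: str
    source_sentence: str   # provenance, shown to the expert for validation

result = ExperimentalResult(
    quantity="glomerular filtration rate",
    value=1.2,
    unit="ml/min",
    species="rat",
    organ_segment="whole kidney",
    source_sentence="GFR averaged 1.2 ml/min in control rats.",
)
print(result)
```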
|
29 |
Construção automática de redes bayesianas para extração de interações proteína-proteína a partir de textos biomédicos / Learning Bayesian networks for extraction of protein-protein interaction from biomedical articles. Juárez, Pedro Nelson Shiguihara, 20 June 2013 (links)
A extração de Interações Proteína-Proteína (IPPs) a partir de texto é um problema relevante na área biomédica e um desafio na área de aprendizado de máquina. Na área biomédica, as IPPs são fundamentais para compreender o funcionamento dos seres vivos. No entanto, o número de artigos relacionados com IPPs está aumentando rapidamente, sendo impraticável identificá-las e catalogá-las manualmente. Por exemplo, no caso das IPPs humanas apenas 10% foram catalogadas. Por outro lado, em aprendizado de máquina, métodos baseados em kernels são frequentemente empregados para extrair automaticamente IPPs, atingindo resultados considerados estado da arte. Esses métodos usam informações léxicas, sintáticas ou semânticas como características. Entretanto, os resultados ainda são insuficientes, atingindo uma taxa relativamente baixa, em termos da medida F, devido à complexidade do problema. Apesar dos esforços em produzir kernels cada vez mais sofisticados, usando árvores sintáticas como árvores constituintes ou de dependência, pouco é conhecido sobre o desempenho de outras abordagens de aprendizado de máquina como, por exemplo, as redes bayesianas. As árvores constituintes são estruturas de grafos que contêm informação importante da gramática subjacente às sentenças de textos contendo IPPs. Por outro lado, a rede bayesiana permite modelar algumas regras da gramática e atribuir a elas uma distribuição de probabilidade de acordo com as sentenças de treinamento. Neste trabalho de mestrado propõe-se um método para construção automática de redes bayesianas a partir de árvores constituintes para extração de IPPs. O método foi testado em cinco corpora padrões da extração de IPPs, atingindo resultados competitivos, em alguns casos melhores, em comparação a métodos do estado da arte. / Extracting Protein-Protein Interactions (PPIs) from text is a relevant problem in the biomedical field and a challenge in the area of machine learning. In the biomedical field, PPIs are fundamental to understanding the functioning of living organisms. However, the number of articles related to PPIs is increasing rapidly, hence it is impractical to identify and catalog them manually. For example, in the case of human PPIs only 10% have been cataloged. On the other hand, machine learning methods based on kernels are often employed to automatically extract PPIs, achieving state-of-the-art results. These methods use lexical, syntactic and semantic information as features. However, the results are still poor, reaching a relatively low F-measure due to the complexity of the problem. Despite efforts to produce increasingly sophisticated kernels, using syntactic trees such as constituent or dependency trees, little is known about the performance of other machine learning approaches, e.g., Bayesian networks. Constituent trees are graph structures that contain important information about the underlying grammar of sentences containing PPIs. A Bayesian network, in turn, allows modeling some grammar rules and assigning them a probability distribution according to the training sentences. In this master's thesis we propose a method for the automatic construction of Bayesian networks from constituent trees for extracting PPIs. The method was tested on five benchmark corpora for PPI extraction, achieving competitive results, and in some cases better results, when compared to state-of-the-art methods.
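The intuition of attaching probabilities to grammar rules extracted from constituent trees can be sketched with NLTK's PCFG induction utilities; this is only a stand-in for the Bayesian network construction proposed in the thesis, and the toy parses are invented.

```python
# Sketch: collect productions from constituent trees and estimate rule
# probabilities from their counts. NLTK's PCFG induction stands in for the
# Bayesian network construction of the thesis; the parses are toy examples.
from nltk import Tree
from nltk.grammar import Nonterminal, induce_pcfg

parses = [
    Tree.fromstring("(S (NP (NN MK2)) (VP (VBZ phosphorylates) (NP (NN HSP27))))"),
    Tree.fromstring("(S (NP (NN RAF1)) (VP (VBZ binds) (NP (NN MEK1))))"),
]

productions = []
for tree in parses:
    productions.extend(tree.productions())

grammar = induce_pcfg(Nonterminal("S"), productions)
for production in grammar.productions():
    print(production)  # each rule with its relative-frequency probability
```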
|
30 |
Desenvolvimento de método para consulta em linguagem natural de componentes de software / Development of a method for natural language querying of software components. Domingues, Paulo Eduardo, 28 June 2007 (links)
Component-based development allows the creation of interoperable components with well-defined interfaces, reducing the complexity of software development. In this scenario, a library of software components plays an important role at the corporate level, supporting the documentation, specification, storage and retrieval of components. Within organizations, a component library provides an infrastructure for managing the component life cycle. This work proposes the storage and retrieval of software components through a natural language interface. A method is described to generate a representation, to be stored in the library, of the texts that describe the characteristics of the components in the library. The query text written by the user is represented in a similar way, allowing a comparison between the component descriptions in the library and the user's question. Additionally, a method is presented to measure the similarity between parts of the representations of the characteristic texts and the query text, returning as a result the components that best satisfy the user's query, ranked in decreasing order of priority. / O desenvolvimento baseado em componentes permite criar componentes inter-operáveis, com interfaces bem definidas, reduzindo a complexidade no desenvolvimento de software. Neste cenário, a biblioteca de componentes de software exerce um papel importante em um ambiente corporativo, suportando a documentação, especificação, armazenamento e recuperação de componentes. Dentro das organizações, uma biblioteca de componentes fornece uma infra-estrutura para o gerenciamento do ciclo de vida dos componentes. Este trabalho propõe o armazenamento e a recuperação de componentes de software com a utilização de uma interface em linguagem natural. É descrito um método para gerar uma forma de representação, a ser armazenada na biblioteca, para os textos que descrevem as características dos componentes que integram a biblioteca. O texto da consulta gerada pelo usuário também é representado de forma semelhante para permitir a comparação entre as descrições dos componentes da biblioteca e a questão do usuário. Adicionalmente, é apresentado o método para determinar a semelhança entre partes das representações do texto das características com o texto das consultas, de forma a retornar como resultado a indicação, em ordem decrescente de prioridade, dos componentes que melhor atendem à consulta do usuário.
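The retrieval idea can be sketched by placing component descriptions and the user's natural language query in a shared vector space and ranking components by similarity; TF-IDF with cosine similarity stands in for the representation described in the thesis, and the component names and texts are invented.

```python
# Sketch of natural language retrieval over a component library: TF-IDF vectors
# and cosine similarity stand in for the representation used in the thesis.
# The component catalogue below is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

components = {
    "PdfReportWriter": "Generates PDF reports from tabular data with custom layouts.",
    "CsvImporter": "Reads CSV files and loads the records into a relational database.",
    "EmailNotifier": "Sends notification emails when a background job finishes.",
}

vectorizer = TfidfVectorizer()
description_matrix = vectorizer.fit_transform(components.values())

query = "component that imports spreadsheet files into a database"
scores = cosine_similarity(vectorizer.transform([query]), description_matrix).ravel()

for name, score in sorted(zip(components, scores), key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {score:.3f}")  # components ranked by decreasing similarity to the query
```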
|