Spelling suggestions: "subject:"named entities"" "subject:"famed entities""
11 |
Recherche de réponses précises à des questions médicales : le système de questions-réponses MEANS / Finding precise answers to medical questions : the question-answering system MEANSBen Abacha, Asma 28 June 2012 (has links)
La recherche de réponses précises à des questions formulées en langue naturelle renouvelle le champ de la recherche d’information. De nombreux travaux ont eu lieu sur la recherche de réponses à des questions factuelles en domaine ouvert. Moins de travaux ont porté sur la recherche de réponses en domaine de spécialité, en particulier dans le domaine médical ou biomédical. Plusieurs conditions différentes sont rencontrées en domaine de spécialité comme les lexiques et terminologies spécialisés, les types particuliers de questions, entités et relations du domaine ou les caractéristiques des documents ciblés. Dans une première partie, nous étudions les méthodes permettant d’analyser sémantiquement les questions posées par l’utilisateur ainsi que les textes utilisés pour trouver les réponses. Pour ce faire nous utilisons des méthodes hybrides pour deux tâches principales : (i) la reconnaissance des entités médicales et (ii) l’extraction de relations sémantiques. Ces méthodes combinent des règles et patrons construits manuellement, des connaissances du domaine et des techniques d’apprentissage statistique utilisant différents classifieurs. Ces méthodes hybrides, expérimentées sur différents corpus, permettent de pallier les inconvénients des deux types de méthodes d’extraction d’information, à savoir le manque de couverture potentiel des méthodes à base de règles et la dépendance aux données annotées des méthodes statistiques. Dans une seconde partie, nous étudions l’apport des technologies du web sémantique pour la portabilité et l’expressivité des systèmes de questions-réponses. Dans le cadre de notre approche, nous exploitons les technologies du web sémantique pour annoter les informations extraites en premier lieu et pour interroger sémantiquement ces annotations en second lieu. Enfin, nous présentons notre système de questions-réponses, appelé MEANS, qui utilise à la fois des techniques de TAL, des connaissances du domaine et les technologies du web sémantique pour répondre automatiquement aux questions médicales. / With the dramatic growth of digital information, finding precise answers to natural language questions is more and more essential for retrieving domain knowledge in real time. Many research works tackled answer retrieval for factual questions in open domain. Less works were performed for domain-specific question answering such as the medical domain. Compared to the open domain, several different conditions are met in the medical domain such as specialized vocabularies, specific types of questions, different kinds of domain entities and relations. Document characteristics are also a matter of importance, as, for example, clinical texts may tend to use a lot of technical abbreviations while forum pages may use long “approximate” terms. We focus on finding precise answers to natural language questions in the medical field. A key process for this task is to analyze the questions and the source documents semantically and to use standard formalisms to represent the obtained annotations. We propose a medical question-answering approach based on: (i) NLP methods combing domain knowledge, rule-based methods and statistical ones to extract relevant information from questions and documents and (ii) Semantic Web technologies to represent and interrogate the extracted information.
|
12 |
Elaboração textual via definição de entidades mencionadas e de perguntas relacionadas aos verbos em textos simplificados do português / Text elaboration through named entities definition and questions related to verbs in simplified portuguese textsAmancio, Marcelo Adriano 15 June 2011 (has links)
Esta pesquisa aborda o tema da Elaboração Textual para um público alvo que tem letramento nos níveis básicos e rudimentar, de acordo com a classificação do Indicador Nacional de Alfabetismo Funcional (INAF, 2009). A Elaboração Textual é definida como um conjunto de técnicas que acrescentam material redundante em textos, sendo tradicionalmente usadas a adição de definições, sinônimos, antônimos, ou qualquer informação externa com o objetivo de auxiliar na compreensão do texto. O objetivo deste projeto de mestrado foi a proposta de dois métodos originais de elaboração textual: (1) via definição das entidades mencionadas que aparecem em um texto e (2) via definições de perguntas elaboradas direcionadas aos verbos das orações de um texto. Para a primeira tarefa, usou-se um sistema de reconhecimento de entidades mencionadas da literatura, o Rembrandt, e definições curtas da enciclopédia Wikipédia, sendo este método incorporado no sistema Web FACILITA EDUCATIVO, uma das ferramentas desenvolvidas no projeto PorSimples. O método foi avaliado de forma preliminar com um pequeno grupo de leitores com baixo nível de letramento e a avaliação foi positiva, indicando que este auxílio facilitou a leitura dos usuários da avaliação. O método de geração de perguntas elaboradas aos verbos de uma oração é uma tarefa nova que foi definida, estudada, implementada e avaliada neste mestrado. A avaliação não foi realizada junto ao público alvo e sim com especialistas em processamento de língua natural que avaliaram positivamente o método e indicaram quais erros influenciam negativamente na qualidade das perguntas geradas automaticamente. Existem boas indicações de que os métodos de elaboração desenvolvidos podem ser úteis na melhoria da compreensão da leitura para o público alvo em questão, as pessoas com baixo nível de letramento / This research addresses the topic of Textual Elaboration for low-literacy readers, i.e. people at the rudimentary and basic literacy levels according to the National Indicator of Functional Literacy (INAF, 2009). Text Elaboration consists of a set of techniques that adds extra material in texts using, traditionally, definitions, synonyms, antonyms, or any external information to assist in text understanding. The main goal of this research was the proposal of two methods of Textual Elaboration: (1) the use of short definitions for Named Entities in texts and (2) assignment of wh-questions related to verbs in text. The first task used the Rembrandt named entity recognition system and short definitions of Wikipedia. It was implemented in PorSimples web Educational Facilita tool. This method was preliminarily evaluated with a small group of low-literacy readers. The evaluation results were positive, what indicates that the tool was useful for improving the text understanding. The assignment of wh-questions related to verbs task was defined, studied, implemented and assessed during this research. Its evaluation was conducted with NLP researches instead of with low-literacy readers. There are good evidences that the text elaboration methods and resources developed here are useful in helping text understanding for low-literacy readers
|
13 |
Elaboração textual via definição de entidades mencionadas e de perguntas relacionadas aos verbos em textos simplificados do português / Text elaboration through named entities definition and questions related to verbs in simplified portuguese textsMarcelo Adriano Amancio 15 June 2011 (has links)
Esta pesquisa aborda o tema da Elaboração Textual para um público alvo que tem letramento nos níveis básicos e rudimentar, de acordo com a classificação do Indicador Nacional de Alfabetismo Funcional (INAF, 2009). A Elaboração Textual é definida como um conjunto de técnicas que acrescentam material redundante em textos, sendo tradicionalmente usadas a adição de definições, sinônimos, antônimos, ou qualquer informação externa com o objetivo de auxiliar na compreensão do texto. O objetivo deste projeto de mestrado foi a proposta de dois métodos originais de elaboração textual: (1) via definição das entidades mencionadas que aparecem em um texto e (2) via definições de perguntas elaboradas direcionadas aos verbos das orações de um texto. Para a primeira tarefa, usou-se um sistema de reconhecimento de entidades mencionadas da literatura, o Rembrandt, e definições curtas da enciclopédia Wikipédia, sendo este método incorporado no sistema Web FACILITA EDUCATIVO, uma das ferramentas desenvolvidas no projeto PorSimples. O método foi avaliado de forma preliminar com um pequeno grupo de leitores com baixo nível de letramento e a avaliação foi positiva, indicando que este auxílio facilitou a leitura dos usuários da avaliação. O método de geração de perguntas elaboradas aos verbos de uma oração é uma tarefa nova que foi definida, estudada, implementada e avaliada neste mestrado. A avaliação não foi realizada junto ao público alvo e sim com especialistas em processamento de língua natural que avaliaram positivamente o método e indicaram quais erros influenciam negativamente na qualidade das perguntas geradas automaticamente. Existem boas indicações de que os métodos de elaboração desenvolvidos podem ser úteis na melhoria da compreensão da leitura para o público alvo em questão, as pessoas com baixo nível de letramento / This research addresses the topic of Textual Elaboration for low-literacy readers, i.e. people at the rudimentary and basic literacy levels according to the National Indicator of Functional Literacy (INAF, 2009). Text Elaboration consists of a set of techniques that adds extra material in texts using, traditionally, definitions, synonyms, antonyms, or any external information to assist in text understanding. The main goal of this research was the proposal of two methods of Textual Elaboration: (1) the use of short definitions for Named Entities in texts and (2) assignment of wh-questions related to verbs in text. The first task used the Rembrandt named entity recognition system and short definitions of Wikipedia. It was implemented in PorSimples web Educational Facilita tool. This method was preliminarily evaluated with a small group of low-literacy readers. The evaluation results were positive, what indicates that the tool was useful for improving the text understanding. The assignment of wh-questions related to verbs task was defined, studied, implemented and assessed during this research. Its evaluation was conducted with NLP researches instead of with low-literacy readers. There are good evidences that the text elaboration methods and resources developed here are useful in helping text understanding for low-literacy readers
|
14 |
La structuration dans les entités nommées / Structuration in named entitiesDupont, Yoann 23 November 2017 (has links)
La reconnaissance des entités nommées et une discipline cruciale du domaine du TAL. Elle sert à l'extraction de relations entre entités nommées, ce qui permet la construction d'une base de connaissance (Surdeanu and Ji, 2014), le résumé automatique (Nobata et al., 2002), etc... Nous nous intéressons ici aux phénomènes de structurations qui les entourent.Nous distinguons ici deux types d'éléments structurels dans une entité nommée. Les premiers sont des sous-chaînes récurrentes, que nous appelerons les affixes caractéristiques d'une entité nommée. Le second type d'éléments est les tokens ayant un fort pouvoir discriminant, appelés des tokens déclencheurs. Nous détaillerons l'algorithme que nous avons mis en place pour extraire les affixes caractéristiques, que nous comparerons à Morfessor (Creutz and Lagus, 2005b). Nous appliquerons ensuite notre méthode pour extraire les tokens déclencheurs, utilisés pour l'extraction d'entités nommées du Français et d'adresses postales.Une autre forme de structuration pour les entités nommées est de nature syntaxique, qui suit généralement une structure d'imbrications ou arborée. Nous proposons un type de cascade d'étiqueteurs linéaires qui n'avait jusqu'à présent jamais été utilisé pour la reconnaissance d'entités nommées, généralisant les approches précédentes qui ne sont capables de reconnaître des entités de profondeur finie ou ne pouvant modéliser certaines particularités des entités nommées structurées.Tout au long de cette thèse, nous comparons deux méthodes par apprentissage automatique, à savoir les CRF et les réseaux de neurones, dont nous présenterons les avantages et inconvénients de chacune des méthodes. / Named entity recognition is a crucial discipline of NLP. It is used to extract relations between named entities, which allows the construction of knowledge bases (Surdeanu and Ji, 2014), automatic summary (Nobata et al., 2002) and so on. Our interest in this thesis revolves around structuration phenomena that surround them.We distinguish here two kinds of structural elements in named entities. The first one are recurrent substrings, that we will call the caracteristic affixes of a named entity. The second type of element is tokens with a good discriminative power, which we call trigger tokens of named entities. We will explain here the algorithm we provided to extract such affixes, which we will compare to Morfessor (Creutz and Lagus, 2005b). We will then apply the same algorithm to extract trigger tokens, which we will use for French named entity recognition and postal address extraction.Another form of structuration for named entities is of a syntactic nature. It follows an overlapping or tree structure. We propose a novel kind of linear tagger cascade which have not been used before for structured named entity recognition, generalising other previous methods that are only able to recognise named entities of a fixed depth or being unable to model certain characteristics of the structure. Ours, however, can do both.Throughout this thesis, we compare two machine learning methods, CRFs and neural networks, for which we will compare respective advantages and drawbacks.
|
15 |
Aprendizado semissupervisionado através de técnicas de acoplamentoDuarte, Maisa Cristina 17 February 2011 (has links)
Made available in DSpace on 2016-06-02T19:05:51Z (GMT). No. of bitstreams: 1
3777.pdf: 3225691 bytes, checksum: 38e3ba8f3c842f4e05d42710339e897a (MD5)
Previous issue date: 2011-02-17 / Machine Learning (ML) can be seen as research area within the Artificial Intelligence (AI) that aims to develop computer programs that can evolve with new experiences. The main ML purpose is the search for methods and techniques that enable the computer system improve its performance autonomously using information learned through its use. This feature can be considered the fundamental mechanisms of the processes of automatic learning. The main goal in this research project was to investigate, propose and implement methods and algorithms to allow the construction of a continuous learning system capable of extracting knowledge from the Web in Portuguese, throughout the creation of a knowledge base which can be constantly updated as new knowledge is extracted. / O Aprendizado de Máquina (AM) pode ser visto como uma área de pesquisa dentro da Inteligência Artificial (IA) que busca o desenvolvimento de programas de computador que possam evoluir à medida que vão sendo expostos a novas experiências. O principal objetivo de AM é a busca por métodos e técnicas que permitem a concepção de sistemas computacionais capazes de melhorar seu desempenho, de maneira autônoma, usando informações obtidas ao longo de seu uso; tal característica pode, de certa forma, ser considerada como um dos mecanismos fundamentais que regem os processos de aprendizado automático. O principal objetivo da pesquisa descrita neste documento foi investigar, propor e implementar métodos e algoritmos que permitissem a construção de um sistema computacional de aprendizado contínuo capaz de realizar a extração de conhecimento a partir da Web em português, por meio da criação de uma base de conhecimento atualizada constantemente à medida que novos conhecimentos vão sendo extraídos.
|
16 |
Pseudonymizace textových datových kolekcí pro strojové učení / De-identification of text data collections for machine learningMareš, Martin January 2021 (has links)
Text data collections enable the deployment of artificial intelligence algorithms for novel tasks. Such collections often contain miscellaneous personal data and other sensitive information that complicates sharing and further processing due to the personal data protection requirements. Searching for personal data is often carried out by sequential passes through the complete text. The objective of this thesis is to create a tool that helps the annotators decrease the risk of data leaks from the text collections. The tool utilizes pseudonymization (replacing a word with a different word, based on a set of rules). During the annotation process, the tool tags the words as "public", "private" and "candidate". The task of the annotator is to determine the role of the candidate words and detect any other untagged private information. The private words then become the subject of the pseudonymization process. The auto-tagging tool utilizes a named entity recognizer and a database of rules. The database is automatically improved based on the decisions of the annotator. Different named entity recognizers were compared for the purpose of personal data search on the collection of the ELITR project. During the comparison, a method was found which increased the sensitivity of the named entities detection which also...
|
17 |
Pojmenované entity a ontologie metodami hlubokého učení / Pojmenované entity a ontologie metodami hlubokého učeníRafaj, Filip January 2021 (has links)
In this master thesis we describe a method for linking named entities in a given text to a knowledge base - Named Entity Linking. Using a deep neural architecture together with BERT contextualized word embeddings we created a semi-supervised model that jointly performs Named Entity Recognition and Named Entity Disambiguation. The model outputs a Wikipedia ID for each entity detected in an input text. To compute contextualized word embeddings we used pre-trained BERT without making any changes to it (no fine-tuning). We experimented with components of our model and various versions of BERT embeddings. Moreover, we tested several different ways of using the contextual embeddings. Our model is evaluated using standard metrics and surpasses scores of models that were establishing the state of the art before the expansion of pre-trained contextualized models. The scores of our model are comparable to current state-of-the-art models.
|
18 |
Extrakce vztahů mezi entitami / Entity Relationship ExtractionŠimečková, Zuzana January 2020 (has links)
Relationship extraction is the task of extracting semantic relationships between en- tities from a text. We create a Czech Relationship Extraction Dataset (CERED) using distant supervision on Wikidata and Czech Wikipedia. We detail the methodology we used and the pitfalls we encountered. Then we use CERED to fine-tune a neural network model for relationship extraction. We base our model on BERT - a linguistic model pre-trained on extensive unlabeled data. We demonstrate that our model performs well on existing English relationship datasets (Semeval 2010 Task 8, TACRED) and report the results we achieved on CERED. 1
|
19 |
Créer un corpus annoté en entités nommées avec Wikipédia et WikiData : de mauvais résultats et du potentielPagès, Lucas 04 1900 (has links)
Ce mémoire explore l'utilisation conjointe de WikiData et de Wikipédia pour créer une ressource d'entités nommées (NER) annotée : DataNER. Il fait suite aux travaux ayant utilisé les bases de connaissance Freebase et DBpedia et tente de les remplacer avec WikiData, une base de connaissances collaborative dont la croissance continue est garantie par une communauté active. Malheureusement, les résultats du processus proposé dans ce mémoire ne sont pas à la hauteur des attentes initiales.
Ce document décrit dans un premier temps la façon dont on construit DataNER. L'utilisation des ancres de Wikipédia permet d'identifier un grand nombre d'entités nommées dans la ressource et le programme NECKAr permet de les classifier parmi les classes LOC, PER, ORG et MISC en utilisant WikiData. On décrit de ce fait les détails de ce processus, dont la façon dont on utilise les données de Wikipédia et WikiData afin de produire de nouvelles entités nommées et comment calibrer les paramètres du processus de création de DataNER.
Dans un second temps, on compare DataNER à d'autres ressources similaires en utilisant des modèles de NER ainsi qu'avec des comparaisons manuelles. Ces comparaisons nous permettent de mettre en valeur différentes raisons pour lesquelles les données de DataNER ne sont pas d'aussi bonne qualité que celles de ces autres ressources.
On conclut de ce fait sur des pistes d'améliorations de DataNER ainsi que sur un commentaire sur le travail effectué, tout en insistant sur le potentiel de cette méthode de création de corpus. / This master's thesis explores the joint use of WikiData and Wikipedia to make an annotated named entities (NER) corpus : DataNER. It follows papers which have used the knowledge bases DBpedia and Freebase and attempts at replacing them with WikiData, a collaborative knowledge base with an active community guaranteeing its continuous growth. Unfortunately, the results of the process described in this thesis did not reach our initial expectations.
This document first describes the way in which we build DataNER. The use of Wikipedia anchors enable us to identify a significant quantity of named entities in the resource and the NECKAr toolkit labels them with classes LOC, PER, ORG and MISC using WikiData. Thus, we describe the details of the corpus making process, including the way in which we infer more named entities thanks to Wikipedia and WikiData, as well as how we calibrate the making of DataNER with all the information at our availability.
Secondly, we compare DataNER with other similar corpora using models trained on each of them, as well as manual comparisons. Those comparisons enable us to identify different reasons why the quality of DataNER does not match the one of those other corpora.
We conclude by giving ideas as to how to enhance the quality of DataNER, giving a more personal comment of the work that has been accomplished and insisting on the potential of using Wikipedia and WikiData to automatically create a corpus.
|
20 |
Zpracování turkických jazyků / Processing of Turkic LanguagesCiddi, Sibel January 2014 (has links)
Title: Processing of Turkic Languages Author: Sibel Ciddi Department: Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague Supervisor: RNDr. Daniel Zeman, Ph.D. Abstract: This thesis presents several methods for the morpholog- ical processing of Turkic languages, such as Turkish, which pose a specific set of challenges for natural language processing. In order to alleviate the problems with lack of large language resources, it makes the data sets used for morphological processing and expansion of lex- icons publicly available for further use by researchers. Data sparsity, caused by highly productive and agglutinative morphology in Turkish, imposes difficulties in processing of Turkish text, especially for meth- ods using purely statistical natural language processing. Therefore, we evaluated a publicly available rule-based morphological analyzer, TRmorph, based on finite state methods and technologies. In order to enhance the efficiency of this analyzer, we worked on expansion of lexicons, by employing heuristics-based methods for the extraction of named entities and multi-word expressions. Furthermore, as a prepro- cessing step, we introduced a dictionary-based recognition method for tokenization of multi-word expressions. This method complements...
|
Page generated in 0.0759 seconds