  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Identificação da cobertura espacial de documentos usando mineração de textos / Identification of the spatial coverage of documents using text mining

Rosa Nathalie Portugal Vargas 08 August 2012 (has links)
Users increasingly take the geographic scope of a document into account during information retrieval. However, conventional information retrieval systems based on keyword matching do not consider that words can represent geographical entities that are spatially related to other entities in the documents. Solving this problem requires geo-referencing the texts: identifying the geographical entities present and associating them with their correct spatial locations. Identifying and disambiguating geographical entities poses major challenges, mainly from the linguistic point of view, since a single toponym can carry several types of ambiguity. This ambiguity introduces noise into the information retrieval process, because the same term may have relevant or irrelevant information associated with it. The main strategy for overcoming these problems is therefore to identify evidence that assists in recognizing and disambiguating the locations mentioned in the texts. This work proposes SpatialCIM, a methodology that identifies and determines the spatial coverage of documents by organizing the toponym resolution process. The main objective of this study is to evaluate and select disambiguation techniques that resolve toponym ambiguity in texts. To this end, two approaches were proposed and developed: (1) point-based disambiguation and (2) textual and structural disambiguation. These approaches exploit two different toponym disambiguation techniques, which generate and disambiguate the geographic paths associated with the toponyms recognized in each document. The hypothesis of this research is that toponym disambiguation techniques enable better spatial localization of documents. The results demonstrate that the disambiguation techniques improve precision and recall in the spatial classification of documents, and that using a linguistic tool has a positive impact on the recognition of geographical entities. The usefulness of the disambiguation process for obtaining the spatial coverage of documents was thus demonstrated.
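The path-consistency idea behind the structural disambiguation approach can be sketched with a toy gazetteer. The place names, administrative paths, and scoring rule below are invented for illustration; they are not the SpatialCIM implementation:

```python
# Toy sketch of gazetteer-based toponym disambiguation: each candidate for an
# ambiguous name carries its administrative path (country > state > city), and
# we pick the candidate whose path shares the most ancestors with the other
# toponyms found in the same document.

GAZETTEER = {
    "Santos": [("Brazil", "Sao Paulo", "Santos"),
               ("Peru", "Lima", "Santos")],
    "Campinas": [("Brazil", "Sao Paulo", "Campinas")],
}

def disambiguate(name, context_names):
    """Choose the gazetteer path sharing the most ancestors with context toponyms."""
    context_paths = [p for n in context_names if n != name
                     for p in GAZETTEER.get(n, [])]
    def score(path):
        return max((len(set(path[:-1]) & set(c[:-1])) for c in context_paths),
                   default=0)
    return max(GAZETTEER[name], key=score)

# "Santos" mentioned near "Campinas" resolves to the Brazilian city.
resolved = disambiguate("Santos", ["Santos", "Campinas"])
```

This mirrors, in miniature, the intuition that nearby toponyms in a document tend to share a geographic path.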
42

Extracting social networks from fiction: Imaginary and invisible friends: Investigating the social world of imaginary friends.

Ek, Adam January 2017 (has links)
This thesis develops an approach to extracting the social relations between characters in literary text in order to create a social network. The approach uses co-occurrences of named entities, keywords associated with the named entities, and the dependency relations that exist between the named entities to construct the network. Literary texts contain a large number of pronouns referring to the named entities; to resolve the antecedents of these pronouns, a pronoun resolution system is implemented based on a standard pronoun resolution algorithm. The results indicate that the pronoun resolution system finds the correct named entity in 60.4% of all cases. The social network is evaluated by comparing character importance rankings based on graph properties with an independent, human-generated importance ranking. The generated social networks correlate moderately to strongly with the independent character ranking.
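The co-occurrence step of such an approach can be sketched in a few lines. The sentences and character names below are made up; a real system would first run named entity recognition and pronoun resolution:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(sentences, characters):
    """Count pairwise co-occurrences of character names within each sentence."""
    edges = Counter()
    for sent in sentences:
        present = sorted({c for c in characters if c in sent})
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges

def rank_by_degree(edges):
    """Rank characters by weighted degree (sum of incident edge counts)."""
    degree = Counter()
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    return [name for name, _ in degree.most_common()]

sentences = [
    "Anne met Diana at the school.",
    "Anne and Gilbert argued; Diana watched.",
    "Gilbert apologized to Anne.",
]
edges = cooccurrence_network(sentences, ["Anne", "Diana", "Gilbert"])
ranking = rank_by_degree(edges)
```

Weighted degree here stands in for the graph properties used in the thesis's importance ranking.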
43

CUILESS2016: a clinical corpus applying compositional normalization of text mentions

Osborne, John D., Neu, Matthew B., Danila, Maria I., Solorio, Thamar, Bethard, Steven J. 10 January 2018 (has links)
Background: Traditionally, text-mention normalization corpora have normalized concepts to single ontology identifiers ("pre-coordinated concepts"). Less frequently, normalization corpora have used concepts with multiple identifiers ("post-coordinated concepts"), but the additional identifiers have been restricted to a defined set of relationships to the core concept. This approach limits the ability of the normalization process to express semantic meaning. We generated a freely available corpus using post-coordinated concepts without a defined set of relationships, which we term "compositional concepts", to evaluate their use in clinical text. Methods: We annotated to SNOMED CT 5397 disorder mentions from the ShARe corpus that were previously normalized as "CUI-less" in the "SemEval-2015 Task 14" shared task because they lacked a pre-coordinated mapping. Unlike the previous normalization method, we do not restrict concept mappings to a particular set of Unified Medical Language System (UMLS) semantic types, and we allow normalization to multiple UMLS Concept Unique Identifiers (CUIs). We computed annotator agreement and assessed semantic coverage with this method. Results: We generated the largest clinical text normalization corpus to date with mappings to multiple identifiers and made it freely available. All but 8 of the 5397 disorder mentions were normalized with this methodology. Annotator agreement ranged from 52.4% using the strictest metric (exact matching) to 78.2% using a hierarchical metric that measures the overlap of shared ancestral nodes. Conclusion: Our results provide evidence that compositional concepts can increase semantic coverage in clinical text. To our knowledge, we provide the first freely available corpus of compositional concept annotation in clinical text.
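The hierarchical agreement metric described in the results can be illustrated with a toy is-a hierarchy. The concepts and parent links below are invented; the corpus itself walks SNOMED CT:

```python
# Hierarchical agreement sketch: instead of exact matching, two annotations
# agree to the extent that their ancestor sets overlap in the ontology.

PARENT = {
    "bacterial pneumonia": "pneumonia",
    "viral pneumonia": "pneumonia",
    "pneumonia": "lung disease",
    "lung disease": "disease",
}

def ancestors(concept):
    """Return the concept plus all its ancestors up to the root."""
    out = {concept}
    while concept in PARENT:
        concept = PARENT[concept]
        out.add(concept)
    return out

def hierarchical_agreement(a, b):
    """Jaccard overlap of ancestor sets; 1.0 means identical concepts."""
    sa, sb = ancestors(a), ancestors(b)
    return len(sa & sb) / len(sa | sb)

exact = hierarchical_agreement("bacterial pneumonia", "bacterial pneumonia")
near = hierarchical_agreement("bacterial pneumonia", "viral pneumonia")
```

Under such a metric, annotators who choose sibling concepts still receive substantial partial credit, which is why the hierarchical agreement figure (78.2%) exceeds exact-match agreement (52.4%).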
44

Acquisition de relations entre entités nommées à partir de corpus / Corpus-based recognition of relations between named entities

Ezzat, Mani 06 May 2014 (has links)
Les entités nommées ont été l’objet de nombreuses études durant les années 1990. Leur reconnaissance dans les textes a atteint un niveau de maturité suffisante, du moins pour les principaux types (personne, organisation et lieu), pour aller plus loin dans l’analyse, vers la reconnaissance de relations entre entités. Il est par exemple intéressant de savoir qu’un texte contient des occurrences des mots « Google » et « Youtube » ; mais l’analyse devient plus intéressante si le système est capable de détecter une relation entre ces deux éléments, voire de les typer comme étant une relation d’achat (Google ayant racheté Youtube en 2006). Notre contribution s’articule autour de deux grands axes : tracer un contour plus précis autour de la définition de la relation entre entités nommées, notamment au regard de la linguistique, et explorer des techniques pour l’élaboration de systèmes d’extraction automatique qui sollicitent des linguistes. / Named entities have been the topic of many researches during the 90’s. Their detection in texts has reached a high level of performance, at least for the main categories (person, organization and location). It becomes now possible to go further, toward relation between entities recognition. For instance, knowing that a text contains the words “Google” and “Youtube” can be relevant but being able to link them and detect an acquisition relation can be more interesting (Google has bought Youtube in 2006). Our work is focusing on two different aspects: to define a finer perimeter around the relation between named entities definition, with linguistic aspect in mind, and to explore new techniques that make use of linguists in order to build a relation between named entities recognition system.
45

Named Data Networking in Local Area Networks

Shi, Junxiao January 2017 (has links)
The Named Data Networking (NDN) is a new Internet architecture that changes the network semantics from packet delivery to content retrieval and promises benefits in areas such as content distribution, security, mobility support, and application development. While the basic NDN architecture applies to any network environment, local area networks (LANs) are of particular interest because of their prevalence on the Internet and the relatively low barrier to deployment. In this dissertation, I design NDN protocols and implement NDN software, to make NDN communication in LANs robust and efficient. My contributions include: (a) a forwarding behavior specification required on every NDN node; (b) a secure and efficient self-learning strategy for switched Ethernet, which discovers available contents via occasional flooding, so that the network can operate without manual configuration, and does not require a routing protocol or a centralized controller; (c) NDN-NIC, a network interface card that performs name-based packet filtering, to reduce CPU overhead and power consumption of the main system during broadcast communication on shared media; (d) the NDN Link Protocol (NDNLP), which allows the forwarding plane to add hop-by-hop headers, and provides a fragmentation-reassembly feature so that large NDN packets can be sent directly over Ethernet with limited MTU.
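The name-based filtering that NDN-NIC performs rests on component-wise prefix matching of hierarchical NDN names. A minimal sketch follows; the names and prefixes are illustrative:

```python
# NDN names are hierarchical, like '/edu/arizona/cs/index', and match by
# component-wise prefix, not by raw string prefix.

def components(name):
    """Split an NDN name into its components."""
    return [c for c in name.split("/") if c]

def matches(prefix, name):
    """True if `prefix` is a component-wise prefix of `name`."""
    p, n = components(prefix), components(name)
    return n[:len(p)] == p

class NameFilter:
    """Accept a packet only if its name falls under a registered prefix."""
    def __init__(self, prefixes):
        self.prefixes = list(prefixes)

    def accept(self, name):
        return any(matches(p, name) for p in self.prefixes)

nic = NameFilter(["/edu/arizona", "/ndn/broadcast"])
```

Note that `/edu/ariz` does not match `/edu/arizona`: components must match whole, which is what distinguishes name matching from string matching.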
46

Information extraction from pharmaceutical literature

Batista-Navarro, Riza Theresa Bautista January 2014 (has links)
With the constantly growing amount of biomedical literature, methods for automatically distilling information from unstructured data, collectively known as information extraction, have become indispensable. Whilst most biomedical information extraction efforts in the last decade have focussed on the identification of gene products and interactions between them, the biomedical text mining community has recently extended their scope to capture associations between biomedical and chemical entities with the aim of supporting applications in drug discovery. This thesis is the first comprehensive study focussing on information extraction from pharmaceutical chemistry literature. In this research, we describe our work on (1) recognising names of chemical compounds and drugs, facilitated by the incorporation of domain knowledge; (2) exploring different coreference resolution paradigms in order to recognise co-referring expressions given a full-text article; and (3) defining drug-target interactions as events and distilling them from pharmaceutical chemistry literature using event extraction methods.
47

Extrakce informací z textu / Information extraction from text

Michalko, Boris January 2008 (has links)
The goal of this thesis is to survey the available information extraction systems and the possibilities of using them in the MedIEQ project. The theoretical part contains an introduction to the field of information extraction. I describe its purpose, needs, and applications, and its relation to other natural language processing tasks. I review its history, recent developments, performance measurement, and criticisms of that measurement. I also describe the general architecture of an IE system and the basic tasks it should solve, with an emphasis on entity extraction. The practical part contains an overview of the algorithms used in information extraction systems. I describe both types of algorithms, rule-based and statistical. The next chapter lists and briefly describes existing free systems. Finally, I run my own experiment with two systems, LingPipe and GATE, on selected corpora, measuring various performance statistics. I also created a small dictionary and a regular expression for e-mail addresses to demonstrate rules for extracting certain specific kinds of information.
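The dictionary-plus-regular-expression rule mentioned at the end can be sketched as follows. The dictionary entries and the simplified e-mail pattern are illustrative, not the ones used in the experiment:

```python
import re

# Minimal rule-based extraction: a small dictionary for one entity type plus
# a regular expression for e-mail addresses.

TERM_DICT = {"aspirin", "ibuprofen"}  # invented dictionary entries
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def extract(text):
    """Return dictionary hits and e-mail addresses found in the text."""
    emails = EMAIL_RE.findall(text)
    terms = [w for w in re.findall(r"[A-Za-z]+", text) if w.lower() in TERM_DICT]
    return {"emails": emails, "terms": terms}

result = extract("Contact jan.novak@example.org about the aspirin trial.")
```

Rule-based extractors like this are precise for well-formed patterns but brittle; that trade-off against statistical methods is exactly what the comparison of LingPipe and GATE explores.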
48

PDRM : a proactive data replication mechanism to improve content mobility support in NDN using location awareness

Lehmann, Matheus Brenner January 2017 (has links)
The problem of handling user mobility has been around since mobile devices became capable of handling multimedia content, and it remains one of the most relevant challenges in networking. The conventional Internet architecture is inadequate for dealing with an ever-growing number of mobile devices that both consume and produce content. Named Data Networking (NDN) is a network architecture that can potentially overcome this mobility challenge. It supports consumer mobility by design but does not offer the same level of support for content mobility. Content mobility requires guaranteeing that consumers can find and retrieve desired content even when the corresponding producer (or primary host) is unavailable. In this thesis, we propose PDRM, a Proactive and locality-aware Data Replication Mechanism that increases content availability through data redundancy in the context of the NDN architecture. It exploits resources available from end users in the vicinity to improve content availability even in the case of producer mobility. Throughout the thesis, we discuss the design of PDRM, evaluate how the number of available providers in the vicinity and the in-network cache capacity affect its operation, and compare its performance to Vanilla NDN and two state-of-the-art proposals. The evaluation indicates that PDRM improves content mobility support because it uses object popularity information and spare resources in the vicinity to drive proactive replication. Results show that PDRM can reduce download times by up to 53.55%, producer load by up to 71.6%, inter-domain traffic by up to 46.5%, and generated overhead by up to 25% compared with Vanilla NDN and the other evaluated mechanisms.
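The popularity-driven replica selection at the heart of such a mechanism can be sketched simply. The request trace and capacity value below are invented, and PDRM itself also weighs locality, which this sketch omits:

```python
from collections import Counter

# Given observed requests and limited spare capacity at nearby nodes,
# proactively replicate the most popular objects first.

def plan_replication(requests, capacity):
    """Return the `capacity` most requested object names, most popular first."""
    popularity = Counter(requests)
    return [name for name, _ in popularity.most_common(capacity)]

requests = ["video/a", "video/b", "video/a", "photo/c", "video/a", "video/b"]
plan = plan_replication(requests, capacity=2)
```

Replicating by popularity front-loads the objects most likely to be requested while the producer is unreachable, which is what drives the reduction in download time and producer load reported above.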
49

Rozpoznávání pojmenovaných entit v biomedicínské doméně / Named entity recognition in the biomedical domain

Williams, Shadasha January 2021 (has links)
Named entity recognition (NER) is the information extraction task that attempts to recognize and extract particular entities in a text. One of the issues with NER is that its models are domain specific; the goal of this thesis is to focus on entities strictly from the biomedical domain. Another issue with NER comes from synonymous terms that may be linked to one entity; moreover, they lead to the problem of entity disambiguation. Given the popularity of neural networks and their success in NLP tasks, the work should use a neural network architecture for the task of named entity disambiguation, as described in the paper by Eshel et al. [1]. One of the subtasks of the thesis is to map words and entities into a vector space using word embeddings, which attempt to capture textual context similarity and coherence [2]. The main output of the thesis will be a model that attempts to disambiguate entities in the biomedical domain, using scientific journals (PubMed and Embase) as the documents of interest.
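The embedding-based disambiguation step can be illustrated with hand-made vectors. The entities and 3-dimensional vectors below are invented; real systems learn high-dimensional embeddings from corpora:

```python
import math

# Score each candidate entity by cosine similarity between its embedding and
# the mention's context vector, then pick the closest candidate.

ENTITY_VECS = {
    "cold (disease)": [0.9, 0.1, 0.0],
    "cold (temperature)": [0.0, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def link_entity(context_vec, candidates):
    """Pick the candidate entity whose vector is closest to the context."""
    return max(candidates, key=lambda c: cosine(ENTITY_VECS[c], context_vec))

# A context vector leaning toward the medical sense of "cold".
best = link_entity([0.8, 0.2, 0.1], list(ENTITY_VECS))
```

The neural architecture cited above replaces the hand-made vectors with learned context encoders, but the final selection step is the same nearest-candidate comparison.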
50

Künstliche neuronale Netze zur Verarbeitung natürlicher Sprache / Artificial neural networks for natural language processing

Dittrich, Felix 21 April 2021 (has links)
Natural language processing by computer-based systems has long been an area of active development and research, with the aim of solving tasks in the most widely used languages. This thesis describes various approaches to solving problems in this field using artificial neural networks, focusing mainly on more modern architectures such as the Transformer and BERT. The goal is to understand these architectures better and to find out what advantages they offer over conventional artificial neural networks. The acquired knowledge is then tested on a natural language processing task in which named entity recognition (NER) is used to extract specific information from texts.
Contents: 1 Introduction (natural language processing; neural networks and their biological background; structure of the thesis). 2 Fundamentals (artificial neural networks: types of learning, activation functions, loss functions, optimizers, over- and underfitting, exploding and vanishing gradients, optimization methods). 3 Network architectures for natural language processing (recurrent neural networks and long short-term memory (LSTM); autoencoders; the Transformer: word embeddings, positional encoding, encoder block, decoder block, limitations of the Transformer architecture; Bidirectional Encoder Representations from Transformers (BERT): pre-training and fine-tuning). 4 Practical part and results (task; libraries, programming languages, and software used: Python, NumPy, pandas, scikit-learn, TensorFlow, Keras, ktrain, Data Version Control (dvc), FastAPI, Docker, Amazon Web Services; data; network architecture; training; evaluation; implementation). 5 Concluding remarks (summary and outlook).
