  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Identificação da cobertura espacial de documentos usando mineração de textos / Identification of spatial coverage documents with mining

Vargas, Rosa Nathalie Portugal 08 August 2012
Users now commonly take the geographical location of documents, that is, the geographic scope addressed in a document's content, into account during Information Retrieval. However, conventional information retrieval systems based on keyword matching do not consider that words can represent geographical entities that are spatially related to other entities in the documents. Solving this problem requires geo-referencing the texts: identifying the geographical entities present and associating them with their correct spatial location. Identifying and disambiguating geographical entities poses major challenges, mainly from a linguistic point of view, since a single toponym can carry several types of ambiguity. This ambiguity introduces noise into information retrieval, because the same term may be associated with relevant or irrelevant information. The main strategy for overcoming it is therefore to identify evidence that helps recognize and disambiguate the locations mentioned in the texts. This study proposes a methodology, called SpatialCIM, for identifying and determining the spatial coverage of documents. SpatialCIM aims to organize the toponym resolution process. The main objective of the study is thus to evaluate and select disambiguation techniques that resolve toponym ambiguity in texts. To that end, two approaches were proposed and developed: (1) Disambiguation for Points and (2) Textual and Structural Disambiguation. They exploit two different toponym disambiguation techniques, which generate and disambiguate the geographical paths associated with the toponyms recognized in each document. The hypothesis is that toponym disambiguation techniques enable a better spatial localization of documents. The results demonstrate that the disambiguation techniques improve precision and recall in the spatial classification of documents. They also demonstrate the positive effect of using a linguistic tool in the recognition of geographical entities. The disambiguation process is thus shown to be useful for obtaining the spatial coverage of documents.
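As a rough illustration of what toponym disambiguation involves (not the SpatialCIM method itself, whose details the abstract does not give), the sketch below picks, for each ambiguous place name, the candidate location that lies closest to the other places mentioned in the same document. The gazetteer entries, the approximate coordinates, and the function names are hypothetical.

```python
import math

# Hypothetical mini-gazetteer: toponym -> candidate (label, lat, lon) entries.
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "Santa Cruz": [("Santa Cruz, Bolivia", -17.78, -63.18),
                   ("Santa Cruz, California", 36.97, -122.03)],
    "La Paz": [("La Paz, Bolivia", -16.50, -68.15)],
    "Cochabamba": [("Cochabamba, Bolivia", -17.39, -66.16)],
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def disambiguate(toponyms):
    """For each toponym, keep the candidate with the smallest total distance
    to the nearest candidate of every other toponym in the document."""
    resolved = {}
    for name in toponyms:
        def cost(cand):
            return sum(
                min(haversine_km(cand[1], cand[2], o[1], o[2])
                    for o in GAZETTEER[other])
                for other in toponyms if other != name
            )
        resolved[name] = min(GAZETTEER[name], key=cost)[0]
    return resolved

result = disambiguate(["Santa Cruz", "La Paz", "Cochabamba"])
```

Here the co-occurring Bolivian cities pull "Santa Cruz" toward its Bolivian reading; a real system would combine such spatial evidence with textual cues.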
32

Topic Segmentation and Medical Named Entities Recognition for Pictorially Visualizing Health Record Summary System

Ruan, Wei 03 April 2019
Medical Information Visualization makes optimized use of digitized medical record data, e.g. Electronic Medical Records. This thesis extends the Pictorial Information Visualization System (PIVS) developed by Yongji Jin (Jin, 2016) and Jiaren Suo (Suo, 2017), a graphical visualization system that depicts a patient's medical history summary as pictures so that patients and doctors can easily grasp the patient's past and present conditions. The summary information has been entered manually into the interface, where it can be taken from clinical notes. This study proposes a methodology for automatically extracting medical information from patients' clinical notes using Natural Language Processing techniques, in order to produce medical history summaries from past medical records. We develop a Named Entity Recognition system to extract information about medical imaging procedures (performance date, body location, imaging results and so on) and medications (medication names, frequency and quantities) by applying a conditional random field model with three main feature types, among others: word-based, part-of-speech, and MetaMap semantic features. Adding MetaMap semantic features is a novel idea that raised accuracy compared to previous studies. Our evaluation, with medication extraction as a case study, shows that our model has higher accuracy than others. To further improve the accuracy of entity extraction, we also propose a Topic Segmentation methodology for clinical notes that detects boundaries from the difference between the classification probabilities of subsequent sequences, which differs from traditional Topic Segmentation approaches such as TextTiling, TopicTiling and the Beeferman Statistical Model. Combining Topic Segmentation with Named Entity extraction yielded higher accuracy for medication extraction than extraction without segmentation.
Finally, we present a prototype that integrates our information extraction system with PIVS by simply building a database of interface coordinates and terms for human body parts.
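The boundary-detection idea can be sketched as follows, assuming a classifier has already produced a topic-probability vector for each sentence: a boundary is placed wherever consecutive vectors differ by more than a threshold. The probabilities, the choice of L1 distance, and the threshold value are illustrative assumptions and are not taken from the thesis.

```python
def find_boundaries(probs, threshold=0.5):
    """Return indices i such that a topic boundary falls just before sentence i."""
    boundaries = []
    for i in range(1, len(probs)):
        # L1 distance between the probability vectors of adjacent sentences.
        diff = sum(abs(a - b) for a, b in zip(probs[i - 1], probs[i]))
        if diff > threshold:
            boundaries.append(i)
    return boundaries

# Hypothetical per-sentence probabilities over two topics (medication, imaging).
sentence_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.8],
    [0.1, 0.9],
]
boundaries = find_boundaries(sentence_probs)
```

With these made-up values the only large jump is between the second and third sentences, so a single boundary is reported before index 2.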
33

Authorship Attribution Through Words Surrounding Named Entities

Jacovino, Julia Maureen 03 April 2014
In text analysis, authorship attribution takes a variety of forms. Computational linguistics becomes more important as the need for authorship attribution and text analysis becomes more widespread. In this research, existing authorship attribution software, the Java Graphical Authorship Attribution Program (JGAAP), incorporates a named entity recognizer, specifically the Stanford Named Entity Recognizer, to probe texts of similar genre and to help identify the correct author. The research specifically examines the words authors use around named entities in order to test how well these words attribute authorship. / McAnulty College and Graduate School of Liberal Arts; / Computational Mathematics / MS; / Thesis;
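To make "words surrounding named entities" concrete, the sketch below collects the tokens within a fixed window on either side of each entity. It is only an illustration: in the thesis the entity spans would come from a recognizer, whereas here they are hard-coded, and the function name, window size, and sample sentence are hypothetical.

```python
def context_words(tokens, entity_spans, window=2):
    """Collect tokens within `window` positions of each entity span.

    Spans are (start, end) token indices with `end` exclusive; the entity
    tokens themselves are left out.
    """
    context = []
    for start, end in entity_spans:
        context.extend(tokens[max(0, start - window):start])  # words before
        context.extend(tokens[end:end + window])               # words after
    return context

tokens = "Yesterday evening Mr. Smith walked slowly to London Bridge alone".split()
# Hand-labelled spans standing in for recognizer output: "Smith", "London Bridge".
spans = [(3, 4), (7, 9)]
features = context_words(tokens, spans)
```

The resulting word list could then be counted per author and fed to any attribution classifier.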
34

On Travel Article Classification Based on Consumer Information Search Process Model

Hsiao, Yung-Lin 27 July 2011
The information overload problem has become pressing with the explosion of information, and people need agents that help them filter information to meet their personal needs. In this work, we study article classification in the tourism domain so as to identify articles that meet users' information needs. We propose an information need orientation model for tourism, which consists of four goals: Initiation, Attraction, Accommodation, and Route planning. These goals can be characterized by 13 features. Some of the identified features can be enhanced with WordNet and Named Entity Recognition as supplementary techniques. To test the effectiveness of the 13 features and the related methods for classification, we collected 15,797 articles from TripAdvisor.com, the world's largest travel site, and randomly selected 600 articles, labeled by two labelers, as training data. The experimental results show that our approach generally performs comparably to or better than classification with purely lexical features, namely TF-IDF, while using fewer features.
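For reference, the purely lexical TF-IDF baseline weights a term by its frequency within an article, discounted by how many articles contain it. The sketch below uses one common variant (relative term frequency times log(N/df)) on made-up travel snippets; the thesis may use a different weighting, and the function name is an assumption.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights

docs = [
    "hotel near beach".split(),
    "hotel with pool".split(),
    "train route to beach".split(),
]
w = tf_idf(docs)
```

Terms that appear in only one snippet (such as "pool") receive a higher weight than terms shared across snippets (such as "hotel").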
35

Feature identification framework and applications (FIFA)

Audenaert, Michael Neal 12 April 2006
Large digital libraries typically contain large collections of heterogeneous resources intended to be delivered to a variety of user communities. One key challenge for these libraries is providing tight integration between resources, both within a single collection and across the several collections of the library, without requiring hand coding. One key tool for doing this is elucidating the internal structure of the digital resources and using that structure to form connections between the resources. The heterogeneous nature of the collections and the diversity of needs in the user communities complicate this task. Accordingly, in this thesis, I describe an approach to implementing a feature identification system to support digital collections that provides a general framework for applications while allowing decisions about the details of document representation and feature identification to be deferred to domain-specific implementations of that framework. These deferred decisions include details of the semantics and syntax of markup, the types of metadata to be attached to documents, the types of features to be identified, the feature identification algorithms to be applied, and which features should be indexed. This approach results in strong support for the general aspects of developing a feature identification system, allowing future work to focus on the details of applying that system to the specific needs of individual collections and user communities.
36

Adaptive Forwarding in Named Data Networking

Yi, Cheng January 2014
Named Data Networking (NDN) is a recently proposed new Internet architecture. By naming data instead of locations, it changes the very basic network service abstraction from "delivering packets to given destinations" to "retrieving data of given names." This fundamental change creates an abundance of new opportunities as well as many intellectual challenges in application development, network routing and forwarding, and communication security and privacy. The focus of this dissertation is a unique feature introduced by NDN: its adaptive forwarding plane. Communication in NDN is done by exchanges of Interest and Data packets. Consumers send Interest packets to request desired Data, routers forward them based on data names, and producers answer with Data packets, which take the same path as the Interests but in the reverse direction. During this process, routers maintain state information about pending Interests. This state information, coupled with the symmetric exchange of Interest and Data, enables NDN routers to detect loops, observe data retrieval performance, and explore multiple forwarding paths, all at the forwarding plane. Since NDN is still in its early stage, however, none of these powerful features has been systematically designed, evaluated, or explored. In this dissertation, we present a concrete design of NDN's forwarding plane to make the network resilient and efficient. First, we design the basic adaptation mechanism and evaluate its effectiveness in circumventing prefix hijack attacks. Second, we propose a novel NACK mechanism for fast failure detection and evaluate its benefits in handling network failures. We also show that a resilient forwarding plane makes routing more stable and more scalable. Third, we design a congestion control mechanism, Dynamic Interest Limiting, to adapt traffic rate in a hop-by-hop and multipath fashion, which is effective even with a large number of flows in a large network topology.
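The Interest/Data exchange and the per-router state it relies on can be illustrated with a toy router that keeps a table of pending Interests. This is a simplified sketch, not the dissertation's design: the class name `ToyRouter`, the face identifiers, and the nonce-based loop check are assumptions made for illustration.

```python
class ToyRouter:
    """Minimal model of a router's table of pending Interests."""

    def __init__(self):
        # name -> {"faces": incoming faces, "nonces": nonces already seen}
        self.pit = {}

    def on_interest(self, name, nonce, in_face):
        """Return 'forward', 'aggregate', or 'loop' for an incoming Interest."""
        entry = self.pit.get(name)
        if entry is None:
            self.pit[name] = {"faces": {in_face}, "nonces": {nonce}}
            return "forward"      # first request for this name: send upstream
        if nonce in entry["nonces"]:
            return "loop"         # the same Interest came back: drop it
        entry["faces"].add(in_face)
        entry["nonces"].add(nonce)
        return "aggregate"        # already pending: only remember the face

    def on_data(self, name):
        """Return the faces the Data goes back to and clear the pending entry."""
        entry = self.pit.pop(name, None)
        return sorted(entry["faces"]) if entry else []

router = ToyRouter()
first = router.on_interest("/video/seg1", nonce=1, in_face="faceA")
second = router.on_interest("/video/seg1", nonce=2, in_face="faceB")
looped = router.on_interest("/video/seg1", nonce=1, in_face="faceC")
downstream = router.on_data("/video/seg1")
```

Because Data retraces the recorded faces, the router needs no destination address; unsolicited Data (no pending entry) is simply not forwarded.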
37

BioEve: User Interface Framework Bridging IE and IR

January 2010
Continuous advancements in biomedical research have resulted in the production of vast amounts of scientific data and of literature discussing them. The ultimate goal of computational biology is to translate these large amounts of data into actual knowledge of complex biological processes and into accurate life science models. The ability to rapidly and effectively survey the literature is necessary for creating large-scale models of the relationships among biomedical entities, as well as for generating hypotheses to guide biomedical research. To reduce the effort and time spent on these activities, an intelligent search system is required. Even though many systems aid in navigating this wide collection of documents, the vastness and depth of the information can be overwhelming. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also facilitate discovery of unknown information implicitly conveyed in the texts. This thesis presents the different approaches used for large-scale biomedical named entity recognition and the challenges faced in each. It also proposes BioEve: an integrative framework that fuses faceted search with information extraction to provide a search service that addresses the user's desire for "completeness" of the query results, not just the top-ranked ones. This information extraction system enables discovery of important semantic relationships between entities such as genes, diseases, drugs, and cell lines, and of events, from biomedical text on MEDLINE, which is the largest publicly available database of the world's biomedical journal literature. It is an innovative search and discovery service that makes it easier to search, navigate, and discover knowledge hidden in life sciences literature.
To demonstrate the utility of this system, this thesis also details a prototype enterprise-quality search and discovery service that helps researchers with guided, step-by-step query refinement by suggesting concepts enriched in intermediate results, thereby facilitating the "discover more as you search" paradigm. / Dissertation/Thesis / M.S. Computer Science 2010
38

Reconnaissance des entités nommées par exploration de règles d'annotation : interpréter les marqueurs d'annotation comme instructions de structuration locale. / Named entity recognition by mining association rules

Nouvel, Damien 20 November 2012
The development of information and communication technologies has profoundly changed the way we access knowledge. Faced with the influx of data and its diversity, robust, high-performance technologies are needed to search it for information. Our work addresses the recognition of named entities and their annotation within transcripts of radio and television broadcasts. In the first part, we address the problem of automatic named entity recognition. After characterizing their linguistic nature, we propose an instruction-based approach, built on annotation markers (tags), that considers these elements in isolation (the beginning or the end of an annotation). In the second part, we review work in data mining and present a formal framework for exploring the data. Within it we propose an alternative, segment-based formulation that limits the combinatorial cost of exploration. Patterns correlated with one or more annotation markers are extracted as annotation rules. The last part describes the experimental setting, some specifics of the implementation of the system (mXS), and the results obtained. We show the value of extracting annotation rules broadly and experiment with segment patterns. We provide quantitative results on the system's performance from various points of view and in various configurations. They show that the approach we propose is competitive and that it opens up perspectives for the observation of natural languages and for automatic annotation. / In recent decades, the development of information and communication technologies has deeply modified the way we access knowledge. Facing the volume and the diversity of data, it is necessary to work out robust and efficient technologies to retrieve information.
The present work considers recognition and annotation of named entities within radio and TV broadcast transcripts. For this purpose, we interpret the annotation task as a local structuration. We can therefore leverage data to empirically extract rules that govern the presence of annotation markers (or tags). In the first part, we introduce our problem: processing named entities. We question the status of named entities (related notions, typologies, evaluation and annotation) and propose properties to define their linguistic nature. We conclude this part by describing state-of-the-art approaches and by presenting our contribution, focused on markers (tags) that begin or end an annotation. In the second part, we present the formalism used to mine data. The framework we use to enrich data, explore sequences and extract annotation rules is formalized. The last part describes the implemented system (mXS) and the obtained results. Specific implementation details are given and results about rule extraction from data are reported. Finally, we provide quantitative results of the performance of mXS on the Ester2 and Etape datasets, along with various indications about the behaviour of the system from diverse points of view and in diverse configurations. They show that our approach gives competitive results and that it opens up new perspectives for natural language processing and automatic annotation.
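The idea of extracting annotation rules from data can be illustrated by counting how often a context pattern is immediately followed by an annotation marker and keeping the patterns whose confidence is high. This is a toy sketch, not the mXS implementation: the tagged sequences, the `<pers>` marker, the one-token patterns, and the confidence threshold are all hypothetical.

```python
from collections import Counter

def mine_rules(sequences, marker="<pers>", min_conf=0.6):
    """Return {token: confidence} for tokens that reliably precede `marker`."""
    seen = Counter()      # times each plain token is followed by anything
    followed = Counter()  # times each plain token is followed by the marker
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            if prev.startswith("<"):
                continue  # markers themselves are not used as patterns
            seen[prev] += 1
            if nxt == marker:
                followed[prev] += 1
    return {t: followed[t] / seen[t] for t in followed
            if followed[t] / seen[t] >= min_conf}

# Made-up annotated transcripts; markers open and close a person annotation.
corpus = [
    ["president", "<pers>", "Dupont", "</pers>", "said"],
    ["president", "<pers>", "Martin", "</pers>", "met", "the", "president"],
    ["the", "president", "of", "France"],
]
rules = mine_rules(corpus)
```

In this toy corpus "president" is followed by the opening marker in two of its three counted occurrences, so it is kept as a rule with confidence 2/3.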
39

PDRM : a proactive data replication mechanism to improve content mobility support in NDN using location awareness

Lehmann, Matheus Brenner January 2017
The problem of handling user mobility has been around since mobile devices became capable of handling multimedia content and is still one of the most relevant challenges in networking. The conventional Internet architecture is inadequate for dealing with an ever-growing number of mobile devices that are both consuming and producing content. Named Data Networking (NDN) is a network architecture that can potentially overcome this mobility challenge. It supports consumer mobility by design but fails to offer the same level of support for content mobility. Content mobility requires guaranteeing that consumers manage to find and retrieve desired content even when the corresponding producer (or primary host) is not available. In this thesis, we propose PDRM, a Proactive and locality-aware Data Replication Mechanism that increases content availability through data redundancy in the context of the NDN architecture. It exploits available resources from end-users in the vicinity to improve content availability even in the case of producer mobility. Throughout the thesis, we discuss the design of PDRM, evaluate the impact of the number of available providers in the vicinity and of in-network cache capacity on its operation, and compare its performance to Vanilla NDN and two state-of-the-art proposals. The evaluation indicates that PDRM improves content mobility support by using object popularity information and spare resources in the vicinity to help the proactive replication. Results show that PDRM can reduce download times by up to 53.55%, producer load by up to 71.6%, inter-domain traffic by up to 46.5%, and generated overhead by up to 25% compared to Vanilla NDN and the other evaluated mechanisms.