71 |
Protocole de routage pour l’architecture NDN / Routing protocol for NDN architecture
Aubry, Elian, 19 December 2017
Parmi les architectures orientées contenu, l’architecture NDN (Named-Data Networking) a su agréger la plus importante communauté de chercheurs et est la plus aboutie pour un Internet du futur. Dans le cadre de l’architecture NDN, au cours de ce doctorat, nous nous sommes concentrés sur les mécanismes de routage adaptés à cette nouvelle vision du réseau. En effet, la capacité à acheminer une requête vers la destination est fondamentale pour qu’une architecture réseau soit fonctionnelle et cette problématique avait été très peu étudiée jusqu’alors. Ainsi, dans ce manuscrit, nous proposons le protocole de routage SRSC (SDN-based Routing Scheme for CCN/NDN), qui repose sur l’utilisation du paradigme des réseaux logiciels (Software-Defined Networks, SDN). SRSC utilise un contrôleur capable de gérer le plan de contrôle du réseau NDN. En centralisant l’ensemble des informations telles que la topologie du réseau, la localisation des différents contenus et le contenu des mémoires cache des nœuds du réseau, le contrôleur va pouvoir établir la meilleure route pour acheminer les requêtes vers le contenu. SRSC permet également un routage de type anycast, c’est-à-dire qu’il permet d’acheminer les requêtes vers le nœud le plus proche qui dispose des données, permettant d’optimiser la distribution des requêtes dans le réseau et de répartir la charge parmi tous les nœuds. De plus, SRSC utilise uniquement les messages Interest et Data de l’architecture NDN et tient son originalité du fait qu’il s’affranchit complètement de l’infrastructure TCP/IP existante. Dans un premier temps, SRSC a été évalué via simulation avec le logiciel NS-3 où nous l’avons comparé à la méthode d’inondation des requêtes, appelée flooding, initialement proposée par NDN. SRSC a ensuite été implanté dans NDNx, l’implantation open source de l’architecture NDN, puis déployé sur notre testbed utilisant la technologie Docker. Ce testbed permet de virtualiser des nœuds NDN et d’observer un réel déploiement de cette architecture réseau à large échelle. Nous avons ainsi évalué les performances de notre protocole SRSC sur notre testbed virtualisé et nous l’avons comparé au protocole NLSR (Named-Data Link State Routing Protocol), le protocole de routage du projet NDN / The Internet is a worldwide content network and its use has kept growing for several years. Content delivery applications such as P2P or video streaming generate the main part of Internet traffic, and Named Data Networking (NDN) appears as an appropriate architecture to satisfy user needs. Named-Data Networking is a novel clean-slate architecture for the Future Internet. It has been designed to deliver content at large scale and integrates several features such as in-network caching, security and multi-path forwarding. However, the lack of a scalable routing scheme is one of the main obstacles that slow down a large deployment of NDN at Internet scale. As NDN relies on content names instead of host addresses, it cannot reuse the traditional routing schemes of the Internet. In this thesis, we propose to use the Software-Defined Networking (SDN) paradigm to decouple the data plane from the control plane, and we present SRSC, a new routing scheme for NDN based on the SDN paradigm. Our solution is a clean-slate approach, using only NDN messages and the SDN paradigm. We implemented our solution in the NS-3 simulator and performed extensive simulations of our proposal. SRSC shows better performance than the flooding scheme used by default in NDN.
We also present a new NDN testbed and the implementation of our protocol SRSC, a Controller-based Routing Scheme for NDN. We implemented SRSC in NDNx, the open-source implementation of the NDN architecture, and deployed it in a virtual environment through Docker. Our experiments demonstrate the ability of our proposal to forward Interest messages, while keeping a low computation time for the Controller and a low delay to access content. Moreover, we propose a solution to easily deploy and evaluate NDN networks, and we compare SRSC with NLSR, the current routing protocol used in NDNx.
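As an illustration of the controller-side, anycast-style route computation this abstract describes, the following is a minimal sketch; the topology, content names and breadth-first search are invented for the example and this is not the published SRSC implementation.

```python
from collections import deque

# Toy view of what an SRSC-like controller would centralize: the NDN
# topology, plus which nodes currently hold a copy of each content
# (origin servers and caches alike). All values are hypothetical.
TOPOLOGY = {                     # adjacency list
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
CONTENT_LOCATIONS = {            # content name -> nodes holding it
    "/videos/demo": {"D"},
    "/news/today": {"B", "C"},   # cached in two places
}

def anycast_route(topology, locations, ingress, name):
    """Return the shortest path from the ingress node to the nearest
    replica of `name` (anycast-style forwarding), or None."""
    targets = locations.get(name, set())
    if not targets:
        return None
    # Plain breadth-first search; a real controller would also weigh
    # link costs and install the result as forwarding entries using
    # Interest/Data exchanges with the nodes.
    queue = deque([[ingress]])
    seen = {ingress}
    while queue:
        path = queue.popleft()
        if path[-1] in targets:
            return path
        for neighbour in topology[path[-1]]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

print(anycast_route(TOPOLOGY, CONTENT_LOCATIONS, "A", "/news/today"))
# ['A', 'B'] -- the closer of the two replicas is chosen
```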
|
72 |
De A streetcar named desire a Um bonde chamado desejo: uma análise sob o enfoque da linguística sistêmico-funcional
Silveira, Gustavo Cardoso, 29 May 2018
The purpose of this master's dissertation is to compare the English-language original
of A Streetcar Named Desire, by Tennessee Williams, with the respective translation
in Portuguese, Um bonde chamado Desejo, by Vadim Nikitin, in order to characterize
the differences between the two versions based on the lexicogrammatical choices made by
these authors. Since the 1950s, the important work of linguistics-based translation
scholars has done much to break the boundaries between different disciplines
dedicated to it, and to move their studies away from a position of possible confrontation. The
research has the support of Systemic-Functional Linguistics (SFL), a theoretical-methodological
proposal of Halliday (1985) and Halliday and Matthiessen (2004). The
SFL states that the use of language is functional; that its function is to construct
meanings; that meanings are influenced by the social and cultural context in which
they are exchanged; and that the process of language in use is a semiotic process, a
process of making meaning through choices. Research shows that this theoretical
framework can be applied to the field of translation studies from several aspects
involved in SFL: the transitivity system, the modality and the evaluation, as well as the
notion of thematic structure. Other contributions help to understand the characteristics
that mark a translation, such as the notions of linguistic determinism and relativity, as
well as the question of linguistic typology. The present study seeks to answer the
following questions: (a) what can the comparison of the original in English and the
Portuguese translation of A Streetcar Named Desire reveal? (b) what consequences
do these differences mean for the interpretation of the original text and its translation?
The results show the impossibility of a literal translation, since several linguistic
characteristics separate the two languages in terms of the specific typology of both
English and Portuguese. This fact obliges the translator to make lexicogrammatical choices,
made possible by the target language, which may imply modifications in the
interpretation of the drama from one language to another / O objetivo desta dissertação de mestrado é a comparação entre o original em língua
inglesa de A Streetcar Named Desire, de Tennessee Williams, com a respectiva
tradução em português, Um bonde chamado Desejo, de Vadim Nikitin, a fim de
caracterizar as diferenças entre as duas versões com base nas escolhas
lexicogramaticais feitas pelos referidos autores. Desde 1950, o importante trabalho de
estudiosos da tradução baseada em linguística tem feito muito para romper as
fronteiras entre diferentes disciplinas dedicadas a ela, e tirar seus estudos de uma
posição de possível confronto. A pesquisa tem o apoio da Linguística Sistêmico-
Funcional (LSF), uma proposta teórico-metodológica de Halliday (1985) e Halliday e
Matthiessen (2004). A LSF estabelece que o uso da língua é funcional; que sua função
é construir significados; que os significados são influenciados pelo contexto social e
cultural em que são intercambiados; e que o processo de uso da língua é um processo
semiótico, um processo de fazer significado por meio de escolhas. Pesquisas mostram
que esse quadro teórico pode ser aplicável ao campo dos estudos da tradução a partir
de vários aspectos envolvidos na LSF: o sistema da transitividade, a modalidade e a
avaliatividade, além da noção de estrutura temática. Outras contribuições ajudam a
entender as características que marcam uma tradução, tais como as noções de
determinismo e relatividade linguísticos, bem como a questão da tipologia linguística.
O presente estudo busca responder às seguintes perguntas: (a) o que a comparação
do original em inglês e a tradução em português de A Streetcar Named Desire pode
revelar? (b) que consequências essas diferenças significam para a interpretação do
texto original e de sua tradução? Os resultados mostram a impossibilidade de uma
tradução literal, já que várias características linguísticas separam as duas línguas em
termos da tipologia específica seja do inglês, seja do português. Esse fato obriga o
tradutor a fazer escolhas lexicogramaticais possibilitadas pela língua alvo o que pode
implicar modificações na interpretação do drama de uma língua a outra
|
73 |
Recherche d’entités nommées complexes sur le web : propositions pour l’extraction et pour le calcul de similarité / Retrieval of Complex Named Entities on the web : proposals for extraction and similarity computation
Fotsoh Tawaofaing, Armel, 27 February 2018
Les récents développements des nouvelles technologies de l’information et de la communication font du Web une véritable mine d’information. Cependant, les pages Web sont très peu structurées. Par conséquent, il est difficile pour une machine de les traiter automatiquement pour en extraire des informations pertinentes pour une tâche ciblée. C’est pourquoi les travaux de recherche s’inscrivant dans la thématique de l’Extraction d’Information dans les pages web sont en forte croissance. Aussi, l’interrogation de ces informations, généralement structurées et stockées dans des index pour répondre à des besoins d’information précis correspond à la Recherche d’Information (RI). Notre travail de thèse se situe à la croisée de ces deux thématiques. Notre objectif principal est de concevoir et de mettre en œuvre des stratégies permettant de scruter le web pour extraire des Entités Nommées (EN) complexes (EN composées de plusieurs propriétés pouvant être du texte ou d’autres EN) de type entreprise ou de type événement, par exemple. Nous proposons ensuite des services d’indexation et d’interrogation pour répondre à des besoins d’informations. Ces travaux ont été réalisés au sein de l’équipe T2I du LIUPPA, et font suite à une commande de l’entreprise Cogniteev, dont le cœur de métier est centré sur l’analyse du contenu du Web. Les problématiques visées sont, d’une part, l’extraction d’EN complexes sur le Web et, d’autre part, l’indexation et la recherche d’information intégrant ces EN complexes. Notre première contribution porte sur l’extraction d’EN complexes dans des textes. Pour cette contribution, nous prenons en compte plusieurs problèmes, notamment le contexte bruité caractérisant certaines propriétés (pour un événement par exemple, la page web correspondante peut contenir deux dates : la date de l’événement et celle de mise en vente des billets). Pour ce problème en particulier, nous introduisons un module de détection de blocs qui permet de focaliser l’extraction des propriétés sur des blocs de texte pertinents. Nos expérimentations montrent une nette amélioration des performances due à cette approche. Nous nous sommes également intéressés à l’extraction des adresses, où la principale difficulté découle du fait qu’aucun standard ne se soit réellement imposé comme modèle de référence. Nous proposons donc un modèle étendu et une approche d’extraction basée sur des patrons et des ressources libres. Notre deuxième contribution porte sur le calcul de similarité entre EN complexes. Dans l’état de l’art, ce calcul se fait généralement en deux étapes : (i) une première calcule les similarités entre propriétés et (ii) une deuxième agrège les scores obtenus pour le calcul de la similarité globale. En ce qui concerne cette première étape, nous proposons une fonction de calcul de similarité entre EN spatiale, l’une représentée par un point et l’autre par un polygone. Elle complète l’état de l’art. Notons que nos principales propositions se situent au niveau de la deuxième étape. Ainsi, nous proposons trois techniques pour l’agrégation des scores intermédiaires. Les deux premières sont basées sur la somme pondérée des scores intermédiaires (combinaison linéaire et régression logistique). La troisième exploite les arbres de décisions pour agréger les scores intermédiaires. Enfin, nous proposons une dernière approche basée sur le clustering et le modèle vectoriel de Salton pour le calcul de similarité entre EN complexes.
Son originalité vient du fait qu’elle ne nécessite pas de passer par le calcul de scores de similarités intermédiaires. / Recent developments in information technologies have made the web an important data source. However, web content is very unstructured. Therefore, it is a difficult task to automatically process this web content in order to extract relevant information. This is a reason why research work related to Information Extraction (IE) on the web is growing very quickly. Similarly, another widely explored research area is the querying of information extracted from the web to answer an information need. This other research area is known as Information Retrieval (IR). Our research work is at the crossroads of both areas. The main goal of our work is to develop strategies and techniques for crawling the web in order to extract complex Named Entities (NEs), that is, NEs with several properties that may be text or other NEs. We then propose to index them and to query them in order to answer information needs. This work was carried out within the T2I team of the LIUPPA laboratory, in collaboration with Cogniteev, a company whose core business is focused on the analysis of web content. The issues we had to deal with were the extraction of complex NEs on the web and the development of IR services supplied by the extracted data. Our first contribution is related to complex NE extraction from text content. For this contribution, we take into consideration several problems, in particular the noisy context characterizing some properties (the web page describing an event, for example, may contain more than one date: the event's date and the date ticket sales open). For this particular problem, we introduce a block detection module that focuses property extraction on relevant text blocks. Our experiments show an improvement in system performance. We also focused on address extraction, where the main issue arises from the fact that there is no standard way of writing addresses in general and on the web in particular. We therefore propose a pattern-based approach which uses lexicons for extracting addresses from text, without relying on proprietary resources. Our second contribution deals with similarity computation between complex NEs. In the state of the art, this similarity computation is generally performed in two steps: (i) first, similarities between properties are calculated; (ii) then the obtained similarities are aggregated to compute the overall similarity. Our main proposals focus on the second step. We propose three techniques for aggregating property similarities. The first two are based on the weighted sum of these property similarities (simple linear combination and logistic regression). The third technique, however, uses decision trees for the aggregation. Finally, we also propose a last approach based on clustering and the Salton vector model. This last approach evaluates the similarity at the complex NE level without computing property similarities. We also propose a similarity computation function between spatial NEs, one represented by a point and the other by a polygon, which complements the state of the art.
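As an illustration of the aggregation step described above, here is a minimal sketch of the simplest of the three techniques, a weighted linear combination of property-level similarity scores; the property names, weights and decision threshold are hypothetical (the thesis also evaluates logistic regression and decision trees for this step).

```python
# Hypothetical property-level similarity scores between two "event"
# entities, each in [0, 1] (e.g. produced by string, date and spatial
# similarity functions).
property_similarities = {"title": 0.82, "date": 1.0, "venue": 0.40}

# Illustrative weights; in practice they would be learned or tuned.
weights = {"title": 0.5, "date": 0.3, "venue": 0.2}

def aggregate(similarities, weights):
    """Weighted linear combination of property similarities."""
    total_weight = sum(weights[p] for p in similarities)
    return sum(weights[p] * s for p, s in similarities.items()) / total_weight

score = aggregate(property_similarities, weights)
print(f"overall similarity: {score:.2f}")   # 0.79 with the values above
print("same entity?", score >= 0.7)         # hypothetical decision threshold
```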
|
74 |
Rozpoznávání pojmenovaných entit pomocí neuronových sítí / Neural Network Based Named Entity Recognition
Straková, Jana, January 2017
Title: Neural Network Based Named Entity Recognition Author: Jana Straková Institute: Institute of Formal and Applied Linguistics Supervisor of the doctoral thesis: prof. RNDr. Jan Hajič, Dr., Institute of Formal and Applied Linguistics Abstract: Czech named entity recognition (the task of automatic identification and classification of proper names in text, such as names of people, locations and organizations) has become a well-established field since the publication of the Czech Named Entity Corpus (CNEC). This doctoral thesis presents the author's research on named entity recognition, mainly in the Czech language. It presents work and research carried out during CNEC publication and its evaluation. It further covers the author's research results, which improved Czech state-of-the-art results in named entity recognition in recent years, with special focus on artificial neural network based solutions. Starting with a simple feed-forward neural network with a softmax output layer and a standard set of classification features for the task, the thesis presents methodology and results, which were later used in the open-source software solution for named entity recognition, NameTag. The thesis concludes with a recurrent neural network based recognizer with word embeddings and character-level word embeddings,...
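As an illustration of the first model family mentioned in the abstract, a one-hidden-layer feed-forward network with a softmax output layer scoring named-entity labels for a single token, here is a minimal sketch; the feature dimensions, tag set and random weights are placeholders, not NameTag's actual code.

```python
import numpy as np

LABELS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]   # illustrative tag set
N_FEATURES, N_HIDDEN = 20, 8                          # toy dimensions

rng = np.random.default_rng(0)
W1 = rng.normal(size=(N_FEATURES, N_HIDDEN))          # would be learned
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(size=(N_HIDDEN, len(LABELS)))
b2 = np.zeros(len(LABELS))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict(features):
    """Forward pass: features -> hidden layer -> softmax over labels."""
    hidden = np.tanh(features @ W1 + b1)
    return softmax(hidden @ W2 + b2)

# One token represented by binary classification features
# (capitalised?, in a gazetteer?, suffix indicators, ...).
token_features = rng.integers(0, 2, size=N_FEATURES).astype(float)
probabilities = predict(token_features)
print(LABELS[int(np.argmax(probabilities))])
```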
|
75 |
Knowledge Extraction for Hybrid Question Answering
Usbeck, Ricardo, 18 May 2017
Since Tim Berners-Lee's hypertext proposal to his employer CERN on March 12, 1989, the World Wide Web has grown to more than one billion Web pages and is still growing.
With the later proposed Semantic Web vision, Berners-Lee et al. suggested an extension of the existing (Document) Web to allow better reuse, sharing and understanding of data.
Both the Document Web and the Web of Data (which is the current implementation of the Semantic Web) grow continuously. This is a mixed blessing, as the two forms of the Web grow concurrently and most commonly contain different pieces of information. Modern information systems must thus bridge a Semantic Gap to allow holistic and unified access to a particular piece of information, independent of the representation of the data.
One way to bridge the gap between the two forms of the Web is the extraction of structured data, i.e., RDF, from the growing amount of unstructured and semi-structured information (e.g., tables and XML) on the Document Web. Note that unstructured data stands for any type of textual information like news, blogs or tweets.
While extracting structured data from unstructured data allows the development of powerful information systems, it requires high-quality and scalable knowledge extraction frameworks to lead to useful results. The dire need for such approaches has led to the development of a multitude of annotation frameworks and tools. However, most of these approaches are not evaluated on the same datasets or using the same measures. The resulting Evaluation Gap needs to be tackled by a concise evaluation framework to foster fine-grained and uniform evaluations of annotation tools and frameworks over any knowledge base.
Moreover, with the constant growth of data and the ongoing decentralization of knowledge, intuitive ways for non-experts to access the generated data are required. Humans adapted their search behavior to current Web data through access paradigms such as keyword search so as to retrieve high-quality results. Hence, most Web users only expect Web documents in return. However, humans think and most commonly express their information needs in natural language rather than using keyword phrases. Answering complex information needs often requires the combination of knowledge from various, differently structured data sources. Thus, we observe an Information Gap between natural-language questions and current keyword-based search paradigms, which in addition do not make use of the available structured and unstructured data sources. Question Answering (QA) systems provide an easy and efficient way to bridge this gap by allowing users to query data via natural language, thus reducing (1) a possible loss of precision and (2) a potential loss of time while reformulating the search intention into a machine-readable form. Furthermore, QA systems enable answering natural language queries with concise results instead of links to verbose Web documents. Additionally, they allow and encourage the access to and the combination of knowledge from heterogeneous knowledge bases (KBs) within one answer.
Consequently, three main research gaps are considered and addressed in this work:
First, addressing the Semantic Gap between the unstructured Document Web and the Web of Data requires the development of scalable and accurate approaches for the extraction of structured data in RDF. This research challenge is addressed by several approaches within this thesis. This thesis presents CETUS, an approach for recognizing entity types to populate RDF KBs. Furthermore, our knowledge base-agnostic disambiguation framework AGDISTIS can efficiently detect the correct URIs for a given set of named entities. Additionally, we introduce REX, a Web-scale framework for RDF extraction from semi-structured (i.e., templated) websites which makes use of the semantics of the reference knowledge base to check the extracted data.
The ongoing research on closing the Semantic Gap has already yielded a large number of annotation tools and frameworks. However, these approaches are currently still hard to compare since the published evaluation results are calculated on diverse datasets and evaluated based on different measures. On the other hand, the issue of comparability of results is not to be regarded as being intrinsic to the annotation task. Indeed, it is now well established that scientists spend between 60% and 80% of their time preparing data for experiments. That data preparation is such a tedious problem in the annotation domain is mostly due to the different formats of the gold standards as well as the different data representations across reference datasets.
We tackle the resulting Evaluation Gap in two ways: First, we introduce a collection of three novel datasets, dubbed N3, to leverage the possibility of optimizing NER and NED algorithms via Linked Data and to ensure maximal interoperability to overcome the need for corpus-specific parsers. Second, we present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools and frameworks on multiple datasets.
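As a rough illustration of the kind of uniform scoring such an evaluation framework standardises, the sketch below reduces gold and predicted annotations to (start, end, URI) triples and computes strict precision, recall and F1; the sample annotations are invented, and GERBIL itself supports many more matching modes and measures.

```python
def precision_recall_f1(gold, predicted):
    """Strict matching: an annotation counts only if span and URI agree."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical document: gold vs. system output as (start, end, URI).
gold = [(0, 6, "http://dbpedia.org/resource/Berlin"),
        (18, 25, "http://dbpedia.org/resource/Germany")]
predicted = [(0, 6, "http://dbpedia.org/resource/Berlin"),
             (18, 25, "http://dbpedia.org/resource/German_language")]

print(precision_recall_f1(gold, predicted))   # (0.5, 0.5, 0.5)
```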
The decentralized architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Moreover, the increasing demand for natural-language interfaces, as reflected by current mobile applications, requires systems to deeply understand the underlying user information need. In conclusion, the natural language interface for asking questions requires a hybrid approach to data usage, i.e., simultaneously performing a search on full-texts and semantic knowledge bases.
To close the Information Gap, this thesis presents HAWK, a novel entity search approach developed for hybrid QA based on combining structured RDF and unstructured full-text data sources.
|
76 |
Prerequisites for Extracting Entity Relations from Swedish Texts
Lenas, Erik, January 2020
Natural language processing (NLP) is a vibrant area of research with many practical applications today like sentiment analysis, text labeling, question answering, machine translation and automatic text summarizing. At the moment, research is mainly focused on the English language, although many other languages are trying to catch up. This work focuses on an area within NLP called information extraction, and more specifically on relation extraction, that is, to extract relations between entities in a text. What this work aims at is to use machine learning techniques to build a Swedish language processing pipeline with part-of-speech tagging, dependency parsing, named entity recognition and coreference resolution to use as a base for later relation extraction from archival texts. The obvious difficulty lies in the scarcity of Swedish annotated datasets. For example, no large enough Swedish dataset for coreference resolution exists today. An important part of this work, therefore, is to create a Swedish coreference solver using distantly supervised machine learning, which means creating a Swedish dataset by applying an English coreference solver on an unannotated bilingual corpus, and then using a word-aligner to translate this machine-annotated English dataset to a Swedish dataset, and then training a Swedish model on this dataset. Using AllenNLP's end-to-end coreference resolution model, both for creating the Swedish dataset and training the Swedish model, this work achieves an F1-score of 0.5. For named entity recognition this work uses the Swedish BERT models released by the Royal Library of Sweden in February 2020 and achieves an overall F1-score of 0.95. To put all of these NLP models within a single Language Processing Pipeline, Spacy is used as a unifying framework. / Natural Language Processing (NLP) är ett stort och aktuellt forskningsområde idag med många praktiska tillämpningar som sentimentanalys, textkategorisering, maskinöversättning och automatisk textsummering. Forskningen är för närvarande mest inriktad på det engelska språket, men många andra språkområden försöker komma ikapp. Det här arbetet fokuserar på ett område inom NLP som kallas informationsextraktion, och mer specifikt relationsextrahering, det vill säga att extrahera relationer mellan namngivna entiteter i en text. Vad det här arbetet försöker göra är att använda olika maskininlärningstekniker för att skapa en svensk Language Processing Pipeline bestående av part-of-speech tagging, dependency parsing, named entity recognition och coreference resolution. Denna pipeline är sedan tänkt att användas som en bas för senare relationsextrahering från svenskt arkivmaterial. Den uppenbara svårigheten med detta ligger i att det är ont om stora, annoterade svenska dataset. Till exempel så finns det inget tillräckligt stort svenskt dataset för coreference resolution. En stor del av detta arbete går därför ut på att skapa en svensk coreference solver genom att implementera distantly supervised machine learning, med vilket menas att använda en engelsk coreference solver på ett oannoterat engelskt-svenskt corpus, och sen använda en word-aligner för att översätta detta maskinannoterade engelska dataset till ett svenskt, och sen träna en svensk coreference solver på detta dataset. Det här arbetet använder AllenNLP:s end-to-end coreference solver, både för att skapa det svenska datasetet, och för att träna den svenska modellen, och uppnår en F1-score på 0.5.
Vad gäller named entity recognition så använder det här arbetet Kungliga Bibliotekets BERT-modeller som bas, och uppnår genom detta en F1-score på 0.95. Spacy används som ett enande ramverk för att samla alla dessa NLP-komponenter inom en enda pipeline.
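As an illustration of the distant-supervision idea described in this abstract, here is a minimal sketch (in English) of projecting English coreference clusters onto Swedish tokens through a word alignment; the sentences, alignment and clusters are invented, and this is not the AllenNLP-based pipeline used in the thesis.

```python
# English tokens:  0:Anna 1:lives 2:in 3:Lund 4:; 5:she 6:likes 7:it
# Swedish tokens:  0:Anna 1:bor 2:i 3:Lund 4:; 5:hon 6:gillar 7:det
ALIGNMENT = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7}  # en idx -> sv idx

# Output of an English coreference resolver: clusters of (start, end) spans.
ENGLISH_CLUSTERS = [
    [(0, 0), (5, 5)],   # Anna / she
    [(3, 3), (7, 7)],   # Lund / it
]

def project_span(span, alignment):
    """Map an English token span onto Swedish tokens via the alignment."""
    targets = sorted(alignment[i] for i in range(span[0], span[1] + 1)
                     if i in alignment)
    return (targets[0], targets[-1]) if targets else None

def project_clusters(clusters, alignment):
    projected = []
    for cluster in clusters:
        spans = [s for s in (project_span(sp, alignment) for sp in cluster) if s]
        if len(spans) > 1:          # a cluster needs at least two mentions
            projected.append(spans)
    return projected

print(project_clusters(ENGLISH_CLUSTERS, ALIGNMENT))
# [[(0, 0), (5, 5)], [(3, 3), (7, 7)]] -- Swedish training data, no hand labels
```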
|
77 |
Encyclopaedic question answering
Dornescu, Iustin, January 2012
Open-domain question answering (QA) is an established NLP task which enables users to search for specific pieces of information in large collections of texts. Instead of using keyword-based queries and a standard information retrieval engine, QA systems allow the use of natural language questions and return the exact answer (or a list of plausible answers) with supporting snippets of text. In the past decade, open-domain QA research has been dominated by evaluation fora such as TREC and CLEF, where shallow techniques relying on information redundancy have achieved very good performance. However, this performance is generally limited to simple factoid and definition questions because the answer is usually explicitly present in the document collection. Current approaches are much less successful in finding implicit answers and are difficult to adapt to more complex question types which are likely to be posed by users. In order to advance the field of QA, this thesis proposes a shift in focus from simple factoid questions to encyclopaedic questions: list questions composed of several constraints. These questions have more than one correct answer which usually cannot be extracted from one small snippet of text. To correctly interpret the question, systems need to combine classic knowledge-based approaches with advanced NLP techniques. To find and extract answers, systems need to aggregate atomic facts from heterogeneous sources as opposed to simply relying on keyword-based similarity. Encyclopaedic questions promote QA systems which use basic reasoning, making them more robust and easier to extend with new types of constraints and new types of questions. A novel semantic architecture is proposed which represents a paradigm shift in open-domain QA system design, using semantic concepts and knowledge representation instead of words and information retrieval. The architecture consists of two phases, analysis – responsible for interpreting questions and finding answers, and feedback – responsible for interacting with the user. This architecture provides the basis for EQUAL, a semantic QA system developed as part of the thesis, which uses Wikipedia as a source of world knowledge and employs simple forms of open-domain inference to answer encyclopaedic questions. EQUAL combines the output of a syntactic parser with semantic information from Wikipedia to analyse questions. To address natural language ambiguity, the system builds several formal interpretations containing the constraints specified by the user and addresses each interpretation in parallel. To find answers, the system then tests these constraints individually for each candidate answer, considering information from different documents and/or sources. The correctness of an answer is not proved using a logical formalism; instead, a confidence-based measure is employed. This measure reflects the validation of constraints from raw natural language, automatically extracted entities, relations and available structured and semi-structured knowledge from Wikipedia and the Semantic Web. When searching for and validating answers, EQUAL uses the Wikipedia link graph to find relevant information. This method achieves good precision and allows only pages of a certain type to be considered, but is affected by the incompleteness of the existing markup targeted towards human readers. In order to address this, a semantic analysis module which disambiguates entities is developed to enrich Wikipedia articles with additional links to other pages.
The module increases recall, enabling the system to rely more on the link structure of Wikipedia than on word-based similarity between pages. It also allows authoritative information from different sources to be linked to the encyclopaedia, further enhancing the coverage of the system. The viability of the proposed approach was evaluated in an independent setting by participating in two competitions at CLEF 2008 and 2009. In both competitions, EQUAL outperformed standard textual QA systems as well as semi-automatic approaches. Having established a feasible way forward for the design of open-domain QA systems, future work will attempt to further improve performance to take advantage of recent advances in information extraction and knowledge representation, as well as by experimenting with formal reasoning and inferencing capabilities.
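As an illustration of the constraint-based answer validation described above, here is a minimal sketch in which each candidate is tested against the constraints of one question interpretation and kept only if its aggregated confidence passes a threshold; the question, constraint functions and threshold are invented, and EQUAL derives its constraints and evidence from Wikipedia rather than from in-memory records like these.

```python
# Encyclopaedic question (invented): "German-speaking countries that joined
# the EU before 2000", decomposed into independent constraints.
CANDIDATES = [
    {"name": "Austria", "languages": {"German"}, "eu_accession": 1995},
    {"name": "Germany", "languages": {"German"}, "eu_accession": 1958},
    {"name": "Switzerland", "languages": {"German", "French"}, "eu_accession": None},
]

def speaks_german(candidate):
    return 1.0 if "German" in candidate["languages"] else 0.0

def joined_eu_before_2000(candidate):
    year = candidate["eu_accession"]
    return 1.0 if year is not None and year < 2000 else 0.0

CONSTRAINTS = [speaks_german, joined_eu_before_2000]

def confidence(candidate, constraints):
    """Aggregate per-constraint confidences; a product keeps only candidates
    that satisfy every constraint to some degree."""
    score = 1.0
    for constraint in constraints:
        score *= constraint(candidate)
    return score

answers = [c["name"] for c in CANDIDATES if confidence(c, CONSTRAINTS) >= 0.5]
print(answers)   # ['Austria', 'Germany']
```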
|
78 |
Modèles graphiques discriminants pour l’étiquetage de séquences : application à la reconnaissance d’entités nommées radiophoniques / Discriminative graphical models for sequence labelling : application to named entity recognition in audio broadcast news
Zidouni, Azeddine, 08 December 2010
Le traitement automatique des données complexes et variées est un processus fondamental dans les applications d’extraction d’information. L’explosion combinatoire dans la composition des textes journalistiques et l’évolution du vocabulaire rend la tâche d’extraction d’indicateurs sémantiques, tel que les entités nommées, plus complexe par les approches symboliques. Les modèles stochastiques structurels tel que les champs conditionnels aléatoires (CRF) permettent d’optimiser des systèmes d’extraction d’information avec une importante capacité de généralisation. La première contribution de cette thèse est consacrée à la définition du contexte optimal pour l’extraction des régularités entre les mots et les annotations dans la tâche de reconnaissance d’entités nommées. Nous allons intégrer diverses informations dans le but d’enrichir les observations et améliorer la qualité de prédiction du système. Dans la deuxième partie nous allons proposer une nouvelle approche d’adaptation d’annotations entre deux protocoles différents. Le principe de cette dernière est basé sur l’enrichissement d’observations par des données générées par d’autres systèmes. Ces travaux seront expérimentés et validés sur les données de la campagne ESTER. D’autre part, nous allons proposer une approche de couplage entre le niveau signal représenté par un indice de la qualité de voisement et le niveau sémantique. L’objectif de cette étude est de trouver le lien entre le degré d’articulation du locuteur et l’importance de son discours / Recent research in Information Extraction aims to extract fixed types of information from data. Sequence annotation systems are developed to associate structured annotations to input data presented in sequential form. The named entity recognition (NER) task consists of identifying and classifying every word in a document into some predefined categories such as person names, locations, organizations, and dates. The complexity of NER is largely related to the definition of the task and to the complexity of the relationships between words and their associated semantics. Our first contribution is devoted to solving the NER problem using discriminative graphical models. The proposed approach investigates the use of various contexts of the words to improve recognition. NER systems are tied to a specific annotation protocol; thus, new applications are developed for new protocols. The challenge is: how can we adapt an annotation system built for a specific application to another target application? In this work, we propose an adaptation approach for the sequence labelling task based on annotation enrichment using conditional random fields (CRF). Experimental results show that the proposed approach outperforms rule-based approaches on the NER task. Finally, we propose a multimodal approach to NER by integrating low-level features as contextual information in radio broadcast news data. The objective of this study is to measure the correlation between the speaker's voicing quality and the importance of their speech.
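As an illustration of the context enrichment discussed in this abstract, here is a minimal sketch that turns each token into a feature dictionary including a small window of neighbouring words before a linear-chain CRF is trained on the sequences; the feature names are illustrative, and the commented sklearn-crfsuite call is an assumption about one common CRF library, not the toolkit used on the ESTER data.

```python
def token_features(tokens, i, window=1):
    """Features for token i, enriched with a +/- `window` word context."""
    features = {
        "word": tokens[i].lower(),
        "is_capitalised": tokens[i][:1].isupper(),
        "suffix3": tokens[i][-3:].lower(),
    }
    for offset in range(-window, window + 1):
        if offset and 0 <= i + offset < len(tokens):
            features[f"word@{offset:+d}"] = tokens[i + offset].lower()
    return features

sentence = ["Le", "président", "visite", "Marseille", "demain"]
X = [token_features(sentence, i) for i in range(len(sentence))]
y = ["O", "O", "O", "B-LOC", "O"]          # gold labels for training

# Hypothetical training call with the sklearn-crfsuite package:
# import sklearn_crfsuite
# crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
# crf.fit([X], [y])
# crf.predict([X])
```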
|
79 |
An anonymizable entity finder in judicial decisions
Kazemi, Farzaneh, January 2008
Thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal.
|
80 |
[en] SECOND LEVEL RECOMMENDATION SYSTEM TO SUPPORT NEWS EDITING / [pt] SISTEMA DE RECOMENDAÇÃO DE SEGUNDO NÍVEL PARA SUPORTE À PRODUÇÃO DE MATÉRIAS JORNALÍSTICAS
Demetrius Costa Rapello, 10 April 2014
[pt] Sistemas de recomendação têm sido amplamente utilizados pelos grandes
portais na Web, em decorrência do aumento do volume de dados disponíveis na
Web. Tais sistemas são basicamente utilizados para sugerir informações
relevantes para os seus usuários. Esta dissertação apresenta um sistema de
recomendação de segundo nível para auxiliar equipes de jornalistas de portais de
notícias no processo de recomendação de notícias relacionadas para os usuários do
portal. O sistema é chamado de segundo nível pois apresenta recomendações aos
jornalistas para que, por sua vez, geram recomendações aos usuários do portal. O
modelo seguido pelo sistema consiste na recomendação de notícias relacionadas
com base em características extraídas do próprio texto da notícia original. As
características extraídas permitem a criação de consultas contra um banco de
dados de notícias anteriormente publicadas. O resultado de uma consulta é uma
lista de notícias candidatas à recomendação, ordenada pela similaridade com a
notícia original e pela data de publicação, que o editor da notícia original
manualmente processa para gerar a lista final de notícias relacionadas. / [en] Recommendation systems are widely used by major Web portals due to the
increase in the volume of data available on the Web. Such systems are basically
used to suggest information relevant to their users. This dissertation presents a
second-level recommendation system, which aims at assisting the team of
journalists of a news Web portal in the process of recommending related news for
the users of the Web portal. The system is called second level since it creates
recommendations to the journalists who, in turn, generate recommendations to
the users. The system follows a model based on features extracted from the text
itself. The extracted features permit creating queries against a news database. The
query result is a list of candidate news, sorted by score and date of publication,
which the news editor manually processes to generate the final list of related
news.
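As an illustration of the candidate-generation step described above, here is a minimal sketch in which previously published articles are indexed with TF-IDF, the new article serves as the query, and candidates are ordered by similarity and publication date for the editor to curate; the headlines and the use of scikit-learn are illustrative, not the portal's actual stack.

```python
from datetime import date
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Previously published news (invented examples).
archive = [
    ("Olympic committee announces host city", date(2013, 9, 7)),
    ("New metro line opens downtown", date(2014, 1, 15)),
    ("Host city unveils Olympic stadium plans", date(2014, 2, 2)),
]
new_article = "Olympic stadium construction begins in host city"

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([text for text, _ in archive])
query = vectorizer.transform([new_article])
scores = cosine_similarity(query, matrix)[0]

# Order candidates by similarity, then by recency; the editor trims this list.
candidates = sorted(zip(scores, archive), key=lambda x: (x[0], x[1][1]), reverse=True)
for score, (headline, published) in candidates:
    print(f"{score:.2f}  {published}  {headline}")
```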
|