501 |
Um sistema híbrido simbólico-conexionista para o processamento de papéis temáticos / A hybrid symbolic-connectionist system for thematic role processing. Rosa, João Luis Garcia. 24 July 2018.
Advisors: Edson Françozo, Marcio Luiz de Andrade Netto / Doctoral thesis - Universidade Estadual de Campinas, Instituto de Estudos da Linguagem / Previous issue date: 1999.
Abstract: In Linguistics, the semantic relations between words in a sentence are accounted for, inter alia, by the assignment of thematic roles, e.g. AGENT, INSTRUMENT, etc. As in predicate logic, simple linguistic expressions are decomposed into one predicate (often the verb) and its arguments. The predicate assigns thematic roles to the arguments, so that each sentence has a thematic grid, a structure with all the thematic roles assigned by the predicate. In order to reveal the thematic grid of a semantically sound sentence, a system called HTRP (Hybrid Thematic Role Processor) is proposed, in which a connectionist architecture takes as input a distributed representation of the words of a sentence and produces as output its thematic grid. Both a random initial weight version (RIW) and a biased initial weight version (BIW) are proposed, to account for systems without and with initial knowledge, respectively. In BIW, the initial connection weights reflect symbolic rules for thematic roles. For both versions, after supervised training, a set of final symbolic rules is extracted, which is consistently correlated with linguistic - symbolic - knowledge. In the case of BIW, this amounts to a revision of the initial rules; in RIW, the symbolic rules seem to be induced from the connectionist architecture and the training. The HTRP system learns to recognize the correct thematic grid for semantically well-formed Portuguese sentences.
Beyond this, it supports reflection on cognitive aspects of linguistic processing, through the symbolic rules introduced (in BIW) and extracted (from both versions). / Doctorate / Doctor in Linguistics
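The BIW/RIW contrast lends itself to a compact illustration. Below is a minimal sketch, assuming PyTorch, of a feed-forward network that maps word features to thematic-role labels and can optionally be seeded with weights encoding a symbolic rule before supervised training; the dimensions, the "animate implies AGENT" rule, and the placeholder data are assumptions for illustration, not the thesis's actual architecture or sentences.

```python
# Minimal sketch of the BIW idea: seed initial weights with a symbolic rule,
# then train with supervision (RIW would skip the seeding). All dimensions,
# the rule, and the data below are illustrative assumptions.
import torch
import torch.nn as nn

N_FEATURES = 12  # assumed semantic features of a verb/argument pair
N_ROLES = 4      # assumed roles, e.g. AGENT, PATIENT, INSTRUMENT, EXPERIENCER

def make_net(biased: bool) -> nn.Sequential:
    net = nn.Sequential(nn.Linear(N_FEATURES, 8), nn.Sigmoid(),
                        nn.Linear(8, N_ROLES))
    if biased:
        # BIW: strengthen the path from feature 0 ("animate", assumed)
        # to output 0 (AGENT, assumed) before any training takes place.
        with torch.no_grad():
            net[0].weight[0, 0] = 2.0
            net[2].weight[0, 0] = 2.0
    return net

net = make_net(biased=True)
opt = torch.optim.SGD(net.parameters(), lr=0.1)
x = torch.rand(32, N_FEATURES)        # placeholder inputs
y = torch.randint(0, N_ROLES, (32,))  # placeholder role labels
for _ in range(100):                  # supervised training, as in both versions
    opt.zero_grad()
    loss = nn.functional.cross_entropy(net(x), y)
    loss.backward()
    opt.step()
```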
|
502 |
Natural Language Processing from a Software Engineering Perspective. Åkerud, Daniel; Rendlo, Henrik. January 2004.
This thesis deals with questions related to the processing of naturally occurring texts, also known as natural language processing (NLP). The subject is approached from a software engineering perspective, and the problem description is formulated accordingly. The thesis is roughly divided into two major parts. The first part contains a literature study covering fundamental concepts and algorithms; we discuss both serial and parallel architectures and conclude that different scenarios call for different architectures. The second part is an empirical evaluation of an NLP framework chosen from among several candidates, conducted in order to illustrate the theoretical part of the thesis. We argue that component-based development in a portable language could increase reusability in the NLP community, where reuse is currently low. The recent emergence of such initiatives and the great potential of many applications in this area suggest a bright future for NLP.
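As a minimal sketch of the component-based, serial architecture discussed above (the components and the dict-based document record are toy assumptions, not the evaluated toolkit's API):

```python
# Small, reusable NLP components behind one shared interface, composed serially.
from typing import Callable, List

def tokenize(doc: dict) -> dict:
    doc["tokens"] = doc["text"].split()
    return doc

def lowercase(doc: dict) -> dict:
    doc["tokens"] = [t.lower() for t in doc["tokens"]]
    return doc

def run_serial(doc: dict, components: List[Callable[[dict], dict]]) -> dict:
    # Serial architecture: each component consumes the previous one's output.
    # A parallel architecture would instead map this same pipeline over many
    # documents concurrently (e.g., with multiprocessing.Pool).
    for component in components:
        doc = component(doc)
    return doc

print(run_serial({"text": "NLP Components Compose"}, [tokenize, lowercase]))
```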
|
503 |
Using cloud services and machine learning to improve customer support: Study the applicability of the method on voice data. Spens, Henrik; Lindgren, Johan. January 2018.
This project investigated how machine learning can be used to classify voice calls in a customer support setting. A set of a few hundred labeled voice calls was recorded and used as data. The calls were transcribed to text using a speech-to-text cloud service; this text was then normalized and used to train models able to predict the class of new voice calls. Different algorithms were used to build the models, including support vector machines and neural networks. The optimal model, found by extensive parameter search, was a support vector machine. Using this optimal model, a program that classifies live voice calls was built.
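A minimal sketch of this pipeline, assuming scikit-learn and already-transcribed calls; the transcripts, labels, and parameter grid are placeholders, not the project's data or tuned settings:

```python
# Transcribed call texts -> TF-IDF normalization -> SVM, with a parameter
# search as described in the abstract (here a tiny illustrative grid).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

transcripts = ["my invoice is wrong", "I was charged twice",          # placeholder
               "the app crashes on login", "error code on checkout"]  # placeholder
labels = ["billing", "billing", "technical", "technical"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("svm", SVC()),
])
search = GridSearchCV(pipeline, {"svm__C": [0.1, 1, 10],
                                 "svm__kernel": ["linear", "rbf"]}, cv=2)
search.fit(transcripts, labels)
print(search.best_params_, search.predict(["cannot reset my password"]))
```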
|
504 |
Using the IBM Watson Dialog Service for Assisting Parallel Programming. Calvo, Adrián. January 2016.
IBM Watson is on the verge of becoming a milestone in computer science, as it builds on a new technology that relies on cognitive systems. IBM Watson is able to understand questions in natural language and give proper answers. The use of cognitive computing in parallel programming is an open research issue; therefore, the objective of this project is to investigate how IBM Watson can help in parallel programming by using the Dialog Service. To answer our research question, an application was built on the IBM Watson Dialog Service and a survey was carried out. The results of our research demonstrate that the developed application offers valuable answers to the questions asked by a programmer, and the survey reveals that students would be interested in using it.
|
505 |
Fast recursive biomedical event extraction / Extraction rapide et récursive des événements biomédicaux. Liu, Xiao. 25 September 2014.
The internet and all modern media of communication, information, and entertainment have brought a massive increase in the quantity of digital data. Automatically processing and understanding these massive data enables the creation of large knowledge bases, more efficient search, social media research, and more. Natural language processing research concerns the design and development of algorithms that allow computers to automatically process natural language in text, audio, images, or video for specific tasks. Due to the complexity of human language, natural language processing of text can be divided into four levels: morphology, syntax, semantics, and pragmatics. Current natural language processing technologies have achieved great success on tasks at the first two levels, leading to many commercial applications such as search engines. However, an advanced structured search engine requires computers to understand language more deeply than at the morphological and syntactic levels.
Information extraction is designed to extract meaningful structured information from unannotated or semi-annotated resources, in order to enable advanced search and to automatically create knowledge bases for further use. This thesis studies the problem of information extraction in the specific domain of biomedical event extraction. We propose an efficient solution that is a trade-off between the two main families of methods proposed in previous work. This solution reaches a good balance between performance and speed, which makes it suitable for processing large-scale data: it achieves performance competitive with the best models at a much lower computational complexity. While designing this model, we also studied the effects of the different classifiers usually proposed to solve the multi-class classification problem, and we tested two simple methods for integrating word vector representations learned by deep learning into our model. Even though the different classifiers and the integration of word vectors do not greatly improve performance, we believe these research directions carry promising potential for improving information extraction.
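Two of the ingredients mentioned above, comparing multi-class classifiers and integrating word vectors with sparse features, can be sketched as follows. This assumes scikit-learn and NumPy; the trigger words, event types, and random "pre-trained" vectors are toy placeholders, not the thesis's model or data:

```python
# Multi-class event-trigger classification: sparse character n-grams
# concatenated with dense word vectors, tried with two classifiers.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

tokens = ["phosphorylation", "binds", "expression", "inhibits"]  # placeholder
labels = ["Phosphorylation", "Binding", "Gene_expression", "Negative_regulation"]

# Sparse character n-gram features for each candidate trigger word:
vec = CountVectorizer(analyzer="char", ngram_range=(2, 3))
X_sparse = vec.fit_transform(tokens).toarray()

# Hypothetical pre-trained word vectors (in practice, loaded from a model):
rng = np.random.default_rng(0)
word_vectors = {t: rng.normal(size=50) for t in tokens}
X_dense = np.stack([word_vectors[t] for t in tokens])

# One simple way to integrate the two feature types: concatenation.
X = np.hstack([X_sparse, X_dense])

for clf in (LogisticRegression(max_iter=1000), LinearSVC()):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict(X[:1]))
```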
|
506 |
DBpedia Type and Entity Detection Using Word Embeddings and N-gram Models. Zhou, Hanqing. January 2018.
Nowadays, knowledge bases are used more and more in Semantic Web tasks, such as knowledge acquisition (Hellmann et al., 2013), disambiguation (Garcia et al., 2009) and named entity corpus construction (Hahm et al., 2014), to name a few. DBpedia plays a central role in the linked open data cloud; therefore, the quality of this knowledge base is becoming a central point of focus. However, there are some issues with the quality of DBpedia. In particular, DBpedia suffers from three major types of problems: a) invalid types for entities, b) missing types for entities, and c) invalid entities in resource descriptions. In order to enhance the quality of DBpedia, it is important to detect these invalid types and resources, as well as to complete the missing types.
The three main goals of this thesis are: a) invalid entity type detection, to solve the problem of invalid DBpedia types for entities; b) automatic detection of entity types, to solve the problem of missing DBpedia types for entities; and c) invalid entity detection, to solve the problem of invalid entities in the resource description of a DBpedia entity. We compare several methods for the detection of invalid types, the automatic typing of entities, and the detection of invalid entities in resource descriptions. In particular, we compare different classification and clustering algorithms based on various sets of features: entity embedding features (Skip-gram and CBOW models) and traditional n-gram features. We present evaluation results for 358 DBpedia classes extracted from the DBpedia ontology.
The main contribution of this work consists of the development of automatic invalid type detection, automatic entity typing, and automatic invalid entity detection methods using clustering and classification. Our results show that entity embedding models usually perform better than n-gram models, especially the Skip-gram embedding model.
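A minimal sketch of the embedding side of this comparison, assuming gensim (4.x API) and scikit-learn, with a toy corpus and made-up DBpedia type labels:

```python
# Train the two embedding variants compared above (Skip-gram vs. CBOW) and
# use an entity's vector as features for type classification.
from gensim.models import Word2Vec
from sklearn.neighbors import KNeighborsClassifier

corpus = [["Berlin", "is", "a", "city", "in", "Germany"],   # toy sentences
          ["Einstein", "was", "a", "physicist"]]

skipgram = Word2Vec(corpus, vector_size=50, sg=1, min_count=1)  # Skip-gram
cbow = Word2Vec(corpus, vector_size=50, sg=0, min_count=1)      # CBOW
print("CBOW similarity:", cbow.wv.similarity("Berlin", "Germany"))

X = [skipgram.wv["Berlin"], skipgram.wv["Einstein"]]
y = ["dbo:City", "dbo:Scientist"]  # hypothetical DBpedia ontology types
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([skipgram.wv["Germany"]]))
```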
|
507 |
Hypergraphes multimédias dirigés navigables, construction et exploitation / Navigable directed multimedia hypergraphs, construction and exploitation. Bois, Rémi. 21 December 2017.
This thesis studies the structuring and exploration of news collections. While its main focus is on natural language processing and multimedia retrieval, it also draws on social studies, through the study of news production, and on ergonomics, through the conduct of user tests. The task of hyperlinking, recently put forward by the multimedia retrieval community, is at the center of this thesis. Hyperlinking consists in automatically finding relevant links between multimedia segments. We apply this concept to whole news collections, resulting in the creation of a hypergraph, and study its topological properties and their influence on the explorability of the resulting structure. In this thesis, we provide improvements beyond the state of the art along three main axes: a structuring of news collections by means of multi-source and multimodal graphs based on the creation of inter-document links; the association of these links with a large diversity of link types, allowing the variety of interests that different users may have to be represented; and a typing of the created links in order to make the nature of the relation between two documents explicit. Extensive user studies confirm the interest of the methods developed in this thesis.
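The hyperlinking step can be sketched minimally as follows, assuming scikit-learn; the segments, the similarity threshold, and the single "similar-content" link type are illustrative stand-ins for the thesis's multimodal, typed links:

```python
# Link two news segments when their textual similarity exceeds a threshold;
# the resulting directed edges form the backbone of the (hyper)graph.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

segments = ["election results announced in France",
            "turnout in the France election was low",
            "new species of frog discovered in Brazil"]

tfidf = TfidfVectorizer().fit_transform(segments)
sim = cosine_similarity(tfidf)

THRESHOLD = 0.15  # assumed cut-off for creating a link
links = [(i, j, "similar-content", sim[i, j])
         for i in range(len(segments)) for j in range(len(segments))
         if i != j and sim[i, j] > THRESHOLD]
print(links)  # directed, typed edges between segments
```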
|
508 |
Metody pro rozdělování slovních složenin / Splitting word compounds. Oberländer, Jonathan. January 2017.
Unlike English, languages such as German, Dutch, the Scandinavian languages, and Greek form compounds not as multi-word expressions but by combining the parts of the compound into a new word without any orthographic separation. This poses problems for a variety of tasks, such as statistical machine translation and information retrieval. Most previous work on splitting compounds into their parts, or "decompounding", has focused on German. In this work, we create a new, simple, unsupervised system for automatic decompounding in three representative compounding languages: German, Swedish, and Hungarian. A multilingual evaluation corpus in the medical domain is created from the EMEA corpus and annotated with regard to compounding. Finally, several variants of our system are evaluated and compared to previous work.
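For flavor, a classic unsupervised, frequency-based splitting heuristic (in the spirit of Koehn and Knight, 2003, not the system built in this thesis) can be sketched as follows; the toy German frequency table and the minimum part length are assumptions:

```python
# Choose the split whose parts have the highest geometric-mean corpus
# frequency; the unsplit word's own frequency competes against all splits.
from itertools import combinations

freq = {"arbeit": 500, "markt": 400, "arbeitsmarkt": 60,
        "politik": 300, "arbeitsmarktpolitik": 5}  # toy frequency table

def split_score(parts):
    score = 1.0
    for p in parts:
        score *= freq.get(p, 0)  # unknown parts get frequency 0
    return score ** (1.0 / len(parts))

def decompound(word, min_len=3):
    best, best_score = [word], float(freq.get(word, 0))
    # Try every way to cut the word into >= 2 parts of minimal length:
    for n_cuts in range(1, len(word) // min_len):
        for cuts in combinations(range(min_len, len(word) - min_len + 1), n_cuts):
            parts = [word[i:j] for i, j in zip((0,) + cuts, cuts + (len(word),))]
            if all(len(p) >= min_len for p in parts):
                s = split_score(parts)
                if s > best_score:
                    best, best_score = parts, s
    return best

print(decompound("arbeitsmarktpolitik"))  # -> ['arbeitsmarkt', 'politik']
```

Real systems additionally handle linking elements (the German "s" in "arbeitsmarkt"), which this toy sketch ignores.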
|
510 |
Semantic analysis for extracting fine-grained opinion aspects. Zhan, Tianjie. 01 January 2010.
No description available.
|