Global ETD Search

11	Investigação de métodos de desambiguação lexical de sentidos de verbos do português do Brasil / Research of word sense disambiguation methods for verbs in brazilian portuguese Cabezudo, Marco Antonio Sobrevilla 28 August 2015 (has links) A Desambiguação Lexical de Sentido (DLS) consiste em determinar o sentido mais apropriado da palavra em um contexto determinado, utilizando-se um repositório de sentidos pré-especificado. Esta tarefa é importante para outras aplicações, por exemplo, a tradução automática. Para o inglês, a DLS tem sido amplamente explorada, utilizando diferentes abordagens e técnicas, contudo, esta tarefa ainda é um desafio para os pesquisadores em semântica. Analisando os resultados dos métodos por classes gramaticais, nota-se que todas as classes não apresentam os mesmos resultados, sendo que os verbos são os que apresentam os piores resultados. Estudos ressaltam que os métodos de DLS usam informações superficiais e os verbos precisam de informação mais profunda para sua desambiguação, como frames sintáticos ou restrições seletivas. Para o português, existem poucos trabalhos nesta área e só recentemente tem-se investigado métodos de uso geral. Além disso, salienta-se que, nos últimos anos, têm sido desenvolvidos recursos lexicais focados nos verbos. Nesse contexto, neste trabalho de mestrado, visou-se investigar métodos de DLS de verbos em textos escritos em português do Brasil. Em particular, foram explorados alguns métodos tradicionais da área e, posteriormente, foi incorporado conhecimento linguístico proveniente da Verbnet.Br. Para subsidiar esta investigação, o córpus CSTNews foi anotado com sentidos de verbos usando a WordNet-Pr como repositório de sentidos. Os resultados obtidos mostraram que os métodos de DLS investigados não conseguiram superar o baseline mais forte e que a incorporação de conhecimento da VerbNet.Br produziu melhorias nos métodos, porém, estas melhorias não foram estatisticamente significantes. Algumas contribuições deste trabalho de mestrado foram um córpus anotado com sentidos de verbos, a criação de uma ferramenta que auxilie a anotação de sentidos, a investigação de métodos de DLS e o uso de informações especificas de verbos (provenientes da VerbNet.Br) na DLS de verbos. / Word Sense Disambiguation (WSD) aims at identifying the appropriate sense of a word in a given context, using a pre-specified sense-repository. This task is important to other applications as Machine Translation. For English, WSD has been widely studied, using different approaches and techniques, however, this task is still a challenge for researchers in Semantics. Analyzing the performance of different methods by the morphosyntactic class, note that not all classes have the same results, and the worst results are obtained for Verbs. Studies highlight that WSD methods use shallow information and Verbs need deeper information for its disambiguation, like syntactic frames or selectional restrictions. For Portuguese, there are few works in WSD and, recently, some works for general purpose. In addition, it is noted that, recently, have been developed lexical resources focused on Verbs. In this context, this master work aimed at researching WSD methods for verbs in texts written in Brazilian Portuguese. In particular, traditional WSD methods were explored and, subsequently, linguistic knowledge of VerbNet.Br was incorporated in these methods. To support this research, CSTNews corpus was annotated with verb senses using the WordNet-Pr as a sense-repository. The results showed that explored WSD methods did not outperform the hard baseline and the incorporation of VerbNet.Br knowledge yielded improvements in the methods, however, these improvements were not statistically significant. Some contributions of this work were the sense-annotated corpus, the creation of a tool for support the sense-annotation, the research of WSD methods for verbs and the use of specific information of verbs (from VerbNet.Br) in the WSD of verbs. Computational linguistics Desambiguação lexical de sentindo Linguística computacional Natural language processing Processamento da linguagem natural Word sense disambiguation
12	Investigação de métodos de desambiguação lexical de sentidos de verbos do português do Brasil / Research of word sense disambiguation methods for verbs in brazilian portuguese Marco Antonio Sobrevilla Cabezudo 28 August 2015 (has links) A Desambiguação Lexical de Sentido (DLS) consiste em determinar o sentido mais apropriado da palavra em um contexto determinado, utilizando-se um repositório de sentidos pré-especificado. Esta tarefa é importante para outras aplicações, por exemplo, a tradução automática. Para o inglês, a DLS tem sido amplamente explorada, utilizando diferentes abordagens e técnicas, contudo, esta tarefa ainda é um desafio para os pesquisadores em semântica. Analisando os resultados dos métodos por classes gramaticais, nota-se que todas as classes não apresentam os mesmos resultados, sendo que os verbos são os que apresentam os piores resultados. Estudos ressaltam que os métodos de DLS usam informações superficiais e os verbos precisam de informação mais profunda para sua desambiguação, como frames sintáticos ou restrições seletivas. Para o português, existem poucos trabalhos nesta área e só recentemente tem-se investigado métodos de uso geral. Além disso, salienta-se que, nos últimos anos, têm sido desenvolvidos recursos lexicais focados nos verbos. Nesse contexto, neste trabalho de mestrado, visou-se investigar métodos de DLS de verbos em textos escritos em português do Brasil. Em particular, foram explorados alguns métodos tradicionais da área e, posteriormente, foi incorporado conhecimento linguístico proveniente da Verbnet.Br. Para subsidiar esta investigação, o córpus CSTNews foi anotado com sentidos de verbos usando a WordNet-Pr como repositório de sentidos. Os resultados obtidos mostraram que os métodos de DLS investigados não conseguiram superar o baseline mais forte e que a incorporação de conhecimento da VerbNet.Br produziu melhorias nos métodos, porém, estas melhorias não foram estatisticamente significantes. Algumas contribuições deste trabalho de mestrado foram um córpus anotado com sentidos de verbos, a criação de uma ferramenta que auxilie a anotação de sentidos, a investigação de métodos de DLS e o uso de informações especificas de verbos (provenientes da VerbNet.Br) na DLS de verbos. / Word Sense Disambiguation (WSD) aims at identifying the appropriate sense of a word in a given context, using a pre-specified sense-repository. This task is important to other applications as Machine Translation. For English, WSD has been widely studied, using different approaches and techniques, however, this task is still a challenge for researchers in Semantics. Analyzing the performance of different methods by the morphosyntactic class, note that not all classes have the same results, and the worst results are obtained for Verbs. Studies highlight that WSD methods use shallow information and Verbs need deeper information for its disambiguation, like syntactic frames or selectional restrictions. For Portuguese, there are few works in WSD and, recently, some works for general purpose. In addition, it is noted that, recently, have been developed lexical resources focused on Verbs. In this context, this master work aimed at researching WSD methods for verbs in texts written in Brazilian Portuguese. In particular, traditional WSD methods were explored and, subsequently, linguistic knowledge of VerbNet.Br was incorporated in these methods. To support this research, CSTNews corpus was annotated with verb senses using the WordNet-Pr as a sense-repository. The results showed that explored WSD methods did not outperform the hard baseline and the incorporation of VerbNet.Br knowledge yielded improvements in the methods, however, these improvements were not statistically significant. Some contributions of this work were the sense-annotated corpus, the creation of a tool for support the sense-annotation, the research of WSD methods for verbs and the use of specific information of verbs (from VerbNet.Br) in the WSD of verbs. Desambiguação lexical de sentindo Linguística computacional Processamento da linguagem natural Computational linguistics Natural language processing Word sense disambiguation
13	NATURAL LANGUAGE PROCESSING BASED GENERATOR OF TESTING INSTRUMENTS Wang, Qianqian 01 September 2017 (has links) Natural Language Processing (NLP) is the field of study that focuses on the interactions between human language and computers. By “natural language” we mean a language that is used for everyday communication by humans. Different from programming languages, natural languages are hard to be defined with accurate rules. NLP is developing rapidly and it has been widely used in different industries. Technologies based on NLP are becoming increasingly widespread, for example, Siri or Alexa are intelligent personal assistants using NLP build in an algorithm to communicate with people. “Natural Language Processing Based Generator of Testing Instruments” is a stand-alone program that generates “plausible” multiple-choice selections by analyzing word sense disambiguation and calculating semantic similarity between two natural language entities. The core is Word Sense Disambiguation (WSD), WSD is identifying which sense of a word is used in a sentence when the word has multiple meanings. WSD is considered as an AI-hard problem. The project presents several algorithms to resolve WSD problem and compute semantic similarity, along with experimental results demonstrating their effectiveness. Natural Language Processing Semantic Similarity Word Sense Disambiguation Other Computer Engineering
14	Using web texts for word sense disambiguation Wang, Yuanyong, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links) In all natural languages, ambiguity is a universal phenomenon. When a word has multiple meaning depending on its contexts it is called an ambiguous word. The process of determining the correct meaning of a word (formally named word sense) in a given context is word sense disambiguation(WSD). WSD is one of the most fundamental problems in natural language processing. If properly addressed, it could lead to revolutionary advancement in many other technologies such as text search engine technology, automatic text summarization and classification, automatic lexicon construction, machine translation and automatic learning agent technology. One difficulty that has always confronted WSD researchers is the lack of high quality sense specific information. For example, if the word "power" Immediately preceds the word "plant", it would strongly constrain the meaning of "plant" to be "an industrial facility". If "power" is replaced by the phrase "root of a", then the sense of "plant" is dictated to be "an organism" of the kingdom Planate. It is obvious that manually building a comprehensive sense specific information base for each sense of each word is impractical. Researchers also tried to extract such information from large dictionaries as well as manually sense tagged corpora. Most of the dictionaries used for WSD are not built for this purpose and have a lot of inherited peculiarities. While manual tagging is slow and costly, automatic tagging is not successful in providing a reliable performance. Furthermore, it is often the case that for a randomly chosen word (to be disambiguated), the sense specific context corpora that can be collected from dictionaries are not large enough. Therefore, manually building sense specific information bases or extraction of such information from dictionaries are not effective approaches to obtain sense specific information. A web text, due to its vast quantity and wide diversity, becomes an ideal source for extraction of large quantity of sense specific information. In this thesis, the impacts of Web texts on various aspects of WSD has been investigated. New measures and models are proposed to tame enormous amount of Web texts for the purpose of WSD. They are formally evaluated by experimenting their disambiguation performance on about 70 ambiguous nouns. The results are very encouraging and have helped revealing the great potential of using Web texts for WSD. The results are published in three papers at Australia national and international level (Wang&Hoffmann,2004,2005,2006)[42][43][44]. Search engine. Word sense disambiguation. Ambiguity. Semantics -- Data processing. Computational linguistics.
15	An Investigation of Word Sense Disambiguation for Improving Lexical Chaining Enss, Matthew January 2006 (has links) This thesis investigates how word sense disambiguation affects lexical chains, as well as proposing an improved model for lexical chaining in which word sense disambiguation is performed prior to lexical chaining. A lexical chain is a set of words from a document that are related in meaning. Lexical chains can be used to identify the dominant topics in a document, as well as where changes in topic occur. This makes them useful for applications such as topic segmentation and document summarization. <br /><br /> However, polysemous words are an inherent problem for algorithms that find lexical chains as the intended meaning of a polysemous word must be determined before its semantic relations to other words can be determined. For example, the word "bank" should only be placed in a chain with "money" if in the context of the document "bank" refers to a place that deals with money, rather than a river bank. The process by which the intended senses of polysemous words are determined is word sense disambiguation. To date, lexical chaining algorithms have performed word sense disambiguation as part of the overall process building lexical chains. Because the intended senses of polysemous words must be determined before words can be properly chained, we propose that word sense disambiguation should be performed before lexical chaining occurs. Furthermore, if word sense disambiguation is performed prior to lexical chaining, then it can be done with any available disambiguation method, without regard to how lexical chains will be built afterwards. Therefore, the most accurate available method for word sense disambiguation should be applied prior to the creation of lexical chains. <br /><br /> We perform an experiment to demonstrate the validity of the proposed model. We compare the lexical chains produced in two cases: <ol> <li>Lexical chaining is performed as normal on a corpus of documents that has not been disambiguated. </li> <li>Lexical chaining is performed on the same corpus, but all the words have been correctly disambiguated beforehand. </li></ol> We show that the lexical chains created in the second case are more correct than the chains created in the first. This result demonstrates that accurate word sense disambiguation performed prior to the creation of lexical chains does lead to better lexical chains being produced, confirming that our model for lexical chaining is an improvement upon previous approaches. Computer Science lexical chains word sense disambiguation computational linguistics natural language processing
16	An Investigation of Word Sense Disambiguation for Improving Lexical Chaining Enss, Matthew January 2006 (has links) This thesis investigates how word sense disambiguation affects lexical chains, as well as proposing an improved model for lexical chaining in which word sense disambiguation is performed prior to lexical chaining. A lexical chain is a set of words from a document that are related in meaning. Lexical chains can be used to identify the dominant topics in a document, as well as where changes in topic occur. This makes them useful for applications such as topic segmentation and document summarization. <br /><br /> However, polysemous words are an inherent problem for algorithms that find lexical chains as the intended meaning of a polysemous word must be determined before its semantic relations to other words can be determined. For example, the word "bank" should only be placed in a chain with "money" if in the context of the document "bank" refers to a place that deals with money, rather than a river bank. The process by which the intended senses of polysemous words are determined is word sense disambiguation. To date, lexical chaining algorithms have performed word sense disambiguation as part of the overall process building lexical chains. Because the intended senses of polysemous words must be determined before words can be properly chained, we propose that word sense disambiguation should be performed before lexical chaining occurs. Furthermore, if word sense disambiguation is performed prior to lexical chaining, then it can be done with any available disambiguation method, without regard to how lexical chains will be built afterwards. Therefore, the most accurate available method for word sense disambiguation should be applied prior to the creation of lexical chains. <br /><br /> We perform an experiment to demonstrate the validity of the proposed model. We compare the lexical chains produced in two cases: <ol> <li>Lexical chaining is performed as normal on a corpus of documents that has not been disambiguated. </li> <li>Lexical chaining is performed on the same corpus, but all the words have been correctly disambiguated beforehand. </li></ol> We show that the lexical chains created in the second case are more correct than the chains created in the first. This result demonstrates that accurate word sense disambiguation performed prior to the creation of lexical chains does lead to better lexical chains being produced, confirming that our model for lexical chaining is an improvement upon previous approaches. Computer Science lexical chains word sense disambiguation computational linguistics natural language processing
17	Cross-lingual Information Retrieval On Turkish And English Texts Boynuegri, Akif 01 April 2010 (has links) (PDF) In this thesis, cross-lingual information retrieval (CLIR) approaches are comparatively evaluated for Turkish and English texts. As a complementary study, knowledge-based methods for word sense disambiguation (WSD), which is one of the most important parts of the CLIR studies, are compared for Turkish words. Query translation and sense indexing based CLIR approaches are used in this study. In query translation approach, we use automatic and manual word sense disambiguation methods and Google translation service during translation of queries. In sense indexing based approach, documents are indexed according to meanings of words instead of words themselves. Retrieval of documents is performed according to meanings of the query words as well. During the identification of intended meaning of query terms, manual and automatic word sense disambiguation methods are used and compared to each other. Knowledge based WSD methods that use different gloss enrichment techniques are compared for Turkish words. Turkish WordNet is used as a primary knowledge base and English WordNet and Turkish Wikipedia are employed as enrichment resources. Meanings of words are more clearly identified by using semantic relations defined in WordNets and Turkish Wikipedia. Also, during calculation of semantic relatedness of senses, cosine similarity metric is used as an alternative metric to word overlap count. Effects of using cosine similarity metric are observed for each WSD methods that use different knowledge bases.
18	Word meaning in context as a paraphrase distribution : evidence, learning, and inference Moon, Taesun, Ph. D. 25 October 2011 (has links) In this dissertation, we introduce a graph-based model of instance-based, usage meaning that is cast as a problem of probabilistic inference. The main aim of this model is to provide a flexible platform that can be used to explore multiple hypotheses about usage meaning computation. Our model takes up and extends the proposals of Erk and Pado [2007] and McCarthy and Navigli [2009] by representing usage meaning as a probability distribution over potential paraphrases. We use undirected graphical models to infer this probability distribution for every content word in a given sentence. Graphical models represent complex probability distributions through a graph. In the graph, nodes stand for random variables, and edges stand for direct probabilistic interactions between them. The lack of edges between any two variables reflect independence assumptions. In our model, we represent each content word of the sentence through two adjacent nodes: the observed node represents the surface form of the word itself, and the hidden node represents its usage meaning. The distribution over values that we infer for the hidden node is a paraphrase distribution for the observed word. To encode the fact that lexical semantic information is exchanged between syntactic neighbors, the graph contains edges that mirror the dependency graph for the sentence. Further knowledge sources that influence the hidden nodes are represented through additional edges that, for example, connect to document topic. The integration of adjacent knowledge sources is accomplished in a standard way by multiplying factors and marginalizing over variables. Evaluating on a paraphrasing task, we find that our model outperforms the current state-of-the-art usage vector model [Thater et al., 2010] on all parts of speech except verbs, where the previous model wins by a small margin. But our main focus is not on the numbers but on the fact that our model is flexible enough to encode different hypotheses about usage meaning computation. In particular, we concentrate on five questions (with minor variants): - Nonlocal syntactic context: Existing usage vector models only use a word's direct syntactic neighbors for disambiguation or inferring some other meaning representation. Would it help to have contextual information instead "flow" along the entire dependency graph, each word's inferred meaning relying on the paraphrase distribution of its neighbors? - Influence of collocational information: In some cases, it is intuitively plausible to use the selectional preference of a neighboring word towards the target to determine its meaning in context. How does modeling selectional preferences into the model affect performance? - Non-syntactic bag-of-words context: To what extent can non-syntactic information in the form of bag-of-words context help in inferring meaning? - Effects of parametrization: We experiment with two transformations of MLE. One interpolates various MLEs and another transforms it by exponentiating pointwise mutual information. Which performs better? - Type of hidden nodes: Our model posits a tier of hidden nodes immediately adjacent the surface tier of observed words to capture dynamic usage meaning. We examine the model based on by varying the hidden nodes such that in one the nodes have actual words as values and in the other the nodes have nameless indexes as values. The former has the benefit of interpretability while the latter allows more standard parameter estimation. Portions of this dissertation are derived from joint work between the author and Katrin Erk [submitted]. / text Computational linguistics Lexical semantics Probabilistic graphical models Natural language processing Word sense disambiguation Paraphrasing
19	Using web texts for word sense disambiguation Wang, Yuanyong, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links) In all natural languages, ambiguity is a universal phenomenon. When a word has multiple meaning depending on its contexts it is called an ambiguous word. The process of determining the correct meaning of a word (formally named word sense) in a given context is word sense disambiguation(WSD). WSD is one of the most fundamental problems in natural language processing. If properly addressed, it could lead to revolutionary advancement in many other technologies such as text search engine technology, automatic text summarization and classification, automatic lexicon construction, machine translation and automatic learning agent technology. One difficulty that has always confronted WSD researchers is the lack of high quality sense specific information. For example, if the word "power" Immediately preceds the word "plant", it would strongly constrain the meaning of "plant" to be "an industrial facility". If "power" is replaced by the phrase "root of a", then the sense of "plant" is dictated to be "an organism" of the kingdom Planate. It is obvious that manually building a comprehensive sense specific information base for each sense of each word is impractical. Researchers also tried to extract such information from large dictionaries as well as manually sense tagged corpora. Most of the dictionaries used for WSD are not built for this purpose and have a lot of inherited peculiarities. While manual tagging is slow and costly, automatic tagging is not successful in providing a reliable performance. Furthermore, it is often the case that for a randomly chosen word (to be disambiguated), the sense specific context corpora that can be collected from dictionaries are not large enough. Therefore, manually building sense specific information bases or extraction of such information from dictionaries are not effective approaches to obtain sense specific information. A web text, due to its vast quantity and wide diversity, becomes an ideal source for extraction of large quantity of sense specific information. In this thesis, the impacts of Web texts on various aspects of WSD has been investigated. New measures and models are proposed to tame enormous amount of Web texts for the purpose of WSD. They are formally evaluated by experimenting their disambiguation performance on about 70 ambiguous nouns. The results are very encouraging and have helped revealing the great potential of using Web texts for WSD. The results are published in three papers at Australia national and international level (Wang&Hoffmann,2004,2005,2006)[42][43][44]. Search engine. Word sense disambiguation. Ambiguity. Semantics -- Data processing. Computational linguistics.
20	Interopérabilité Sémantique Multi-lingue des Ressources Lexicales en Données Liées Ouvertes / Semantic Interoperability of Multilingual Lexical Resources in Lexical Linked Data Tchechmedjiev, Andon 14 October 2016 (has links) Lorsqu’il s’agit la construction de ressources lexico-sémantiques multilingues, la première chose qui vient à l’esprit, et la nécessité que les ressources à alignées partagent le même format de données et la même représentations (interopérabilité représentationnelle). Avec l’apparition de standard tels que LMF et leur adaptation au web sémantique pour la production de ressources lexico- sémantiques multilingues en tant que données lexicales liées ouvertes (Ontolex), l’interopérabilité représentationnelle n’est plus un verrou majeur. Cependant, en ce qui concerne l’interopérabilité des alignements multilingues, le choix et la construction du pivot interlingue est l’un des obstacles principaux. Pour nombre de ressources (par ex. BabelNet, EuroWordNet), le choix est fait d’utiliser l’Anglais, ou une autre langue comme pivot interlingue. Ce choix mène à une perte de contraste dans les cas où des sens du Pivot ont des lexicalisations différentes dans la même acception dans plusieurs autres langues. L’utilisation d’une pivot à acceptions interlingues, solution proposée il y a déjà plus de 20 ans, pourrait être viable. Néanmoins, leur construction manuelle est trop ardue du fait du manque d’experts parlant assez de langues et leur construction automatique pose problème du fait de l’absence d’une formalisation et d’une caractérisation axiomatique permettant de garantir leur propriétés. Nous proposons dans cette thèse de d’abord formaliser l’architecture à pivot interlingue par acceptions, en développant une axiomatisation garantissant leurs propriétés. Nous proposons ensuite des algorithmes de construction initiale automatique en utilisant les propriétés combinatoires du graphe des alignements bilingues, mais aussi des algorithmes de mise à jour garantissant l’interopérabilité dynamique. Dans un deuxième temps, nous étudions de manière plus pratique sur DBNary, un extraction périodique de Wiktionary dans de nombreuses éditions de langues, afin de cerner les contraintes pratiques à l’application des algorithmes proposés. / When it comes to the construction of multilingual lexico-semantic resources, the first thing that comes to mind is that the resources we want to align, should share the same data model and format (representational interoperability). However, with the emergence of standards such as LMF and their implementation and widespread use for the production of resources as lexical linked data (Ontolex), representational interoperability has ceased to be a major challenge for the production of large-scale multilingual resources. However, as far as the interoperability of sense-level multi-lingual alignments is concerned, a major challenge is the choice of a suitable interlingual pivot. Many resources make the choice of using English senses as the pivot (e.g. BabelNet, EuroWordNet), although this choice leads to a loss of contrast between English senses that are lexicalized with a different words in other languages. The use of acception-based interlingual representations, a solution proposed over 20 years ago, could be viable. However, the manual construction of such language-independent pivot representations is very difficult due to the lack of expert speaking enough languages fluently and algorithms for their automatic constructions have never since materialized, mainly because of the lack of a formal axiomatic characterization that ensures the pre- servation of their correctness properties. In this thesis, we address this issue by first formalizing acception-based interlingual pivot architectures through a set of axiomatic constraints and rules that guarantee their correctness. Then, we propose algorithms for the initial construction and the update (dynamic interoperability) of interlingual acception-based multilingual resources by exploiting the combinatorial properties of pairwise bilingual translation graphs. Secondly, we study the practical considerations of applying our construction algorithms on a tangible resource, DBNary, a resource periodically extracted from Wiktionary in many languages in lexical linked data. Désambigïsation lexicale multilingue Interopérabilité Ressources langagières Multilingual Word Sense Disambiguation Interoperability Multilingual Lexical Resources 004

Search results