Global ETD Search

1	Statistical lexical disambiguation Foster, George F. January 1991 (has links) Note: Disambiguation
2	The Architecture of Result Relations : Corpus and experimental approaches to Result coherence relations in English Andersson, Marta January 2016 (has links) Two fundamental components of causality are the Cause and the Result. In linguistic work the distinction between these aspects is commonly blurred, presumably because the primary research focus has been on describing how language encodes causality. The semantic nature of the component events and the constraints on their relationship are seldom discussed; however, the current work aims to shed light on a broader spectrum of features that underlie the concept. This is an essential foundation for understanding how language communicates Result. The present discussion explores and illuminates the nature of this concept focusing on a relatively open-ended set of linguistic elements that can play a role in shaping a discourse relation in addition to discourse connectives. This is in contrast to the majority of the previous research, which has been quite intensely concerned with investigating a limited collection of well-established causality markers. Also, despite the fact that English has been used in studies on causality both as a control language and a metalanguage, there is surprisingly little work on the semantics of the relations that occur specifically in English, let alone Result relations. By borrowing from several cognitively-oriented approaches and combining empirical data from two written corpora (British National Corpus and the Penn Discourse Treebank) with experimental work, the current study systematically investigates the conceptual and linguistic properties of several closely related Result relation types (including Purpose), along with the joint role of discourse connectives and other discourse elements in conveying the intended sense. 
The findings indicate that linguistic signals of the conceptual structure of the relation seem to play a more significant role in the interpretation than explicit marking. Two factors emerged as more vital cues than the presence of the ambiguous connective so. In Purpose relations, a modal auxiliary conveying an intended effect, and in Result relations the presence/absence of an intentionally acting actor are crucial for disambiguation. The multifunctional connective therefore seems to merely satisfy the mandatory marking requirement related to the intrinsically unrealized (‘nonveridical’) nature of Purpose. In Result the presence of an ambiguous marker is to a great extent optional in English. However, discourse markers can also reflect how language users categorize causal event types. This claim has been confirmed in several cross-linguistic analyses, but the lexicon of English connectives has not been systematically investigated from this vantage point. The few existing studies found that the uses of English connectives are quite unconstrained across causal categories. The present work contributes to this line of research and suggests that two unambiguous markers, as a result and for this reason, indeed cover a wide range of causal event types; however, they also exhibit significant tendencies to occur prototypically in certain relation types. The presence and role of an intentionally acting discourse participant behind both real-world and linguistic causally-related events contributes to these tendencies. The contexts that include such a participant are regarded as intrinsically subjective and have been found to manifest surface expressions of subjectivity in previous work on other languages. The current study confirms similar tendencies in the linguistic construal and marking of Result relations in English, which proves that certain language elements partake in establishing the intended interpretation on a par with discourse connectives. 
What emerges from this discussion is therefore an account of how English utilizes the broad category of Result and what linguistic elements are used to convey the array of resultative events. RESULT PURPOSE discourse connectives disambiguation subjectivity nonveridicality
3	Segmentation and Alignment of Speech and Sketching in a Design Environment Adler, Aaron D. 01 February 2003 (has links) Sketches are commonly used in the early stages of design. Our previous system allows users to sketch mechanical systems that the computer interprets. However, some parts of the mechanical system might be too hard or too complicated to express in the sketch. Adding speech recognition to create a multimodal system would move us toward our goal of creating a more natural user interface. This thesis examines the relationship between the verbal and sketch input, particularly how to segment and align the two inputs. Toward this end, subjects were recorded while they sketched and talked. These recordings were transcribed, and a set of rules to perform segmentation and alignment was created. These rules represent the knowledge that the computer needs to perform segmentation and alignment. The rules successfully interpreted the 24 data sets that they were given. AI sketch design multimodal disambiguation segmentation alignment
4	DAIRSACC - Do Acronyms Influence Reading Speed and Content Comprehension? Tibor Beres 10 November 2007 (has links) Acronyms, initialisms and other types of abbreviations are frequently used in scientific, academic, governmental and administrative settings to shorten lengthy terminology and nomenclature. While they can make a text easier to read for people familiar with the abbreviations, they can add to the text’s inherent difficulty and impede comprehension for those who are not familiar with their meaning. The phenomenon of acronym polynymy (multiple definitions associated with the same acronym) can create confusion and add to the cognitive load associated with understanding the text. The current practice of defining acronyms only once, when they are introduced, can result in readers scrolling back and forth in the text looking for acronym definitions, increasing the cognitive load and negatively affecting reading speed and content comprehension. The purpose of this research was to study whether the presence of a large number of acronyms in a text impedes reading performance. The current study also investigated whether providing easy access to acronym definitions via hover text would alleviate comprehension problems caused by unknown acronyms in the text. The hypothesis was that by enabling fast acronym disambiguation, and eliminating the need to scroll for acronym definitions, the hover functionality would enhance reading speed and content comprehension. The results of the experiment are analyzed and recommendations for future investigations of the acronym problem are formulated.
5	Prepositional Phrase Attachment Disambiguation Using WordNet Spitzer, Claus January 2006 (has links) In this thesis we use a knowledge-based approach to disambiguating prepositional phrase attachments in English sentences. This method was first introduced by S. M. Harabagiu. The Penn Treebank corpus is used as the training text. We extract 4-tuples of the form (VP, NP1, Prep, NP2) and sort them into classes according to the semantic relationships between parts of each tuple. These relationships are extracted from WordNet. Classes are sorted into different tiers based on the strictness of their semantic relationship. Disambiguation of prepositional phrase attachments can be cast as a constraint satisfaction problem, where the tiers of extracted classes act as the constraints. Satisfaction is achieved when the strictest possible tier unanimously indicates one kind of attachment. The most challenging kind of problems for disambiguation of prepositional phrases are ones where the prepositional phrase may attach to either the closest verb or noun. 
We first demonstrate that the best approach to extracting tuples from parsed texts is a top-down postorder traversal algorithm. Following that, the various challenges in forming the prepositional classes utilizing WordNet semantic relations are described. We then discuss the actions that need to be taken towards applying the prepositional classes to the disambiguation task. A novel application of this method is also discussed, by which the tuples to be disambiguated are also expanded via WordNet, thus introducing a client-side application of the algorithms utilized to build prepositional classes. 
Finally, we present results of different variants of our disambiguating algorithm, contrasting the precision and recall of various combinations of constraints, and comparing our algorithm to a baseline method that falls back to attaching a prepositional phrase to the closest left phrase. Our conclusion is that our algorithm provides improved performance compared to the baseline and is therefore a useful new method of performing knowledge-based disambiguation of prepositional phrase attachments. Computer Science Natural language processing disambiguation semantics
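The tiered constraint idea in the abstract above can be illustrated with a minimal sketch. This is not the thesis' implementation: the names `disambiguate`, `exact_tier`, `preposition_tier` and `TIERS` are hypothetical, the toy lookup tables stand in for classes that the thesis derives from WordNet, and the fallback label "N" assumes that the closest phrase to the left of the preposition is the noun phrase NP1.

```python
from typing import Callable, Iterable, List, Tuple

# A 4-tuple (verb, noun1, preposition, noun2), following the abstract.
Quad = Tuple[str, str, str, str]
# A tier maps a tuple to the attachment votes ("V" or "N") of its matching classes.
Tier = Callable[[Quad], Iterable[str]]


def disambiguate(quad: Quad, tiers: List[Tier], fallback: str = "N") -> str:
    """Return "V" (verb attachment) or "N" (noun attachment).

    Tiers are tried from strictest to loosest. The first tier whose
    matching classes all vote the same way decides the attachment.
    """
    for tier in tiers:
        votes = set(tier(quad))
        if len(votes) == 1:  # unanimous, so the constraint is satisfied
            return next(iter(votes))
    # No tier was unanimous: use the baseline attachment (an assumption here).
    return fallback


# Hypothetical toy tiers. Real classes would come from WordNet relations.
def exact_tier(quad: Quad) -> Iterable[str]:
    table = {
        ("eat", "pizza", "with", "fork"): ["V"],
        ("eat", "pizza", "with", "anchovies"): ["N"],
    }
    return table.get(quad, [])


def preposition_tier(quad: Quad) -> Iterable[str]:
    table = {"of": ["N"], "with": ["V", "N"]}  # "with" is not unanimous
    return table.get(quad[2], [])


TIERS = [exact_tier, preposition_tier]  # strictest first

print(disambiguate(("eat", "pizza", "with", "fork"), TIERS))  # V
print(disambiguate(("buy", "bag", "of", "rice"), TIERS))      # N
```

An empty or split vote at one tier simply defers the decision to the next, looser tier, so stricter evidence always takes precedence when it exists.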
7	Bayesian nonparametric models for name disambiguation and supervised learning Dai, Andrew Mingbo January 2013 (has links) This thesis presents new Bayesian nonparametric models and approaches for their development, for the problems of name disambiguation and supervised learning. Bayesian nonparametric methods form an increasingly popular approach for solving problems that demand a high degree of model flexibility. However, this field is relatively new, and there are many areas that need further investigation. Previous work on Bayesian nonparametrics has fully explored neither the problems of entity disambiguation and supervised learning nor the advantages of nested hierarchical models. Entity disambiguation is a widely encountered problem where different references need to be linked to a real underlying entity. This problem is often unsupervised as there is no previously known information about the entities. Further to this, effective use of Bayesian nonparametrics offers a new approach to tackling supervised problems, which are frequently encountered. The main original contribution of this thesis is a set of new structured Dirichlet process mixture models for name disambiguation and supervised learning that can also have a wide range of applications. These models use techniques from Bayesian statistics, including hierarchical and nested Dirichlet processes, generalised linear models, Markov chain Monte Carlo methods and optimisation techniques such as BFGS. The new models have tangible advantages over existing methods in the field, as shown with experiments on real-world datasets including citation databases and classification and regression datasets. I develop the unsupervised author-topic space model for author disambiguation, which, unlike traditional author disambiguation approaches, uses free text to perform disambiguation. The model incorporates a name variant model that is based on a nonparametric Dirichlet language model. 
The model handles novel unseen name variants and can also model the unknown authors of the text of the documents. Through this, the model can disambiguate authors with no prior knowledge of the number of true authors in the dataset. In addition, it can do this when the authors have identical names. I use a model for nesting Dirichlet processes named the hybrid NDP-HDP. This model allows Dirichlet processes to be clustered together and adds an additional level of structure to the hierarchical Dirichlet process. I also develop a new hierarchical extension to the hybrid NDP-HDP. I develop this model into the grouped author-topic model for the entity disambiguation task. The grouped author-topic model uses clusters to model the co-occurrence of entities in documents, which can be interpreted as research groups. Since this model does not require entities to be linked to specific words in a document, it overcomes the problems of some existing author-topic models. The model incorporates a new method for modelling name variants, so that domain-specific name variant models can be used. Lastly, I develop extensions to supervised latent Dirichlet allocation, a type of supervised topic model. The keyword-supervised LDA model predicts document responses more accurately by modelling the effect of individual words and their contexts directly. The supervised HDP model has more model flexibility by using Bayesian nonparametrics for supervised learning. These models are evaluated on a number of classification and regression problems, and the results show that they outperform existing supervised topic modelling approaches. The models can also be extended to use similar information to the previous models, incorporating additional information such as entities and document titles to improve prediction. 519.5
8	Multilingual Word Sense Disambiguation Using Wikipedia Dandala, Bharath 08 1900 (has links) Ambiguity is inherent to human language. In particular, word sense ambiguity is prevalent in all natural languages, with a large number of the words in any given language carrying more than one meaning. Word sense disambiguation is the task of automatically assigning the most appropriate meaning to a polysemous word within a given context. In the literature, the problem of resolving ambiguity has generally revolved around the famous quote “you shall know the meaning of the word by the company it keeps.” In this thesis, we investigate the role of context for resolving ambiguity through three different approaches. Instead of using a predefined monolingual sense inventory such as WordNet, we use a language-independent framework where the word senses and sense-tagged data are derived automatically from Wikipedia. Using Wikipedia as a source of sense annotations provides a much-needed solution to the knowledge acquisition bottleneck. In order to evaluate the viability of Wikipedia-based sense annotations, we cast the task of disambiguating polysemous nouns as a monolingual classification task and experimented on lexical samples from four different languages (viz. English, German, Italian and Spanish). The experiments confirm that the Wikipedia-based sense annotations are reliable and can be used to construct accurate monolingual sense classifiers. It has long been believed that exploiting multiple languages helps in building accurate word sense disambiguation systems. Accordingly, we developed two approaches that recast the task of disambiguating polysemous nouns as a multilingual classification task. The first approach for multilingual word sense disambiguation attempts to effectively use a machine translation system to leverage two relevant multilingual aspects of the semantics of text. 
First, the various senses of a target word may be translated into different words, which constitute a unique yet highly salient signal that effectively expands the target word’s feature space. Second, the translated context words themselves embed co-occurrence information that a translation engine gathers from very large parallel corpora. The second approach for multilingual word sense disambiguation attempts to reduce the reliance on the machine translation system during training by using the multilingual knowledge available in Wikipedia through its interlingual links. Finally, the experiments on a lexical sample from four different languages confirm that the multilingual systems perform better than the monolingual system and significantly improve the disambiguation accuracy. Wikipedia word sense disambiguation supervised learning multilingual
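The two translation-based signals described in the abstract above (the translation of the target word, and the translated context words) can be illustrated as a feature-building step. This is only a sketch under stated assumptions: `multilingual_features`, `toy_translate` and `TOY_DE` are hypothetical names, the toy dictionary stands in for a real machine translation system, and the translation is assumed to be aligned token by token with the source, which real systems do not guarantee.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical stand-in for a machine translation system.
TOY_DE = {"river": "Fluss", "money": "Geld", "the": "die"}


def toy_translate(tokens: List[str], lang: str) -> List[str]:
    """Toy word-by-word translation; only a few German words are covered."""
    out = []
    for tok in tokens:
        if tok == "bank":
            # The translation of the ambiguous word depends on its context.
            out.append("Ufer" if "river" in tokens else "Bank")
        else:
            out.append(TOY_DE.get(tok, tok))
    return out


def multilingual_features(
    tokens: List[str],
    target_index: int,
    translate: Callable[[List[str], str], List[str]],
    languages: List[str],
) -> Counter:
    """Bag-of-words features from the source context plus its translations."""
    features: Counter = Counter()
    for i, tok in enumerate(tokens):
        if i != target_index:
            features[f"src:{tok.lower()}"] += 1
    for lang in languages:
        translated = translate(tokens, lang)
        # Signal 1: the word that the target itself was translated into.
        features[f"{lang}:target={translated[target_index].lower()}"] += 1
        # Signal 2: the translated context words.
        for i, tok in enumerate(translated):
            if i != target_index:
                features[f"{lang}:{tok.lower()}"] += 1
    return features


feats = multilingual_features(["the", "river", "bank"], 2, toy_translate, ["de"])
print(sorted(feats))
```

The resulting feature counts could be passed to any standard classifier; the abstract does not say which learner the thesis uses, so none is assumed here.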
9	Desambiguação lexical de sentidos para o português por meio de uma abordagem multilíngue mono e multidocumento / Word Sense Disambiguation for Portuguese through multilingual mono and multi-document Nóbrega, Fernando Antônio Asevêdo 28 May 2013 (has links) Lexical ambiguity is considered one of the main barriers to improving Natural Language Processing (NLP) applications. Word Sense Disambiguation (WSD) addresses this problem: its goal is to develop and evaluate methods that determine the correct sense of a word in a given context from a finite set of possible meanings. WSD is used mainly to provide resources and tools that reduce ambiguity problems and thereby improve results in other areas of NLP. For Brazilian Portuguese, little research has been done in this area, apart from a few highly domain-specific studies. Another important factor is that several areas of NLP operate in the multidocument scenario, where computation is performed over a collection of texts; however, there are no reports of WSD work directed at this scenario, nor of disambiguation experiments in it. This master's thesis therefore aimed to develop general-domain WSD methods for Brazilian Portuguese and disambiguation algorithms that make use of multidocument information, and to test and evaluate them in the multidocument scenario. To support the experiments, development and evaluation, the CSTNews corpus, a multidocument corpus with 50 document collections, was manually annotated using Princeton WordNet as the sense repository, which organizes meanings into sets of synonyms (synsets) and linguistic relations among them. 
Four WSD methods and some variations were developed: a heuristic method (to obtain baseline values); variations of the Lesk (1986) algorithm; an adaptation of the Mihalcea and Moldovan (1999) algorithm; and a variation of the Lesk method for the multidocument scenario. Three experiments were conducted to evaluate the methods, with the objectives of determining the overall performance of the algorithms on the whole corpus; evaluating the quality of disambiguation of the most ambiguous words in the corpus; and verifying the gain in disambiguation quality from employing multidocument information. These experiments showed that the heuristic method achieves the best overall result. However, most of the annotated words in the corpus had only one synset, usually the most frequent one, which is a scenario favorable to the heuristic method. In this scenario, moreover, the performance difference between the multidocument WSD method and the heuristic method is statistically insignificant. For the most ambiguous words, the heuristic method performed worse, indicating that more sophisticated WSD methods are needed for such words. Finally, the use of multidocument information was found to help the disambiguation process. 
The contributions of this work are both theoretical and technical. The theoretical contributions are the investigation and analysis of WSD in the multidocument scenario. The technical contributions are WSD methods, an annotated corpus and an annotation tool for Brazilian Portuguese, which can advance WSD research for the language. Multidocument scenario Word Sense Disambiguation Disambiguation Word sense WordNet
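For readers unfamiliar with the Lesk family of methods named in the abstract above, the following is a minimal sketch of the textbook simplified Lesk idea (pick the sense whose gloss shares the most words with the context), with a fallback to the first listed sense. It is not the thesis' implementation: the function name `simplified_lesk`, the toy `INVENTORY`, its sense identifiers and glosses are all hypothetical, and treating the first listed sense as the most frequent one is an assumption. The multidocument usage simply pools context words from a related document, which is one plausible reading of "multidocument information".

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy sense inventory: senses are listed most frequent first.
# The identifiers and glosses are illustrative, not actual WordNet entries.
INVENTORY: Dict[str, List[Tuple[str, str]]] = {
    "bank": [
        ("bank.sense1", "sloping land beside a body of water such as a river"),
        ("bank.sense2", "financial institution that accepts deposits and lends money"),
    ],
}


def simplified_lesk(
    word: str,
    context: Iterable[str],
    inventory: Dict[str, List[Tuple[str, str]]],
) -> str:
    """Return the sense whose gloss overlaps most with the context words.

    With no overlap at all, the first listed sense is returned, which acts
    as a most-frequent-sense style fallback under the assumption above.
    """
    senses = inventory[word]
    context_words = {w.lower() for w in context}
    best_id, best_overlap = senses[0][0], 0
    for sense_id, gloss in senses:
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_id, best_overlap = sense_id, overlap
    return best_id


# Single-document context: no gloss overlap, so the first sense is returned.
doc = "the bank was closed".split()
print(simplified_lesk("bank", doc, INVENTORY))

# Pooling words from a related document in the same collection changes the result.
related = "the institution lends money to farmers".split()
print(simplified_lesk("bank", doc + related, INVENTORY))
```

A practical version would at least remove stop words and lemmatize both glosses and context before counting overlaps.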
10	Entity-Centric Text Mining for Historical Documents Coll Ardanuy, Maria 07 July 2017 (has links) No description available. 510 digital humanities text mining toponym disambiguation person name disambiguation historical text mining Informatik (PPN619939052)
