71. Flexible speech synthesis using weighted finite-state transducers / Bulyko, Ivan. January 2002.
Thesis (Ph. D.)--University of Washington, 2002. / Vita. Includes bibliographical references (p. 110-123).
72. Logical specification of finite-state transductions for natural language processing / Vaillette, Nathan. January 2004.
Thesis (Ph. D.)--Ohio State University, 2004. / Title from first page of PDF file. Document formatted into pages; contains xv, 253 p.; also includes graphics. Includes abstract and vita. Advisor: Chris Brew, Dept. of Linguistics. Includes bibliographical references (p. 245-253).
73. Minimally supervised induction of morphology through bitexts / Moon, Taesun, Ph.D. 17 January 2013.
Knowledge of morphology is useful for many natural language processing systems, and much effort has been expended on developing accurate computational tools for morphology that lemmatize, segment, and generate new forms. The most powerful and accurate of these have been manually encoded, an endeavor that is invariably expensive and time-consuming. Consequently, there have been many attempts to reduce this cost through unsupervised or minimally supervised algorithms and learning methods for morphology acquisition. These efforts have yet to produce a tool that approaches the performance of manually encoded systems.
Here, I present a strategy for morphological clustering and segmentation that is minimally supervised yet more linguistically informed than previous unsupervised approaches. The study first induces, from unannotated text, clusters of words that are inflectional variants of one another, and then induces from these clusters a set of inflectional suffixes organized by part of speech. This level of detail is made possible by a method known as alignment and transfer (AT), among other names, which uses aligned bitexts to transfer linguistic resources developed for one language (the source) to another (the target). A further advantage of this approach is that the amount of training data can be reduced without significant degradation in performance, making it useful for applications targeting data collected from endangered languages. In the current study, however, I use English as the source and German as the target, for ease of evaluation and for certain typological properties of German. The two main tasks, clustering and segmentation, are approached sequentially, with the clustering informing the segmentation to allow greater accuracy in morphological analysis.
While the performance of these methods does not exceed that of the current roster of unsupervised or minimally supervised approaches to morphology acquisition, this work integrates more learning methods than previous studies. Furthermore, it attempts to learn inflectional rather than derivational morphology, a crucial distinction in linguistics.
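To make the alignment-and-transfer idea above concrete, here is a minimal sketch, not Moon's actual system: German words aligned to the same English (lemma, part-of-speech) pair form a candidate inflectional cluster, and suffixes are induced by stripping each cluster's longest common prefix. The toy bitext, the LCP heuristic, and all names are illustrative assumptions.

```scala
// Sketch of alignment-and-transfer (AT) morphology induction.
// Assumption: word alignments and English lemmas/POS tags are given.
object ATMorphologySketch {
  // One aligned token pair from a bitext: English lemma + POS, German surface form.
  case class Aligned(engLemma: String, pos: String, deSurface: String)

  // Longest common prefix of a non-empty list of words, used as a crude stem.
  def longestCommonPrefix(words: Seq[String]): String =
    words.reduce((a, b) => a.zip(b).takeWhile { case (x, y) => x == y }.map(_._1).mkString)

  def main(args: Array[String]): Unit = {
    val bitext = Seq(
      Aligned("say", "VERB", "sagt"),
      Aligned("say", "VERB", "sagen"),
      Aligned("say", "VERB", "sagte"),
      Aligned("house", "NOUN", "Haus"),
      Aligned("house", "NOUN", "Hauses"))

    // Step 1: cluster German surface forms by the English lemma/POS they align to,
    // transferring the source language's lemma identity to the target.
    val clusters = bitext.groupBy(a => (a.engLemma, a.pos))
      .view.mapValues(_.map(_.deSurface).distinct).toMap

    // Step 2: induce inflectional suffixes per POS by stripping each cluster's stem.
    val suffixesByPos = clusters.toSeq.flatMap { case ((_, pos), forms) =>
      val stem = longestCommonPrefix(forms)
      forms.map(f => pos -> f.stripPrefix(stem))
    }.groupBy(_._1).view.mapValues(_.map(_._2).toSet).toMap

    println(clusters)      // (say,VERB) -> sagt, sagen, sagte; (house,NOUN) -> Haus, Hauses
    println(suffixesByPos) // VERB -> Set(t, en, te), NOUN -> Set("", es) on this toy data
  }
}
```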
74. Typesafe NLP pipelines on Spark / Hafner, Simon. 24 February 2015.
Natural language pipelines chain together algorithms, each of which uses the annotations produced by earlier stages to compute further annotations. These algorithms tend to be computationally expensive, so it is advantageous to parallelize them to reduce the time needed to analyze a large document collection. The goal of this project was to develop a new framework that encapsulates algorithms so they can be used as part of a pipeline without additional work. The framework is built around a custom data structure called Slab, which provides type safety and functional transparency and integrates with the Scala programming language. Because of this integration, Spark, a MapReduce-style framework, can be used to parallelize the pipeline on a cluster. To assess the performance of the new framework, a pipeline based on the OpenNLP library was created, with an existing pipeline implemented in UIMA, an industry standard for natural language pipeline frameworks, serving as the performance baseline. The pipeline built on the new framework processed the corpus in about half the time.
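The abstract's core idea, an immutable, typed annotation container whose pipeline stages are pure functions, can be sketched as follows. This is an illustration, not the thesis's actual Slab: it uses runtime ClassTags where the real design enforces typing at compile time, and the two stages are invented.

```scala
import scala.reflect.ClassTag

// A Slab-like container: raw text plus typed annotation layers. Each stage
// reads layers added by earlier stages and returns a new immutable Slab.
case class Span(begin: Int, end: Int)
case class Sentence(span: Span)
case class Token(span: Span)

class Slab private (val content: String, annotations: Map[Class[_], Vector[Any]]) {
  // Add a layer of annotations of type A, yielding a new Slab.
  def add[A: ClassTag](items: Vector[A]): Slab = {
    val key = implicitly[ClassTag[A]].runtimeClass
    new Slab(content, annotations.updated(key, annotations.getOrElse(key, Vector.empty) ++ items))
  }
  // Retrieve all annotations of type A (empty if that layer was never added).
  def select[A: ClassTag]: Vector[A] =
    annotations.getOrElse(implicitly[ClassTag[A]].runtimeClass, Vector.empty)
      .asInstanceOf[Vector[A]]
}
object Slab { def apply(content: String): Slab = new Slab(content, Map.empty) }

object PipelineSketch {
  // Stages are pure Slab => Slab functions; a pipeline is function composition.
  // That purity is what lets Spark map the whole pipeline over a distributed collection.
  val sentenceSplitter: Slab => Slab =
    s => s.add(Vector(Sentence(Span(0, s.content.length))))
  val tokenizer: Slab => Slab = s =>
    s.add("\\S+".r.findAllMatchIn(s.content).map(m => Token(Span(m.start, m.end))).toVector)

  def main(args: Array[String]): Unit = {
    val pipeline = sentenceSplitter.andThen(tokenizer)
    val slab = pipeline(Slab("Typesafe pipelines compose nicely."))
    println(slab.select[Token].map(t => slab.content.substring(t.span.begin, t.span.end)))
  }
}
```

On a cluster, the same pipeline would be applied with something like `documentsRDD.map(doc => pipeline(Slab(doc)))`, which is the parallelization the abstract describes.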
75. Text mining with information extraction / Nahm, Un Yong. 28 August 2008.
Not available.
76. Generating reference to visible objects / Mitchell, Margaret. January 2013.
In this thesis, I examine human-like language generation from visual input head-on, exploring how people refer to visible objects in the real world. Drawing on previous work and the studies in this thesis, I propose an algorithm that generates human-like reference to visible objects. Rather than introduce a general-purpose referring expression generation (REG) algorithm, as is traditional, I address the sorts of properties that visual domains in particular make available, and the ways these must be processed before they can be used in a referring expression algorithm. This method uncovers several issues in generating human-like language that have not been thoroughly studied before. I focus on the properties of color, size, shape, and material, and address the issues of algorithm determinism and how speaker variation may be generated; unique identification of objects and whether this is an appropriate goal for generating human-like reference; atypicality and the role it plays in reference; and multi-featured values for visual attributes. Technical contributions of this thesis include (1) an algorithm for generating size modifiers from features in a visual scene, and (2) a referring expression generation algorithm that generates structures for varied, human-like reference.
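As an illustration of contribution (1), the sketch below treats size terms as relative to comparable objects in the scene and emits no modifier when size does not distinguish the target. The pixel-area feature, the thresholds, and the example objects are invented; Mitchell's actual algorithm is derived from human reference data.

```scala
// Hedged sketch: deciding whether "big"/"small" is a usable size modifier.
object SizeModifierSketch {
  case class SceneObject(id: String, objType: String, widthPx: Double, heightPx: Double) {
    def area: Double = widthPx * heightPx
  }

  // Compare the target only against same-type distractors: "the big mug"
  // makes sense relative to other mugs, not to plates or tables.
  def sizeModifier(target: SceneObject, scene: Seq[SceneObject]): Option[String] = {
    val sameType = scene.filter(o => o.objType == target.objType && o.id != target.id)
    if (sameType.isEmpty) None // size cannot distinguish a one-of-a-kind object
    else {
      val meanArea = sameType.map(_.area).sum / sameType.size
      if (target.area > 1.3 * meanArea) Some("big")
      else if (target.area < 0.7 * meanArea) Some("small")
      else None // not clearly different: omit the modifier, as people often do
    }
  }

  def main(args: Array[String]): Unit = {
    val scene = Seq(
      SceneObject("o1", "mug", 40, 50),
      SceneObject("o2", "mug", 80, 90),
      SceneObject("o3", "plate", 120, 20))
    println(sizeModifier(scene(1), scene)) // Some(big)
    println(sizeModifier(scene(2), scene)) // None: no other plates to compare against
  }
}
```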
77. A computational model of lexical incongruity in humorous text / Venour, Chris. January 2013.
Many theories of humour claim that incongruity is an essential ingredient of humour. However, this idea is poorly understood, and little work has been done in computational humour to quantify it. For example, classifiers that attempt to distinguish jokes from regular texts tend to look for secondary features of humorous texts rather than for incongruity. Similarly, most joke generators attempt to recreate structural patterns found in example jokes but do not deliberately endeavour to create incongruity. As in previous research, this thesis develops classifiers and a joke generator that attempt to automatically recognize and generate a type of humour. However, the systems described here differ from previous programs because they implement a model of a certain type of humorous incongruity. We focus on a type of register humour we call lexical register jokes, in which the tones of individual words are in conflict with each other. Our goal is to create a semantic space that reflects the kind of tone at play in lexical register jokes, so that words that are far apart in the space are not simply different but exhibit the kinds of incongruities seen in lexical jokes. This thesis attempts to develop such a space, and various classifiers are implemented that use it to distinguish lexical register jokes from regular texts. The best of these classifiers achieved high accuracy when distinguishing between a test set of lexical register jokes and four different kinds of regular text. A joke generator that uses the semantic space to create original lexical register jokes is also implemented and described in this thesis. In a test of the generator, texts generated by the system were evaluated by volunteers, who considered them not as humorous as human-made lexical register jokes but significantly more humorous than a set of control (i.e. non-joke) texts. This encouraging result suggests that the vector space is somewhat successful in discovering lexical differences in tone and in modelling lexical register jokes.
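Purely as an illustration of the proposed space, the sketch below assigns each word an invented "tone" vector and scores a text by the largest pairwise distance among its words, so that texts mixing archaic and slang vocabulary score high. The vectors, dimensions, and distance measure are all assumptions, not the thesis's learned space.

```scala
// Sketch: lexical incongruity as maximum pairwise distance in a tone space.
object RegisterIncongruitySketch {
  type Vec = Array[Double]

  def cosineDistance(a: Vec, b: Vec): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norm = (v: Vec) => math.sqrt(v.map(x => x * x).sum)
    1.0 - dot / (norm(a) * norm(b))
  }

  // Invented tone space: dimension 0 ~ formality, dimension 1 ~ archaism.
  val toneSpace: Map[String, Vec] = Map(
    "thy"    -> Array(0.90, 0.95),
    "visage" -> Array(0.85, 0.80),
    "dude"   -> Array(-0.90, -0.70),
    "the"    -> Array(0.05, 0.05),
    "cat"    -> Array(0.04, 0.02),
    "sat"    -> Array(0.05, 0.03))

  // A text's incongruity score: the largest tone distance between any two words.
  def incongruity(words: Seq[String]): Double = {
    val vecs = words.flatMap(toneSpace.get)
    val pairwise = for (i <- vecs.indices; j <- i + 1 until vecs.size)
      yield cosineDistance(vecs(i), vecs(j))
    if (pairwise.isEmpty) 0.0 else pairwise.max
  }

  def main(args: Array[String]): Unit = {
    println(incongruity(Seq("thy", "visage", "dude"))) // close to 2: strong tone clash
    println(incongruity(Seq("the", "cat", "sat")))     // close to 0: uniform tone
  }
}
```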
78. Automatic multi-document summarization for digital libraries / Ou, Shiyan; Khoo, Christopher S.G.; Goh, Dion H. January 2006.
With the rapid growth of the World Wide Web and online information services, more and more information is available and accessible online. Automatic summarization is an indispensable solution to the resulting information overload problem. Multi-document summarization is useful for providing an overview of a topic and allowing users to zoom in for more details on aspects of interest. This paper reports on three types of multi-document summaries generated for a set of research abstracts, using different summarization approaches: a sentence-based summary generated by the MEAD summarization system, which extracts important sentences using various features; another sentence-based summary generated by extracting research objective sentences; and a variable-based summary focusing on research concepts and relationships. A user evaluation was carried out to compare the three types of summaries. The results indicated that the majority of users (70%) preferred the variable-based summary, while 55% preferred the research objective summary and only 25% preferred the MEAD summary.
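The sentence-based extraction the paper attributes to MEAD can be sketched as follows; the centroid and position features follow the general MEAD recipe, but the weights and scoring details here are invented for illustration.

```scala
// Toy feature-based extractive summarizer in the spirit of MEAD.
object ExtractiveSummarySketch {
  def words(s: String): Seq[String] =
    s.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq

  def summarize(sentences: Seq[String], k: Int): Seq[String] = {
    // Centroid feature: how frequent each word is across the document set.
    val centroid = sentences.flatMap(words)
      .groupBy(identity).view.mapValues(_.size.toDouble).toMap

    val scored = sentences.zipWithIndex.map { case (sent, i) =>
      val ws = words(sent)
      val centroidScore = ws.map(w => centroid.getOrElse(w, 0.0)).sum / math.max(ws.size, 1)
      val positionScore = 1.0 / (i + 1) // earlier sentences get a bonus
      (sent, 0.8 * centroidScore + 0.2 * positionScore) // invented weights
    }
    // Keep the k best sentences, but present them in original order.
    val keep = scored.sortBy(-_._2).take(k).map(_._1).toSet
    sentences.filter(keep)
  }

  def main(args: Array[String]): Unit = {
    val abstracts = Seq(
      "Multi-document summarization gives users an overview of a research topic.",
      "The weather on the day of the workshop was pleasant.",
      "Summarization reduces information overload for digital library users.")
    summarize(abstracts, 2).foreach(println)
  }
}
```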
79. Statistical methods for spoken dialogue management / Thomson, Blaise Roger Marie. January 2010.
No description available.
80. Acquiring syntactic and semantic transformations in question answering / Kaisser, Michael. January 2010.
One and the same fact can be expressed in natural language in many different ways, using different words and/or different syntax. This phenomenon, commonly called paraphrasing, is a main reason why Natural Language Processing (NLP) is such a challenging task. It becomes especially obvious in Question Answering (QA), where the task is to automatically answer a question posed in natural language, usually against a collection that itself consists of natural language texts. It cannot be assumed that an answer sentence uses the same words as the question, or that those words are combined by the same syntactic rules. In this thesis we describe methods that help to address this problem.

First, we explore how the lexical resources FrameNet, PropBank, and VerbNet can be used to recognize the wide range of syntactic realizations that an answer sentence to a given question can have. We find that our methods based on these resources work well for web-based Question Answering. However, we identify two problems: (1) all three resources still have significant coverage gaps; and (2) they are not suitable for identifying answer sentences that offer only indirect evidence. While the first problem currently hinders performance, it is not a theoretical flaw that renders the approach unsuitable; it rather shows that more effort is needed to produce more complete resources. The second problem is more persistent. Many valid answer sentences, especially in small, journalistic corpora, do not provide direct evidence for a question; rather, they strongly suggest an answer without logically implying it. Semantically motivated resources like FrameNet, PropBank, and VerbNet cannot easily be employed to recognize such forms of indirect evidence.

To investigate ways of dealing with indirect evidence, we used Amazon's Mechanical Turk to collect over 8,000 manually identified answer sentences from the AQUAINT corpus for the over 1,900 TREC questions from the 2002 to 2006 QA tracks. These pairs of answer sentences and their corresponding questions form the QASP corpus, which we released to the public in April 2008. In this dissertation, we use the QASP corpus to develop an approach to QA based on matching dependency relations between answer candidates and question constituents in answer sentences. By acquiring knowledge about syntactic and semantic transformations from dependency relations in the QASP corpus, additional answer candidates can be identified that could not be linked to the question with our first approach.
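The dependency-matching approach of the final paragraph can be sketched roughly as below. The triples, relation labels, and overlap scoring are invented for illustration; the dissertation's method additionally learns transformations from the QASP corpus rather than requiring exact relation matches.

```scala
// Sketch: rank answer candidates by how well they preserve the question's
// dependency relations when substituted into the wh-slot.
object DependencyMatchSketch {
  case class Dep(head: String, rel: String, dependent: String)

  // "Who invented the telephone?" with WH marking the answer slot.
  val question = Seq(
    Dep("invented", "nsubj", "WH"),
    Dep("invented", "dobj", "telephone"))

  // Candidate sentence: "Alexander Graham Bell invented the telephone in 1876."
  val answer = Seq(
    Dep("invented", "nsubj", "Bell"),
    Dep("invented", "dobj", "telephone"),
    Dep("invented", "prep_in", "1876"))

  // Substitute each answer token into the WH slot and count preserved relations.
  def rankCandidates(q: Seq[Dep], a: Seq[Dep]): Seq[(String, Int)] =
    a.map(_.dependent).distinct.map { cand =>
      val substituted = q.map(d => if (d.dependent == "WH") d.copy(dependent = cand) else d)
      (cand, substituted.count(a.contains))
    }.sortBy(-_._2)

  def main(args: Array[String]): Unit = {
    rankCandidates(question, answer).foreach(println)
    // (Bell,2) ranks first: filling the slot with "Bell" preserves both relations.
  }
}
```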