91

Advanced Intranet Search Engine

Narayan, Nitesh January 2009 (has links)
Information retrieval has been a pervasive part of human society since its existence. With the advent of the internet and the World Wide Web it became an extensive area of research and a major focus, which led to the development of various search engines to locate desired information, mostly for globally connected computer networks, i.e. the internet. But there is another major part of computer networking, the intranet, which has not seen much advancement in information retrieval approaches, despite being a major source of information within a large number of organizations. The most common technique for intranet-based search engines is still merely database-centric. Thus, in practice, intranets are unable to avail themselves of the sophisticated techniques that have been developed for internet-based search engines without exposing their data to commercial search engines. In this Master's thesis we propose a state-of-the-art architecture for an advanced intranet search engine that is capable of dealing with the continuously growing size of an intranet's knowledge base. This search engine employs lexical processing of documents, where documents are indexed and searched based on standalone terms or keywords, along with semantic processing of the documents, where the context of the words and the relationships among them are given more importance. Combining lexical and semantic processing of the documents gives an effective approach to handling navigational queries along with research queries, in contrast to modern search engines, which use either lexical processing or semantic processing (or one as the major component) of the documents. We give equal importance to both approaches in our design, taking the best of both worlds. This work also takes into account various widely acclaimed concepts like inference rules, ontologies, and active feedback from the user community to continuously enhance and improve the quality of search results, along with the possibility to infer and deduce new knowledge from the existing knowledge, while preparing for the advent of the semantic web.
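The abstract describes a hybrid ranking scheme but gives no implementation details; the minimal Python sketch below is purely illustrative of the idea of weighting a lexical (keyword) score and a semantic (concept) score equally. All function names, inputs, and the 0.5 weight are assumptions of this sketch, not taken from the thesis.

```python
# Hypothetical sketch of the combined ranking idea: score each document by a
# weighted mix of a lexical (keyword-overlap) score and a semantic
# (concept-overlap) score. Names and weights are illustrative only.

def lexical_score(query_terms, doc_terms):
    """Fraction of query terms that appear verbatim in the document."""
    if not query_terms:
        return 0.0
    return len(set(query_terms) & set(doc_terms)) / len(set(query_terms))

def semantic_score(query_concepts, doc_concepts):
    """Fraction of query concepts (e.g. ontology classes) the document covers."""
    if not query_concepts:
        return 0.0
    return len(set(query_concepts) & set(doc_concepts)) / len(set(query_concepts))

def combined_score(query_terms, doc_terms, query_concepts, doc_concepts,
                   weight=0.5):
    # Equal weighting mirrors the abstract's "equal importance to both
    # approaches": navigational queries lean on the lexical half,
    # research queries on the semantic half.
    return (weight * lexical_score(query_terms, doc_terms)
            + (1 - weight) * semantic_score(query_concepts, doc_concepts))
```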
92

Prepositional Phrase Attachment Disambiguation Using WordNet

Spitzer, Claus January 2006 (has links)
In this thesis we use a knowledge-based approach to disambiguating prepositional phrase attachments in English sentences. This method was first introduced by S. M. Harabagiu. The Penn Treebank corpus is used as the training text. We extract 4-tuples of the form (VP, NP1, Prep, NP2) and sort them into classes according to the semantic relationships between the parts of each tuple. These relationships are extracted from WordNet. Classes are sorted into different tiers based on the strictness of their semantic relationship. Disambiguation of prepositional phrase attachments can be cast as a constraint satisfaction problem, where the tiers of extracted classes act as the constraints. Satisfaction is achieved when the strictest possible tier unanimously indicates one kind of attachment. The most challenging problems for disambiguation of prepositional phrases are those where the prepositional phrase may attach to either the closest verb or the closest noun.

We first demonstrate that the best approach to extracting tuples from parsed texts is a top-down postorder traversal algorithm. Following that, the various challenges in forming the prepositional classes utilizing WordNet semantic relations are described. We then discuss the actions that need to be taken towards applying the prepositional classes to the disambiguation task. A novel application of this method is also discussed, by which the tuples to be disambiguated are also expanded via WordNet, thus introducing a client-side application of the algorithms utilized to build prepositional classes. Finally, we present results of different variants of our disambiguating algorithm, contrasting the precision and recall of various combinations of constraints, and comparing our algorithm to a baseline method that falls back to attaching a prepositional phrase to the closest left phrase. Our conclusion is that our algorithm provides improved performance compared to the baseline and is therefore a useful new method of performing knowledge-based disambiguation of prepositional phrase attachments.
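As a rough illustration of the tiered constraint-satisfaction idea described above (not the thesis's actual code), the sketch below walks the tiers from strictest to loosest and returns an attachment only when a tier's matching classes vote unanimously; the class keys and example data are hypothetical.

```python
# Illustrative sketch of the tiered constraint idea: walk the tiers from
# strictest to loosest and stop at the first tier whose matching classes
# unanimously indicate one attachment site.

def disambiguate(tuple_features, tiers):
    """tiers: list of tier dicts, strictest first; each maps a class key
    to its attachment vote, 'verb' or 'noun'."""
    for tier in tiers:
        votes = {tier[key] for key in tuple_features if key in tier}
        if len(votes) == 1:          # unanimous within this tier
            return votes.pop()
    return "noun"                    # fall back to closest-left-phrase attachment

# Hypothetical example: classes keyed by (head word, preposition) pairs.
tiers = [
    {("eat", "with"): "verb"},                              # strict tier
    {("pizza", "with"): "noun", ("fork", "with"): "verb"},  # looser tier
]
print(disambiguate([("eat", "with")], tiers))               # -> 'verb'
```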
93

Flexible speech synthesis using weighted finite-state transducers

Bulyko, Ivan. January 2002 (has links)
Thesis (Ph.D.), University of Washington, 2002. Vita. Includes bibliographical references (p. 110-123).
94

Logical specification of finite-state transductions for natural language processing

Vaillette, Nathan, January 2004 (has links)
Thesis (Ph.D.), Ohio State University, 2004. Title from first page of PDF file; contains xv, 253 p., including graphics. Includes abstract and vita. Advisor: Chris Brew, Dept. of Linguistics. Includes bibliographical references (p. 245-253).
95

Semantic interpretation with distributional analysis

Glass, Michael Robert 05 July 2012 (has links)
Unstructured text contains a wealth of knowledge; however, it is in a form unsuitable for reasoning. Semantic interpretation is the task of processing natural language text to create or extend a coherent, formal knowledge base able to reason and support question answering. This task involves entity, event, and relation extraction, co-reference resolution, and inference. Many domains, from intelligence data to bioinformatics, would benefit from semantic interpretation. But traditional approaches to the subtasks typically require a large annotated corpus specific to a single domain and ontology. This dissertation describes an approach to rapidly train a semantic interpreter using a set of seed annotations and a large, unlabeled corpus. Our approach adapts methods from paraphrase acquisition and automatic thesaurus construction to extend seed syntactic-to-semantic mappings using an automatically gathered, domain-specific, parallel corpus. During interpretation, the system uses joint probabilistic inference to select the most probable interpretation consistent with the background knowledge. We evaluate both the quality of the extended mappings and the performance of the semantic interpreter.
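The abstract gives no code; as a hedged sketch of the general technique it names (extending seed mappings via distributional similarity), the Python fragment below labels an unseen syntactic pattern with the semantic relation of its distributionally closest seed. The function names, the context-vector representation, and the 0.6 threshold are all assumptions.

```python
# Illustrative sketch (not the dissertation's code): a syntactic pattern
# inherits a seed pattern's semantic relation when their context vectors
# are close enough under cosine similarity.

from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def extend_mappings(seeds, candidates, contexts, threshold=0.6):
    """seeds: {syntactic_pattern: semantic_relation} (assumed non-empty);
    candidates: unlabeled syntactic patterns;
    contexts: {pattern: Counter of co-occurring argument pairs}."""
    extended = dict(seeds)
    for cand in candidates:
        best = max(seeds, key=lambda s: cosine(contexts[cand], contexts[s]))
        if cosine(contexts[cand], contexts[best]) >= threshold:
            extended[cand] = seeds[best]   # inherit the seed's relation
    return extended
```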
96

Minimally supervised induction of morphology through bitexts

Moon, Taesun, Ph. D. 17 January 2013 (has links)
A knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment, and generate new forms. The most powerful and accurate of these have been manually encoded, such endeavors being without exception expensive and time-consuming. There have consequently been many attempts to reduce this cost through the development of unsupervised or minimally supervised algorithms and learning methods for the acquisition of morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems. Here, I present a strategy for morphological clustering and segmentation that is minimally supervised but more linguistically informed than previous unsupervised approaches. That is, this study attempts to induce, from an unannotated text, clusters of words that are inflectional variants of each other. Then a set of inflectional suffixes, organized by part of speech, is induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names: an approach that uses aligned bitexts to transfer linguistic resources developed for one language (the source language) to another (the target). This approach has the further advantage of allowing a reduction in the amount of training data without a significant degradation in performance, making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target for ease of evaluation and for certain typological properties of German. The two main tasks, clustering and segmentation, are approached sequentially, with the clustering informing the segmentation to allow for greater accuracy in morphological analysis. While the performance of these methods does not exceed the current roster of unsupervised or minimally supervised approaches to morphology acquisition, this work integrates more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, a crucial distinction in linguistics.
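As a loose illustration of one step the abstract describes (inducing inflectional suffixes, by part of speech, from clusters of inflectional variants), here is a minimal Python sketch; the longest-common-prefix stemming heuristic and the example cluster are assumptions of this sketch, not the thesis's method.

```python
# Rough illustration: given induced clusters of inflectional variants with
# projected part-of-speech tags, collect candidate inflectional suffixes per
# POS by stripping the longest common prefix of each cluster.

from collections import defaultdict
from os.path import commonprefix

def induce_suffixes(clusters):
    """clusters: list of (pos_tag, [word forms]) pairs."""
    suffixes = defaultdict(set)
    for pos, forms in clusters:
        stem = commonprefix(forms)
        for form in forms:
            suffixes[pos].add(form[len(stem):] or "-")  # '-' marks a bare stem
    return dict(suffixes)

# Hypothetical German verb cluster projected from an aligned English bitext.
print(induce_suffixes([("VERB", ["sagen", "sagt", "sagte", "sagten"])]))
# -> {'VERB': {'en', 't', 'te', 'ten'}}
```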
97

Typesafe NLP pipelines on Spark

Hafner, Simon 24 February 2015 (has links)
Natural language pipelines consist of various natural language algorithms that use the annotations of a previous algorithm to compute more annotations. These algorithms tend to be expensive in terms of computational power. Therefore it is advantageous to parallelize them in order to reduce the time necessary to analyze a large document collection. The goal of this project was to develop a new framework to encapsulate algorithms such that they may be used as part of a pipeline without any additional work. The framework consists of a custom-built data structure called Slab which implements type safety and functional transparency to integrate itself into the Scala programming language. Because of this integration, it is possible to use Spark, a MapReduce framework, to parallelize the pipeline on a cluster. To assess the performance of the new framework, a pipeline based on the OpenNLP library was created. An existing pipeline implemented in UIMA, an industry standard for natural language pipeline frameworks, served as a baseline in terms of performance. The pipeline created from the new framework processed the corpus in about half the time.
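The actual framework is written in Scala; as a loose, hypothetical Python analogue of the idea (an immutable, typed annotation container where each pipeline stage adds a layer rather than mutating shared state), consider the sketch below. The names Span, with_layer, and tokenize are illustrative only; only the name Slab is borrowed from the abstract.

```python
# Hypothetical Python analogue of a Slab-like structure: frozen dataclasses
# give immutability, and each stage returns a new Slab with one extra
# annotation layer instead of mutating shared state.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Span:
    begin: int
    end: int
    label: str

@dataclass(frozen=True)
class Slab:
    text: str
    layers: dict = field(default_factory=dict)  # layer name -> tuple of Spans

    def with_layer(self, name: str, spans):
        """Return a new Slab with one extra layer; the old Slab is untouched."""
        return Slab(self.text, {**self.layers, name: tuple(spans)})

def tokenize(slab: Slab) -> Slab:
    spans, start = [], 0
    for tok in slab.text.split():
        begin = slab.text.index(tok, start)
        spans.append(Span(begin, begin + len(tok), "token"))
        start = begin + len(tok)
    return slab.with_layer("tokens", spans)

# Stages compose like ordinary functions, so a MapReduce framework can apply
# the whole pipeline to each document independently.
doc = tokenize(Slab("Spark pipelines compose"))
print(doc.layers["tokens"])
```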
98

Text mining with information extraction

Nahm, Un Yong 28 August 2008 (has links)
Not available
99

Generating reference to visible objects

Mitchell, Margaret January 2013 (has links)
In this thesis, I examine human-like language generation from a visual input head-on, exploring how people refer to visible objects in the real world. Using previous work and the studies from this thesis, I propose an algorithm that generates human-like reference to visible objects. Rather than introduce a general-purpose referring expression generation (REG) algorithm, as is traditional, I address the sorts of properties that visual domains in particular make available, and the ways these must be processed in order to be used in a referring expression algorithm. This method uncovers several issues in generating human-like language that have not been thoroughly studied before. I focus on the properties of color, size, shape, and material, and address the issues of algorithm determinism and how speaker variation may be generated; unique identification of objects and whether this is an appropriate goal for generating human-like reference; atypicality and the role it plays in reference; and multi-featured values for visual attributes. Technical contributions from this thesis include (1) an algorithm for generating size modifiers from features in a visual scene; and (2) a referring expression generation algorithm that generates structures for varied, human-like reference.
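As a hedged illustration of contribution (1), not the thesis algorithm itself, the sketch below emits a size modifier only when the target's area stands out from the other visible objects by a margin; the 1.3 margin and the use of area as the size feature are assumptions.

```python
# Hypothetical illustration of generating a size modifier from scene
# features: call the target "big" or "small" only when its area stands out
# against the distractor objects by some margin.

def size_modifier(target_area: float, distractor_areas: list[float],
                  margin: float = 1.3) -> str | None:
    """Return 'big', 'small', or None when size is not distinctive."""
    if not distractor_areas:
        return None
    mean = sum(distractor_areas) / len(distractor_areas)
    if target_area > margin * mean:
        return "big"
    if target_area < mean / margin:
        return "small"
    return None  # size would not help the listener identify the object

print(size_modifier(140.0, [60.0, 75.0, 80.0]))  # -> 'big'
```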
100

A computational model of lexical incongruity in humorous text

Venour, Chris January 2013 (has links)
Many theories of humour claim that incongruity is an essential ingredient of humour. However, this idea is poorly understood, and little work has been done in computational humour to quantify it. For example, classifiers which attempt to distinguish jokes from regular texts tend to look for secondary features of humorous texts rather than for incongruity. Similarly, most joke generators attempt to recreate structural patterns found in example jokes but do not deliberately endeavour to create incongruity. As in previous research, this thesis develops classifiers and a joke generator which attempt to automatically recognize and generate a type of humour. However, the systems described here differ from previous programs because they implement a model of a certain type of humorous incongruity. We focus on a type of register humour we call lexical register jokes, in which the tones of individual words are in conflict with each other. Our goal is to create a semantic space that reflects the kind of tone at play in lexical register jokes, so that words that are far apart in the space are not simply different but exhibit the kinds of incongruities seen in lexical jokes. This thesis attempts to develop such a space, and various classifiers are implemented to use it to distinguish lexical register jokes from regular texts. The best of these classifiers achieved high levels of accuracy when distinguishing between a test set of lexical register jokes and four different kinds of regular text. A joke generator which makes use of the semantic space to create original lexical register jokes is also implemented and described in this thesis. In a test of the generator, texts generated by the system were evaluated by volunteers, who considered them not as humorous as human-made lexical register jokes but significantly more humorous than a set of control (i.e. non-joke) texts. This is an encouraging result which suggests that the vector space is somewhat successful in discovering lexical differences in tone and in modelling lexical register jokes.
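The abstract does not specify how distance in the tone space is used; as one hypothetical reading, the sketch below scores a text's lexical incongruity as the largest pairwise distance between its words' tone vectors, so a single out-of-register word drives the score up. The two tone dimensions and all numbers are invented for illustration.

```python
# Illustrative sketch (not the thesis's model): measure lexical incongruity
# as the maximum pairwise distance between word tone vectors.

from math import dist

def incongruity(words, tone_space):
    """tone_space: {word: tone vector}; returns the max pairwise distance."""
    vecs = [tone_space[w] for w in words if w in tone_space]
    return max((dist(a, b) for a in vecs for b in vecs), default=0.0)

# Toy 2-d tone vectors (formality, archaism) -- entirely made-up numbers.
tone_space = {"thou": (0.9, 0.95), "art": (0.8, 0.9), "grounded": (0.1, 0.05)}
print(incongruity(["thou", "art", "grounded"], tone_space))  # large -> joke-like
```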
