
Towards the Development of an Automatic Diacritizer for the Persian Orthography based on the Xerox Finite State Transducer

Nojoumian, Peyman 12 August 2011 (has links)
Due to the lack of short vowels or diacritics in Persian orthography, many Natural Language Processing applications for this language, including information retrieval, machine translation, text-to-speech, and automatic speech recognition systems need to disambiguate the input first, in order to be able to do further processing. In machine translation, for example, the whole text should be correctly diacritized first so that the correct words, parts of speech and meanings are matched and retrieved from the lexicon. This is primarily because of Persian’s ambiguous orthography. In fact, the core engine of any Persian language processor should utilize a diacritizer and a lexical disambiguator. This dissertation describes the design and implementation of an automatic diacritizer for Persian based on the state-of-the-art Finite State Transducer technology developed at Xerox by Beesley & Karttunen (2003). The result of morphological analysis and generation on a test corpus is shown, including the insertion of diacritics. This study will also look at issues that are raised by phonological and semantic ambiguities as a result of short vowels in Persian being absent in the writing system. It suggests a hybrid model (rule-based & inductive) that is inspired by psycholinguistic experiments on the human mental lexicon for the disambiguation of heterophonic homographs in Persian using frequency and collocation information. A syntactic parser can be developed based on the proposed model to discover Ezafe (the linking short vowel /e/ within a noun phrase) or disambiguate homographs, but its implementation is left for future work.
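The frequency-based disambiguation of heterophonic homographs that the proposed hybrid model relies on can be sketched as a lexicon lookup. This is a toy illustration, not the thesis's Xerox FST implementation: the transliterated word forms and frequency counts below are invented for the example.

```python
# Illustrative sketch: restore short vowels by lexicon lookup, choosing
# among heterophonic homographs by (hypothetical) corpus frequency.
# Keys are undiacritized consonantal skeletons; values are candidate
# diacritized readings paired with invented frequency counts.
LEXICON = {
    "krm": [("karam", 120), ("kerm", 45), ("kerem", 3)],
    "mrd": [("mard", 300), ("mord", 80)],
}

def diacritize(word):
    """Return the most frequent diacritized reading, or the word
    unchanged if it is not in the lexicon."""
    candidates = LEXICON.get(word)
    if not candidates:
        return word
    return max(candidates, key=lambda pair: pair[1])[0]

def diacritize_text(text):
    return " ".join(diacritize(w) for w in text.split())

print(diacritize_text("mrd krm"))  # -> "mard karam"
```

A full system would replace the flat dictionary with a finite-state lexicon and use collocation context, not just unigram frequency, to pick among readings.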

Evaluating Text Segmentation

Fournier, Christopher 24 April 2013 (has links)
This thesis investigates the evaluation of automatic and manual text segmentation. Text segmentation is the process of placing boundaries within text to create segments according to some task-dependent criterion. An example of text segmentation is topical segmentation, which aims to segment a text according to the subjective definition of what constitutes a topic. A number of automatic segmenters have been created to perform this task, and the question that this thesis answers is how to select the best automatic segmenter for such a task. This requires choosing an appropriate segmentation evaluation metric, confirming the reliability of a manual solution, and then finally employing an evaluation methodology that can select the automatic segmenter that best approximates human performance. A variety of comparison methods and metrics exist for comparing segmentations (e.g., WindowDiff, Pk), and all save a few are able to award partial credit for nearly missing a boundary. Those comparison methods that can award partial credit unfortunately lack consistency, symmetricity, intuition, and a host of other desirable qualities. This work proposes a new comparison method named boundary similarity (B) which is based upon a new minimal boundary edit distance to compare two segmentations. Near misses are frequent, even among manual segmenters (as is exemplified by the low inter-coder agreement reported by many segmentation studies). This work adapts some inter-coder agreement coefficients to award partial credit for near misses using the new metric proposed herein, B. The methodologies employed by many works introducing automatic segmenters evaluate them simply in terms of a comparison of their output to one manual segmentation of a text, and often only by presenting nothing other than a series of mean performance values (with no standard deviation or standard error, and little if any statistical hypothesis testing).
This work asserts that one segmentation of a text cannot constitute a “true” segmentation; specifically, one manual segmentation is simply one sample of the population of all possible segmentations of a text and of that subset of desirable segmentations. This work further asserts that the adapted inter-coder agreement statistics proposed herein should be used to determine the reproducibility and reliability of a coding scheme and set of manual codings, and then statistical hypothesis testing using the specific comparison methods and methodologies demonstrated herein should be used to select the best automatic segmenter. This work proposes new segmentation evaluation metrics, adapted inter-coder agreement coefficients, and methodologies. Most importantly, this work experimentally compares the state-of-the-art comparison methods to those proposed herein upon artificial data that simulates a variety of scenarios and chooses the best one (B). The ability of adapted inter-coder agreement coefficients, based upon B, to discern between various levels of agreement in artificial and natural data sets is then demonstrated. Finally, a contextual evaluation of three automatic segmenters is performed using the state-of-the-art comparison methods and B, using the methodology proposed herein to demonstrate the benefits and versatility of B as opposed to its counterparts.
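The core idea of comparing two segmentations via boundary edits can be sketched as follows. This is a pedagogical simplification, not the thesis's exact formulation of boundary similarity (B): here full misses cost 1 and near misses within a transposition window count at half cost, and the greedy pairing below stands in for the true minimal edit distance.

```python
# Simplified illustration of a boundary-edit-distance comparison:
# segmentations are given as sets of boundary positions; boundaries
# present in both count as matches, leftover boundaries within n_t
# positions of each other count as near-miss transpositions (half
# credit), and everything else is a full miss.
def boundary_similarity(bounds_a, bounds_b, n_t=2):
    a, b = set(bounds_a), set(bounds_b)
    matches = a & b
    only_a, only_b = sorted(a - b), sorted(b - a)
    transpositions = 0
    # Greedily pair leftover boundaries that lie within n_t of each other.
    for pos in list(only_a):
        near = [q for q in only_b if abs(q - pos) <= n_t]
        if near:
            only_b.remove(min(near, key=lambda q: abs(q - pos)))
            only_a.remove(pos)
            transpositions += 1
    misses = len(only_a) + len(only_b)
    total = len(matches) + transpositions + misses
    if total == 0:
        return 1.0  # two identical empty boundary sets agree perfectly
    return (len(matches) + 0.5 * transpositions) / total

print(boundary_similarity([3, 7], [4, 7]))  # -> 0.75
```

Unlike window-based measures such as WindowDiff, a score built this way is symmetric in its two arguments and awards graded credit for near misses directly.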

Coping with uncertainty : noun phrase interpretation and early semantic analysis

Mellish, Christopher Stuart January 1981 (has links)
A computer program which can "understand" natural language texts must have both syntactic knowledge about the language concerned and semantic knowledge of how what is written relates to its internal representation of the world. It has been a matter of some controversy how these sources of information can best be integrated to translate from an input text to a formal meaning representation. The controversy has concerned largely the question as to what degree of syntactic analysis must be performed before any semantic analysis can take place. An extreme position in this debate is that a syntactic parse tree for a complete sentence must be produced before any investigation of that sentence's meaning is appropriate. This position has been criticised by those who see understanding as a process that takes place gradually as the text is read, rather than in sudden bursts of activity at the ends of sentences. These people advocate a model where semantic analysis can operate on fragments of text before the global syntactic structure is determined - a strategy which we will call early semantic analysis. In this thesis, we investigate the implications of early semantic analysis in the interpretation of noun phrases. One possible approach is to say that a noun phrase is a self-contained unit and can be fully interpreted by the time it has been read. Thus it can always be determined what objects a noun phrase refers to without consulting much more than the structure of the phrase itself. This approach was taken in part by Winograd [Winograd 72], who saw the constraint that a noun phrase have a referent as a valuable aid in resolving local syntactic ambiguity. Unfortunately, Winograd's work has been criticised by Ritchie, because it is not always possible to determine what a noun phrase refers to purely on the basis of local information. 
In this thesis, we will go further than this and claim that, because the meaning of a noun phrase can be affected by so many factors outside the phrase itself, it makes no sense to talk about "the referent" as a function of a noun phrase. Instead, the notion of "referent" is something defined by global issues of structure and consistency. Having rejected one approach to the early semantic analysis of noun phrases, we go on to develop an alternative, which we call incremental evaluation. The basic idea is that a noun phrase does provide some information about what it refers to. It should be possible to represent this partial information and gradually refine it as relevant implications of the context are followed up. Moreover, the partial information should be available to an inference system, which, amongst other things, can detect the absence of a referent and provide the advantages of Winograd's system. In our system, noun phrase interpretation does take place locally, but the point is that it does not finish there. Instead, the determination of the meaning of a noun phrase is spread over the subsequent analysis of how it contributes to the meaning of the text as a whole.
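The incremental-evaluation idea can be sketched as a candidate set that later context narrows. This is a minimal illustration of the general strategy, not Mellish's implementation: the toy world, its entities, and their properties are invented, and an empty candidate set plays the role of the inference system detecting the absence of a referent.

```python
# A noun phrase denotes a partial description: the set of entities
# consistent with it so far. Later context refines that set rather
# than forcing a referent to be fixed when the phrase is first read.
WORLD = [
    {"id": "b1", "type": "block", "color": "red", "size": "small"},
    {"id": "b2", "type": "block", "color": "red", "size": "large"},
    {"id": "p1", "type": "pyramid", "color": "green", "size": "small"},
]

class NounPhrase:
    def __init__(self, **features):
        # Local interpretation: every entity consistent with the phrase.
        self.candidates = [e for e in WORLD
                           if all(e.get(k) == v for k, v in features.items())]

    def refine(self, **features):
        # Later context narrows the partial denotation further.
        self.candidates = [e for e in self.candidates
                           if all(e.get(k) == v for k, v in features.items())]
        return self

    @property
    def referent(self):
        # Absence of any candidate signals an inconsistent description.
        if not self.candidates:
            raise ValueError("no referent: constraints are inconsistent")
        return self.candidates[0] if len(self.candidates) == 1 else self.candidates

phrase = NounPhrase(type="block")   # "the block": still two candidates
phrase.refine(size="large")         # "...which is large": now unique
print(phrase.referent["id"])        # -> b2
```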

Using Rhetorical Figures and Shallow Attributes as a Metric of Intent in Text

Strommer, Claus Walter January 2011 (has links)
In this thesis we propose a novel metric of document intent evaluation based on the detection and classification of rhetorical figures. In doing so we dispel the notion that rhetoric lacks the structure and consistency necessary to be relevant to computational linguistics. We show how the combination of document attributes available through shallow parsing and rules extracted from the definitions of rhetorical figures produces a metric which can be used to reliably classify the intent of texts. This metric works equally well on entire documents and on portions of a document.
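To make the shallow-attribute approach concrete, here is a hedged sketch of a rule for one well-defined figure, epanaphora (successive sentences opening with the same words), detectable from token sequences alone. The rule and its thresholds are invented for this illustration and are not taken from the thesis.

```python
# Detect runs of sentences that share an identical opening word
# sequence of at least min_prefix tokens -- a shallow, rule-based
# signature of the rhetorical figure epanaphora.
def detect_epanaphora(sentences, min_prefix=1, min_run=2):
    """Return (start, end) sentence-index pairs for each qualifying run."""
    runs, start = [], 0
    tokens = [s.lower().split() for s in sentences]
    for i in range(1, len(tokens) + 1):
        same = (i < len(tokens)
                and tokens[i][:min_prefix] == tokens[start][:min_prefix])
        if not same:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = i
    return runs

text = ["We shall fight on the beaches.",
        "We shall fight on the landing grounds.",
        "They will not prevail."]
print(detect_epanaphora(text))  # -> [(0, 1)]
```

A full metric would combine many such figure detectors, weighting their outputs into a single intent score for a document or passage.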

Linguistic data models: presentation and representation

Penton, Dave. January 2006 (has links)
Thesis (M.Eng.Sc.)--University of Melbourne, Dept. of Computer Science and Software Engineering, 2006. / Typescript. Includes bibliographical references (leaves 227-241).

A robust architecture for human language technology systems

Stanley, Theban, January 2006 (has links)
Thesis (M.S.)--Mississippi State University. Department of Electrical and Computer Engineering. / Title from title screen. Includes bibliographical references.

Word sense disambiguation and context

Martirosyan, Anahit. January 1900 (has links)
Thesis (M.C.S.) - Carleton University, 2005. / Includes bibliographical references (p. 91-97). Also available in electronic format on the Internet.

Natural language processing tools for reading level assessment and text simplification for bilingual education

Petersen, Sarah E. January 2007 (has links)
Thesis (Ph. D.)--University of Washington, 2007. / Vita. Includes bibliographical references (p. 116-124).

Parsing natural language

Wilcox, Leonard E. January 1983 (has links)
Thesis (M.S.)--Rochester Institute of Technology, 1983. / Typescript. Includes bibliographical references (leaves 115-119).

UGURU: a natural language Unix consultant

Hanson, John Douglas. January 1987 (has links)
Thesis (M.S.)--Rochester Institute of Technology, 1987. / Typescript. Includes bibliographical references (leaves 172-175).
