Modeling Relevance in Statistical Machine Translation: Scoring Alignment, Context, and Annotations of Translation Instances
Phillips, Aaron B. 01 January 2012
Machine translation has advanced considerably in recent years, primarily due to the availability of larger datasets. However, one cannot rely on the availability of copious, high-quality bilingual training data. In this work, we improve upon the state of the art in machine translation with an instance-based model that scores each instance of translation in the corpus. A translation instance reflects a source and target correspondence at one specific location in the corpus. The significance of this approach is that our model is able to capture that some instances of translation are more relevant than others. We have implemented this approach in Cunei, a new platform for machine translation that permits the scoring of instance-specific features. Leveraging per-instance alignment features, we demonstrate that Cunei can outperform Moses, a widely used machine translation system. We then expand on this baseline system in three principal directions, each of which shows further gains. First, we score the source context of a translation instance in order to favor those that are most similar to the input sentence. Second, we apply similar techniques to score the target context of a translation instance and favor those that are most similar to the target hypothesis. Third, we provide a mechanism to mark up the corpus with annotations (e.g. statistical word clustering, part-of-speech labels, and parse trees) and then exploit this information to create additional per-instance similarity features. Each of these techniques explicitly takes advantage of the fact that our approach scores each instance of translation on demand after the input sentence is provided and while the target hypothesis is being generated; similar extensions would be impossible or quite difficult in existing machine translation systems. Ultimately, this approach provides a more flexible framework for integration of novel features that adapts better to new data.
In our experiments with German-English and Czech-English translation, the addition of instance-specific features consistently shows improvement.
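As a rough illustration of the first extension (favoring translation instances whose source context resembles the input sentence), the sketch below ranks instances by a simple word-overlap score. It is not Cunei's actual feature set: the `Instance` class, the function names, and the choice of Jaccard similarity are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    """One occurrence of a translation in the corpus (hypothetical structure)."""
    source_phrase: str
    target_phrase: str
    context: List[str]  # tokens of the source sentence the instance came from


def context_similarity(input_tokens: List[str], instance: Instance) -> float:
    """Jaccard overlap between input words and the instance's source context.

    Jaccard similarity is an assumed stand-in for a real similarity feature.
    """
    a, b = set(input_tokens), set(instance.context)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_instances(input_tokens: List[str], instances: List[Instance]) -> List[Instance]:
    """Sort instances so the most contextually similar ones come first."""
    return sorted(
        instances,
        key=lambda inst: context_similarity(input_tokens, inst),
        reverse=True,
    )


# Two corpus instances of German "Bank" with different translations.
instances = [
    Instance("Bank", "bank", "die Bank gibt mir einen Kredit".split()),
    Instance("Bank", "bench", "er sitzt auf der Bank im Garten".split()),
]
input_tokens = "ich sitze auf der Bank im Park".split()
ranked = rank_instances(input_tokens, instances)
best = ranked[0].target_phrase
```

Because the score is computed only once the input sentence is known, it cannot be precomputed into a static phrase table, which is the point the abstract makes about scoring instances on demand.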
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2002. / Includes bibliographical references (leaves 74-79). Also available in electronic version. Access restricted to campus users.
Salveter, Sharon Caroline.
Thesis--Wisconsin. / Vita. Includes bibliographical references (leaves 118-122).
Thesis (M.Sc.). / Written for the School of Computer Science. Title from title page of PDF (viewed 2008/12/09). Includes bibliographical references.
Translation accuracy comparison between machine translation and context-free machine natural language grammar–based translation
Wang, Long Qi January 2018
University of Macau / Faculty of Science and Technology. / Department of Computer and Information Science
<p>In this thesis we present the idea of using parallel phrases for word alignment. Each parallel phrase is extracted from a set of manual word alignments and contains a number of source and target words and their corresponding alignments. If a parallel phrase matches a new sentence pair, its word alignments can be applied to the new sentence. There are several advantages of using phrases for word alignment. First, longer text segments include more context and will be more likely to produce correct word alignments than shorter segments or single words. More importantly, the use of longer phrases makes it possible to generalize words in the phrase by replacing words by parts-of-speech or other grammatical information. In this way, the number of words covered by the extracted phrases can go beyond the words and phrases that were present in the original set of manually aligned sentences. We present experiments with phrase-based word alignment on three types of English–Swedish parallel corpora: a software manual, a novel and proceedings of the European Parliament. In order to find a balance between improved coverage and high alignment accuracy we investigated different properties of generalised phrases to identify which types of phrases are likely to produce accurate alignments on new data. Finally, we have compared phrase-based word alignments to state-of-the-art statistical alignment with encouraging results. We show that phrase-based word alignments can be used to enhance statistical word alignment. To evaluate word alignments an English–Swedish reference set for the Europarl corpus was constructed. The guidelines for producing this reference alignment are presented in the thesis.</p>
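The matching step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the `POS:` pattern syntax, the tag names, and the function names are assumptions introduced here. A pattern item is either a literal word or a part-of-speech placeholder, which is what lets a generalised phrase cover words never seen in the manually aligned data.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]       # (word, part-of-speech tag); tag set is hypothetical
Links = Set[Tuple[int, int]]  # (source index, target index) pairs


def item_matches(item: str, token: Token) -> bool:
    """A pattern item is a literal word, or 'POS:<tag>' to match any word with that tag."""
    word, pos = token
    if item.startswith("POS:"):
        return item[4:] == pos
    return item == word.lower()


def find_first(pattern: List[str], sentence: List[Token]) -> int:
    """Return the start index of the first contiguous match, or -1 if none."""
    for start in range(len(sentence) - len(pattern) + 1):
        if all(item_matches(p, sentence[start + k]) for k, p in enumerate(pattern)):
            return start
    return -1


def apply_parallel_phrase(src_pattern: List[str], tgt_pattern: List[str],
                          phrase_links: Links,
                          src_sent: List[Token], tgt_sent: List[Token]) -> Links:
    """If both sides of the phrase match, shift its links to sentence positions."""
    s0 = find_first(src_pattern, src_sent)
    t0 = find_first(tgt_pattern, tgt_sent)
    if s0 < 0 or t0 < 0:
        return set()
    return {(s0 + i, t0 + j) for i, j in phrase_links}


# English "the <noun>" aligned to a single Swedish definite noun.
src_pattern = ["the", "POS:NOUN"]
tgt_pattern = ["POS:NOUN"]
phrase_links = {(0, 0), (1, 0)}

src_sent = [("open", "VERB"), ("the", "DET"), ("file", "NOUN")]
tgt_sent = [("öppna", "VERB"), ("filen", "NOUN")]
links = apply_parallel_phrase(src_pattern, tgt_pattern, phrase_links, src_sent, tgt_sent)
```

Here both "the" and "file" end up linked to "filen". Taking only the first match on each side is a simplification; a fuller version would have to resolve multiple and overlapping matches.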
Zhang, Lidan., 张丽丹.
published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
Cahill, Lynne Julie
This thesis addresses the problem of accounting for morphological alternation within Natural Language Processing. It proposes an approach to morphology which is based on phonological concepts, in particular the syllable, in contrast to morpheme-based approaches which have standardly been used by both NLP and linguistics. It is argued that morpheme-based approaches, within both linguistics and NLP, grew out of the apparently purely affixational morphology of European languages, and especially English, but are less appropriate for non-affixational languages such as Arabic. Indeed, it is claimed that even accounts of those European languages miss important linguistic generalizations by ignoring more phonologically based alternations, such as umlaut in German and ablaut in English. To justify this approach, we present a wide range of data from languages as diverse as German and Rotuman. A formal language, MOLUSC, is described, which allows for the definition of declarative mappings between syllable-sequences, and accounts of non-trivial fragments of the inflectional morphology of English, Arabic and Sanskrit are presented, to demonstrate the capabilities of the language. A semantics for the language is defined, and the implementation of an interpreter is described. The thesis discusses theoretical (linguistic) issues, as well as implementational issues involved in the incorporation of MOLUSC into a larger lexicon system. The approach is contrasted with previous work in computational morphology, in particular finite-state morphology, and its relation to other work in the fields of morphology and phonology is also discussed.
Khan, Imtiaz Hussain
Managing Surface Ambiguity in the Generation of Referring Expressions
Most algorithms for the Generation of Referring Expressions tend to generate distinguishing descriptions at the semantic level, disregarding the ways in which surface issues can affect their quality. This thesis explores the role of surface ambiguities in referring expressions and how the risk of such ambiguities should be taken into account by an algorithm that generates referring expressions. This was done by focussing on the type of surface ambiguity which arises when adjectives occur in coordinated structures (as in "the old men and women"). The central idea is to use statistical information about lexical co-occurrence to estimate which interpretation of a phrase is most likely for human readers, and to avoid generating phrases where misunderstandings are likely. We develop specific hypotheses, and test them by running experiments with human participants. We found that the Word Sketches are a reliable source of information to predict the likelihood of a reading. The avoidance of misunderstandings is not the only issue to be dealt with in this thesis. Since the avoidance of misunderstandings might be achieved at the cost of very lengthy (or perhaps very disfluent) expressions, it is important to select an optimal expression (i.e., the expression which is preferred by most readers) from the various alternatives available. Again, we develop specific hypotheses, and record human preferences in a forced-choice manner. We found that participants preferred clear (i.e., not likely to be misunderstood) expressions to unclear ones, but if several of the expressions were clear then brief expressions were preferred over their longer counterparts. The results of these empirical studies motivated the design of a GRE algorithm.
The implemented algorithm builds a plural distinguishing description for the intended referents (if one exists), using words, and then applies transformation rules to that description to construct a set of logically equivalent distinguishing descriptions. Each description in the set is realised as a corresponding English noun phrase (NP) using appropriate realisation rules, and the most likely reading of each NP is determined. One NP is then selected for output. A further experiment verifies that the kinds of expressions produced by the algorithm are optimal for readers: they are understood accurately and quickly.
An investigation into the structure of the terminological information contained in special language definitions
Nkwenti-Azeh, Blaise January 1989
No description available.