Spelling suggestions: "subject:"syntactic reordering"" "subject:"yntactic reordering""
1 |
Unification-based constraints for statistical machine translationWilliams, Philip James January 2014 (has links)
Morphology and syntax have both received attention in statistical machine translation research, but they are usually treated independently and the historical emphasis on translation into English has meant that many morphosyntactic issues remain under-researched. Languages with richer morphologies pose additional problems and conventional approaches tend to perform poorly when either source or target language has rich morphology. In both computational and theoretical linguistics, feature structures together with the associated operation of unification have proven a powerful tool for modelling many morphosyntactic aspects of natural language. In this thesis, we propose a framework that extends a state-of-the-art syntax-based model with a feature structure lexicon and unification-based constraints on the target-side of the synchronous grammar. Whilst our framework is language-independent, we focus on problems in the translation of English to German, a language pair that has a high degree of syntactic reordering and rich target-side morphology. We first apply our approach to modelling agreement and case government phenomena. We use the lexicon to link surface form words with grammatical feature values, such as case, gender, and number, and we use constraints to enforce feature value identity for the words in agreement and government relations. We demonstrate improvements in translation quality of up to 0.5 BLEU over a strong baseline model. We then examine verbal complex production, another aspect of translation that requires the coordination of linguistic features over multiple words, often with long-range discontinuities. We develop a feature structure representation of verbal complex types, using constraint failure as an indicator of translation error and use this to automatically identify and quantify errors that occur in our baseline system. A manual analysis and classification of errors informs an extended version of the model that incorporates information derived from a parse of the source. We identify clause spans and use model features to encourage the generation of complete verbal complex types. We are able to improve accuracy as measured using precision and recall against values extracted from the reference test sets. Our framework allows for the incorporation of rich linguistic information and we present sketches of further applications that could be explored in future work.
|
2 |
Sintaktiese herrangskikking as voorprosessering in die ontwikkeling van Engels na Afrikaanse statistiese masjienvertaalsisteem / Marissa GrieselGriesel, Marissa January 2011 (has links)
Statistic machine translation to any of the resource scarce South African languages generally results in low quality output. Large amounts of training data are required to generate output of such a standard that it can ease the work of human translators when incorporated into a translation environment. Sufficiently large corpora often do not exist and other techniques must be researched to improve the quality of the output. One of the methods in international literature that yielded good improvements in the quality of the output applies syntactic reordering as pre-processing. This pre-processing aims at simplifying the decod-ing process as less changes will need to be made during translation in this stage. Training will also benefit since the automatic word alignments can be drawn more easily because the word orders in both the source and target languages are more similar. The pre-processing is applied to the source language training data as well as to the text that is to be translated. It is in the form of rules that recognise patterns in the tags and adapt the structure accordingly. These tags are assigned to the source language side of the aligned parallel corpus with a syntactic analyser. In this research project, the technique is adapted for translation from English to Afrikaans and deals with the reordering of verbs, modals, the past tense construct, construc-tions with “to” and negation. The goal of these rules is to change the English (source language) structure to better resemble the Afrikaans (target language) structure. A thorough analysis of the output of the base-line system serves as the starting point. The errors that occur in the output are divided into categories and each of the underlying constructs for English and Afrikaans are examined. This analysis of the output and the literature on syntax for the two languages are combined to formulate the linguistically motivated rules. The module that performs the pre-processing is evaluated in terms of the precision and the recall, and these two measures are then combined in the F-score that gives one number by which the module can be assessed. All three of these measures compare well to international standards. Furthermore, a compari-son is made between the system that is enriched by the pre-processing module and a baseline system on which no extra processing is applied. This comparison is done by automatically calculating two metrics (BLEU and NIST scores) and it shows very positive results. When evaluating the entire document, an increase in the BLEU score from 0,4968 to 0,5741 (7,7 %) and in the NIST score from 8,4515 to 9,4905 (10,4 %) is reported. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2011.
|
3 |
Sintaktiese herrangskikking as voorprosessering in die ontwikkeling van Engels na Afrikaanse statistiese masjienvertaalsisteem / Marissa GrieselGriesel, Marissa January 2011 (has links)
Statistic machine translation to any of the resource scarce South African languages generally results in low quality output. Large amounts of training data are required to generate output of such a standard that it can ease the work of human translators when incorporated into a translation environment. Sufficiently large corpora often do not exist and other techniques must be researched to improve the quality of the output. One of the methods in international literature that yielded good improvements in the quality of the output applies syntactic reordering as pre-processing. This pre-processing aims at simplifying the decod-ing process as less changes will need to be made during translation in this stage. Training will also benefit since the automatic word alignments can be drawn more easily because the word orders in both the source and target languages are more similar. The pre-processing is applied to the source language training data as well as to the text that is to be translated. It is in the form of rules that recognise patterns in the tags and adapt the structure accordingly. These tags are assigned to the source language side of the aligned parallel corpus with a syntactic analyser. In this research project, the technique is adapted for translation from English to Afrikaans and deals with the reordering of verbs, modals, the past tense construct, construc-tions with “to” and negation. The goal of these rules is to change the English (source language) structure to better resemble the Afrikaans (target language) structure. A thorough analysis of the output of the base-line system serves as the starting point. The errors that occur in the output are divided into categories and each of the underlying constructs for English and Afrikaans are examined. This analysis of the output and the literature on syntax for the two languages are combined to formulate the linguistically motivated rules. The module that performs the pre-processing is evaluated in terms of the precision and the recall, and these two measures are then combined in the F-score that gives one number by which the module can be assessed. All three of these measures compare well to international standards. Furthermore, a compari-son is made between the system that is enriched by the pre-processing module and a baseline system on which no extra processing is applied. This comparison is done by automatically calculating two metrics (BLEU and NIST scores) and it shows very positive results. When evaluating the entire document, an increase in the BLEU score from 0,4968 to 0,5741 (7,7 %) and in the NIST score from 8,4515 to 9,4905 (10,4 %) is reported. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2011.
|
Page generated in 0.635 seconds