Lingvistické otázky ve strojovém překladu mezi češtinou a ruštinou / Linguistic Issues in Machine Translation between Czech and Russian

In this thesis we analyze machine translation between Czech and Russian from a linguist's perspective. We work with two types of machine translation systems: rule-based (TectoMT) and statistical (Moses). We experiment with different setups of these two systems in order to achieve the best possible quality. One of the questions we address is whether the relatedness of the two languages has an impact on machine translation. We explore the output of our two experimental systems and of two commercial systems, PC Translator and Google Translate. We make a linguistically motivated classification of errors for the language pair and describe each type of error in detail, analyzing whether it arises from a difference between Czech and Russian or is caused by the system architecture. We then compare the usage of some specific linguistic phenomena in the two languages and state how the individual systems cope with mismatches. For some errors, we suggest ways to correct them, and in several cases we implement those suggestions. In particular, we focus on one specific error type: surface valency. We research the mismatches between Czech and Russian valency, extract a lexicon of surface valency frames, incorporate the lexicon into the TectoMT translation pipeline and present...

Identifier: oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:349327
Date: January 2015
Creators: Klyueva, Natalia
Contributors: Kuboň, Vladislav; Panevová, Jarmila; Strossa, Petr
Source Sets: Czech ETDs
Language: English
Detected Language: English
Type: info:eu-repo/semantics/doctoralThesis
Rights: info:eu-repo/semantics/restrictedAccess
