1. Word Reordering for Statistical Machine Translation via Modeling Structural Differences between Languages / 統計的機械翻訳のための言語構造の違いのモデル化による語順推定
Goto, Isao. 23 May 2014
Full text replaced on 2015-05-27 / Kyoto University / 0048 / New-system doctoral program / Doctor of Informatics / Degree no. Kō 18481 / Jōhaku no. 532 / Call no. 新制||情||94 (Main Library) / 31359 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / Examining committee: Professor Sadao Kurohashi (chief examiner), Professor Katsumi Tanaka, Professor Tatsuya Kawahara / Qualified under Article 4, Paragraph 1 of the Degree Regulations / DFAM
2. Translation as Linear Transduction: Models and Algorithms for Efficient Learning in Statistical Machine Translation
Saers, Markus. January 2011
Automatic translation has seen tremendous progress in recent years, mainly thanks to statistical methods applied to large parallel corpora. Transductions represent a principled approach to modeling translation, but existing transduction classes are either not expressive enough to capture structural regularities between natural languages or too complex to support efficient statistical induction on a large scale. A common approach is to severely prune search over a relatively unrestricted space of transduction grammars. These restrictions are often applied at different stages in a pipeline, with the obvious drawback of committing to irrevocable decisions that should not have been made. In this thesis we instead restrict the space of transduction grammars itself to one that is less expressive but can be searched efficiently.

First, the class of linear transductions is defined and characterized. They are generated by linear transduction grammars, which represent the natural bilingual case of linear grammars, as well as the natural linear case of inversion transduction grammars (and higher-order syntax-directed transduction grammars). They are recognized by zipper finite-state transducers, which are equivalent to finite-state automata with four tapes. By allowing this extra dimensionality, linear transductions can represent alignments that finite-state transductions cannot, and by keeping the mechanism free of auxiliary storage, they become much more efficient than inversion transductions.

Second, we present an algorithm for parsing with linear transduction grammars that allows pruning. The pruning scheme imposes no restrictions a priori, but guides the search to potentially interesting parts of the search space in an informed and dynamic way. Being able to parse efficiently allows learning of stochastic linear transduction grammars through expectation maximization.

All the above work would be for naught if linear transductions were too poor a reflection of the actual transduction between natural languages. We test this empirically by building systems based on the alignments imposed by the learned grammars. The conclusion is that stochastic linear inversion transduction grammars learned from observed data stand up well to the state of the art.
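To make the abstract's central formalism concrete, here is a minimal sketch of a linear transduction grammar in Python: every rule rewrites a nonterminal into at most one child nonterminal, flanked on both sides by biterminals (paired source/target segments, either side possibly empty). The rule encoding and the toy mirror grammar are illustrative assumptions, not the thesis's own representation; the toy grammar generates exactly the pairs (w, w reversed), a fully crossing alignment pattern that finite-state transduction cannot capture.

```python
import random

# Each rule: (left biterminal, child nonterminal or None, right biterminal).
# A biterminal pairs a source segment with a target segment; "" is allowed.
EPS = ("", "")
RULES = {
    "S": [
        (("a", ""), "S", ("", "a")),
        (("b", ""), "S", ("", "b")),
        (EPS, None, EPS),               # termination
    ],
}

def generate(symbol="S", depth=0):
    """Sample one (source, target) string pair from the toy LTG."""
    rules = RULES[symbol]
    if depth > 50:                      # guard against unluckily deep sampling
        rules = [r for r in rules if r[1] is None]
    (sl, tl), child, (sr, tr) = random.choice(rules)
    if child is None:
        return sl + sr, tl + tr
    src, tgt = generate(child, depth + 1)
    # Both yields grow around the single child in lockstep; having at most
    # one nonterminal per rule is what makes the grammar linear.
    return sl + src + sr, tl + tgt + tr

if __name__ == "__main__":
    for _ in range(5):
        src, tgt = generate()
        # The target is the mirror image of the source: a fully crossing
        # alignment that a plain finite-state transducer cannot model.
        assert tgt == src[::-1]
        print(repr(src), "->", repr(tgt))
```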
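The four-tape ("zipper") view can likewise be sketched as a recognizer: an item asks whether a nonterminal derives the bispan (src[i:j], tgt[k:l]), and each rule application peels biterminals off both ends of both strings, so source and target are each read from the left and from the right. The item structure and memoization here are assumed details for illustration; the thesis's actual parsing algorithm additionally supports informed, dynamic pruning, which this sketch omits.

```python
from functools import lru_cache

EPS = ("", "")
RULES = {
    "S": [
        (("a", ""), "S", ("", "a")),
        (("b", ""), "S", ("", "b")),
        (EPS, None, EPS),
    ],
}

def recognizes(src, tgt, start="S"):
    """True iff the toy LTG derives the pair (src, tgt).

    Items are (symbol, i, j, k, l); there are O(|src|^2 * |tgt|^2) of them,
    so memoized recognition is polynomial. Assumes the grammar has no
    cycles of rules whose flanking biterminals are all empty.
    """
    @lru_cache(maxsize=None)
    def derives(sym, i, j, k, l):
        for (sl, tl), child, (sr, tr) in RULES[sym]:
            # Reject rules whose flanking biterminals do not fit the bispan.
            if j - i < len(sl) + len(sr) or l - k < len(tl) + len(tr):
                continue
            if src[i:i + len(sl)] != sl or src[j - len(sr):j] != sr:
                continue
            if tgt[k:k + len(tl)] != tl or tgt[l - len(tr):l] != tr:
                continue
            ni, nj = i + len(sl), j - len(sr)
            nk, nl = k + len(tl), l - len(tr)
            if child is None:
                if ni == nj and nk == nl:
                    return True
            elif derives(child, ni, nj, nk, nl):
                return True
        return False
    return derives(start, 0, len(src), 0, len(tgt))

if __name__ == "__main__":
    print(recognizes("abb", "bba"))   # True: mirror pair
    print(recognizes("abb", "abb"))   # False
```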
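Finally, a sketch of one expectation-maximization step over a stochastic LTG. The two-rule toy fragment below is my own construction, deliberately ambiguous so the E-step has posterior mass to distribute; derivations are enumerated by brute force purely to keep the sketch short, whereas per the abstract a real system would obtain expected counts from the efficient (pruned) parser.

```python
from collections import defaultdict
from math import prod

EPS = ("", "")
R_MONO = (("a", "x"), "S", EPS)         # emit the biterminal on the left flank
R_INV = (("a", ""), "S", ("", "x"))     # source on the left, target on the right
R_TERM = (EPS, None, EPS)
RULES = {"S": [R_MONO, R_INV, R_TERM]}

def derivations(src, tgt, sym, i, j, k, l):
    """Yield every derivation (list of rules) of the bispan, brute force."""
    for rule in RULES[sym]:
        (sl, tl), child, (sr, tr) = rule
        if j - i < len(sl) + len(sr) or l - k < len(tl) + len(tr):
            continue
        if src[i:i + len(sl)] != sl or src[j - len(sr):j] != sr:
            continue
        if tgt[k:k + len(tl)] != tl or tgt[l - len(tr):l] != tr:
            continue
        ni, nj, nk, nl = i + len(sl), j - len(sr), k + len(tl), l - len(tr)
        if child is None:
            if ni == nj and nk == nl:
                yield [rule]
        else:
            for tail in derivations(src, tgt, child, ni, nj, nk, nl):
                yield [rule] + tail

def em_step(corpus, probs):
    counts = defaultdict(float)
    for src, tgt in corpus:                       # E-step
        derivs = list(derivations(src, tgt, "S", 0, len(src), 0, len(tgt)))
        weights = [prod(probs[r] for r in d) for d in derivs]
        z = sum(weights)
        for d, w in zip(derivs, weights):
            for r in d:
                counts[r] += w / z                # expected (posterior) count
    new_probs = {}
    for sym, rules in RULES.items():              # M-step: renormalize per LHS
        total = sum(counts[r] for r in rules) or 1.0
        for r in rules:
            new_probs[r] = counts[r] / total
    return new_probs

if __name__ == "__main__":
    probs = {r: 1 / 3 for r in RULES["S"]}
    for _ in range(3):
        probs = em_step([("a", "x"), ("aa", "xx")], probs)
    print({("MONO", "INV", "TERM")[i]: round(probs[r], 3)
           for i, r in enumerate(RULES["S"])})
```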