
Translation as Linear Transduction : Models and Algorithms for Efficient Learning in Statistical Machine Translation

Automatic translation has seen tremendous progress in recent years, mainly thanks to statistical methods applied to large parallel corpora. Transductions represent a principled approach to modeling translation, but existing transduction classes are either not expressive enough to capture structural regularities between natural languages or too complex to support efficient statistical induction on a large scale. A common approach is to severely prune search over a relatively unrestricted space of transduction grammars. These restrictions are often applied at different stages in a pipeline, with the obvious drawback of committing irrevocably to decisions that should not have been made. In this thesis, we instead restrict the space of transduction grammars to one that is less expressive but can be searched efficiently.

First, the class of linear transductions is defined and characterized. Linear transductions are generated by linear transduction grammars, which are the natural bilingual case of linear grammars, as well as the natural linear case of inversion transduction grammars (and higher-order syntax-directed transduction grammars). They are recognized by zipper finite-state transducers, which are equivalent to finite-state automata with four tapes. By allowing this extra dimensionality, linear transductions can represent alignments that finite-state transductions cannot, and by keeping the mechanism free of auxiliary storage, they become much more efficient than inversion transductions.

Second, we present an algorithm for parsing with linear transduction grammars that allows pruning. The pruning scheme imposes no restrictions a priori, but instead guides the search toward potentially interesting parts of the search space in an informed and dynamic way. Being able to parse efficiently makes it possible to learn stochastic linear transduction grammars through expectation maximization.

All of the above work would be for naught if linear transductions were too poor a reflection of the actual transduction between natural languages. We test this empirically by building translation systems based on the alignments imposed by the learned grammars. The conclusion is that stochastic linear inversion transduction grammars learned from observed data hold up well against the state of the art.
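To make the grammar shape concrete, here is a minimal Python sketch of a recognizer, assuming linear transduction grammar rules of the form A → a/x B b/y, where a and b are input-side terminals and x and y output-side terminals emitted on the outer edges of the derived pair, plus terminating rules without a nonterminal (such as A → ε/ε). The names `Rule`, `recognizes` and the toy grammar are hypothetical and not taken from the thesis, and this is an exhaustive memoized search over bispans, not the pruned parser the abstract describes. Each recursive step peels terminals off both ends of both strings, loosely mirroring the four read heads of a zipper finite-state transducer. The toy grammar pairs each string with its reversal, a whole-sentence reordering that finite-state transducers cannot express for unbounded lengths.

```python
from functools import lru_cache
from typing import NamedTuple, Optional, Sequence, Tuple


class Rule(NamedTuple):
    """Hypothetical LTG rule  lhs -> a/x rhs b/y  (rhs=None means no nonterminal)."""
    lhs: str
    a: Tuple[str, ...]  # input terminals emitted left of rhs
    x: Tuple[str, ...]  # output terminals emitted left of rhs
    rhs: Optional[str]
    b: Tuple[str, ...]  # input terminals emitted right of rhs
    y: Tuple[str, ...]  # output terminals emitted right of rhs


def recognizes(rules: Sequence[Rule], start: str, e: Sequence[str], f: Sequence[str]) -> bool:
    """Return True if the grammar derives the sentence pair (e, f) from start."""
    e, f = tuple(e), tuple(f)
    by_lhs = {}
    for r in rules:
        # A recursive rule that emits nothing would revisit the same bispan forever.
        if r.rhs is not None and not (r.a or r.x or r.b or r.y):
            raise ValueError(f"recursive rule {r} emits no terminals")
        by_lhs.setdefault(r.lhs, []).append(r)

    @lru_cache(maxsize=None)
    def derives(nt: str, e_lo: int, e_hi: int, f_lo: int, f_hi: int) -> bool:
        # Does nonterminal nt derive the bispan e[e_lo:e_hi] / f[f_lo:f_hi]?
        for r in by_lhs.get(nt, []):
            if r.rhs is None:
                # Terminating rule: the bispan must be exactly ab / xy.
                if e[e_lo:e_hi] == r.a + r.b and f[f_lo:f_hi] == r.x + r.y:
                    return True
                continue
            # Shrink the bispan from all four ends ("zipping" inward).
            e_lo2, e_hi2 = e_lo + len(r.a), e_hi - len(r.b)
            f_lo2, f_hi2 = f_lo + len(r.x), f_hi - len(r.y)
            if e_lo2 > e_hi2 or f_lo2 > f_hi2:
                continue  # not enough symbols left for this rule
            if (e[e_lo:e_lo2] == r.a and e[e_hi2:e_hi] == r.b
                    and f[f_lo:f_lo2] == r.x and f[f_hi2:f_hi] == r.y
                    and derives(r.rhs, e_lo2, e_hi2, f_lo2, f_hi2)):
                return True
        return False

    return derives(start, 0, len(e), 0, len(f))


# Toy grammar pairing each string over {a, b} with its reversal.
reverse = [
    Rule("S", ("a",), (), "S", (), ("a",)),  # S -> a/ε S ε/a
    Rule("S", ("b",), (), "S", (), ("b",)),  # S -> b/ε S ε/b
    Rule("S", (), (), None, (), ()),         # S -> ε/ε
]

print(recognizes(reverse, "S", "abb", "bba"))  # True
print(recognizes(reverse, "S", "abb", "abb"))  # False
```

Memoizing on (nonterminal, bispan) keeps the search polynomial: there are on the order of |N|·n²·m² distinct items for sentence lengths n and m, and requiring every recursive rule to emit at least one terminal guarantees that each call works on a strictly smaller bispan, so the recursion terminates.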

linear transduction

linear transduction grammar

inversion transduction

zipper finite-state automaton

zipper finite-state transducer

formal language theory

formal transduction theory

translation

automatic translation

machine translation

statistical machine translation

Computational linguistics

Language technology

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-135704
Date	January 2011
Creators	Saers, Markus
Publisher	Uppsala University, Department of Linguistics and Philology, Uppsala : Acta Universitatis Upsaliensis
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Doctoral thesis, monograph, info:eu-repo/semantics/doctoralThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	Studia Linguistica Upsaliensia, 1652-1366 ; 9
