Global ETD Search

31	Translation as Linear Transduction: Models and Algorithms for Efficient Learning in Statistical Machine Translation
Saers, Markus. January 2011.

Automatic translation has seen tremendous progress in recent years, mainly thanks to statistical methods applied to large parallel corpora. Transductions represent a principled approach to modeling translation, but existing transduction classes are either not expressive enough to capture structural regularities between natural languages or too complex to support efficient statistical induction on a large scale. A common approach is to severely prune search over a relatively unrestricted space of transduction grammars. These restrictions are often applied at different stages in a pipeline, with the obvious drawback of committing to irrevocable decisions that should not have been made. In this thesis we will instead restrict the space of transduction grammars to a space that is less expressive, but can be efficiently searched.

First, the class of linear transductions is defined and characterized. They are generated by linear transduction grammars, which represent the natural bilingual case of linear grammars, as well as the natural linear case of inversion transduction grammars (and higher order syntax-directed transduction grammars). They are recognized by zipper finite-state transducers, which are equivalent to finite-state automata with four tapes. By allowing this extra dimensionality, linear transductions can represent alignments that finite-state transductions cannot, and by keeping the mechanism free of auxiliary storage, they become much more efficient than inversion transductions.

Secondly, we present an algorithm for parsing with linear transduction grammars that allows pruning. The pruning scheme imposes no restrictions a priori, but guides the search to potentially interesting parts of the search space in an informed and dynamic way. Being able to parse efficiently allows learning of stochastic linear transduction grammars through expectation maximization.

All the above work would be for naught if linear transductions were too poor a reflection of the actual transduction between natural languages. We test this empirically by building systems based on the alignments imposed by the learned grammars. The conclusion is that stochastic linear inversion transduction grammars learned from observed data stand up well to the state of the art.

Keywords: linear transduction; linear transduction grammar; inversion transduction; zipper finite-state automaton; zipper finite-state transducer; formal language theory; formal transduction theory; translation; automatic translation; machine translation; statistical machine translation; computational linguistics; language technology

32	Extraction of word senses from bilingual resources using graph-based semantic mirroring / Extraktion av ordbetydelser från tvåspråkiga resurser med grafbaserad semantisk spegling
Lilliehöök, Hampus. January 2013.

In this thesis we retrieve semantic information that exists implicitly in bilingual data. We gather input data by repeatedly applying the semantic mirroring procedure. The data is then represented by vectors in a large vector space. A resource of synonym clusters is then constructed by performing K-means centroid-based clustering on the vectors. We evaluate the result manually, using dictionaries, and against WordNet, and discuss prospects and applications of this method.

Keywords: computational linguistics; natural language processing; language technology; data mining; word sense discrimination; word senses; semantic mirroring; vector space modeling; cluster analysis
