Bayesian Models for Multilingual Word Alignment

In this thesis I explore Bayesian models for word alignment, how they can be improved through joint annotation transfer, and how they can be extended to parallel texts in more than two languages. In addition to these general methodological developments, I apply the algorithms to problems from sign language research and linguistic typology. In the first part of the thesis, I show that Bayesian alignment models estimated with Gibbs sampling are more accurate than previous methods for a range of languages, particularly for languages with few digital resources available, which unfortunately describes the vast majority of languages today. Furthermore, I explore how variations of the models and learning algorithms affect alignment accuracy. Then, I show how part-of-speech annotation transfer can be performed jointly with word alignment to improve word alignment accuracy. I apply these models to help annotate the Swedish Sign Language Corpus (SSLC) with part-of-speech tags, and to investigate patterns of polysemy across the languages of the world. Finally, I present a model for multilingual word alignment that learns an intermediate representation of the text. This model is then applied to a massively parallel corpus of New Testament translations to explore word order features in 1001 languages.
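The abstract names the core technique, Bayesian alignment models estimated with Gibbs sampling, without showing what such an estimator looks like. The following is a minimal sketch, not the thesis's implementation, assuming the simplest model of this family: IBM Model 1 with a symmetric Dirichlet(alpha) prior on each source word's translation distribution, a uniform prior over alignment positions, and a NULL source token for unaligned words. With the translation distributions integrated out, each link is resampled from P(a_j = i | rest) ∝ (n(e_i, f_j) + alpha) / (n(e_i) + alpha · V), where the counts n exclude the link being resampled and V is the target vocabulary size. The function name `gibbs_align` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def gibbs_align(corpus, alpha=0.01, iterations=50, seed=0):
    """Collapsed Gibbs sampler for a Bayesian IBM Model 1 (illustrative sketch).

    corpus: list of (source_tokens, target_tokens) pairs.
    Returns, per sentence pair, a list whose j-th entry is the index of the
    source position that target word j is linked to; index 0 is NULL.
    """
    rng = random.Random(seed)
    vocab_size = len({f for _, tgt in corpus for f in tgt})  # V
    pair_count = defaultdict(int)  # n(e, f): links from source word e to target word f
    src_count = defaultdict(int)   # n(e): total links from source word e

    # Prepend a NULL token so target words may remain unaligned.
    sents = [(["<NULL>"] + list(src), list(tgt)) for src, tgt in corpus]

    # Initialise every link uniformly at random and record the counts.
    alignments = []
    for src, tgt in sents:
        a = [rng.randrange(len(src)) for _ in tgt]
        for f, i in zip(tgt, a):
            pair_count[src[i], f] += 1
            src_count[src[i]] += 1
        alignments.append(a)

    for _ in range(iterations):
        for (src, tgt), a in zip(sents, alignments):
            for j, f in enumerate(tgt):
                # Remove the current link so counts describe "all other links".
                old = src[a[j]]
                pair_count[old, f] -= 1
                src_count[old] -= 1
                # Collapsed conditional: Dirichlet-smoothed translation probability.
                weights = [
                    (pair_count[e, f] + alpha) / (src_count[e] + alpha * vocab_size)
                    for e in src
                ]
                i = rng.choices(range(len(src)), weights=weights)[0]
                # Add the newly sampled link back into the counts.
                a[j] = i
                pair_count[src[i], f] += 1
                src_count[src[i]] += 1
    return alignments


if __name__ == "__main__":
    # Toy German-English corpus, purely illustrative.
    toy = [
        ("das haus".split(), "the house".split()),
        ("das buch".split(), "the book".split()),
        ("ein buch".split(), "a book".split()),
    ]
    for (src, tgt), a in zip(toy, gibbs_align(toy)):
        print([(f, (["NULL"] + src)[i]) for f, i in zip(tgt, a)])
```

This returns only the final sample for brevity. Practical aligners typically add further components (for example word-order or fertility terms) and combine several samples rather than keeping only the last one; those refinements are left out of the sketch.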

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:su-115541
Date	January 2015
Creators	Östling, Robert
Publisher	Stockholm University, Department of Linguistics (Stockholms universitet, Institutionen för lingvistik), Stockholm
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Doctoral thesis, monograph, info:eu-repo/semantics/doctoralThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess