Exploiting Linguistic and Statistical Knowledge in a Text Alignment System

In machine translation, the alignment of corpora has evolved into a mature research area, aimed at providing training data for statistical or example-based machine translation systems. Moreover, the alignment information can be used for a variety of other purposes, including lexicography and the induction of tools for natural language processing. The alignment techniques used for these purposes fall roughly into two separate classes: sentence alignment approaches, which often combine statistical and linguistic information, and word alignment models, which are dominated by the statistical machine translation paradigm. Alignment approaches that use linguistic knowledge provided by corpus annotation are rare, as are non-statistical word alignment strategies. Furthermore, parallel corpora are typically not aligned at all text levels simultaneously. Rather, a corpus is first sentence-aligned, and in a subsequent step the alignment information is refined to go below the sentence level. In this thesis, the distinction between the two alignment classes is abandoned. Instead, a system is introduced that can align at the paragraph, sentence, word, and phrase levels simultaneously, and that can combine linguistic as well as statistical information. This combination of alignment cues from different knowledge sources, as well as the combination of the sentence and word alignment tasks, is made possible by the development of a modular alignment platform. Its main features are that it supports different kinds of linguistic corpus annotation and that it aligns a corpus hierarchically, such that sentence and word alignments are cohesive. Alignment cues are not used within a global alignment model. Rather, different sub-models can be implemented and allowed to interact. Most of the alignment modules of the system have been implemented on the basis of empirical corpus studies, aimed at showing how the most common types of corpus annotation can be exploited for the alignment task.

Identifier: oai:union.ndltd.org:uni-osnabrueck.de/oai:repositorium.ub.uni-osnabrueck.de:urn:nbn:de:gbv:700-2009022517
Date: 20 February 2009
Creators: Schrader, Bettina
Contributors: Prof. Dr. Peter Bosch, Dr. habil. Helmar Gust, Prof. Dr. Stefan Evert, Prof. Dr. Martin Volk
Source Sets: Universität Osnabrück
Language: English
Detected Language: English
Type: doc-type:doctoralThesis
Format: application/gzip, application/pdf
Rights: http://rightsstatements.org/vocab/InC/1.0/
