Return to search

Automatic compilation of bilingual terminologies from comparable corpora

Bilingual terminological resources play a pivotal role in human and machine translation of technical text. Owing to the immense volume of newly produced terminology in the biomedical domain, existing resources suffer from low coverage and they are only available for a limited number of languages. The need for term alignment methods that accurately identify translations of terms, emerges. In this work, we focus on bilingual terminology induction from freely available comparable corpora, i.e. thematically related documents in two or more languages. We investigate different sources of information that determine translation equivalence, including: (a) the internal structure of terms (compositional clue), (b) the surrounding lexical context (contextual clue) and (c) the topic distribution of terms (topical clue). We present four novel compositional alignment methods and we introduce several extensions over existing compositional, context-based and topic-based approaches. Furthermore, we combine the three translation clues in a single term alignment model and we show substantial improvements over the individual translation signals when considered in isolation. We examine the performance of the proposed term alignment methods on closely related (English-French, English-Spanish) language pairs, on a more distant, low-resource language pair (English-Greek) and on an unrelated (English-Japanese) language pair. As an application, we integrate automatically compiled bilingual terminologies with Statistical Machine Translation systems to more accurately translate unknown terms. Results show that an up-to-date bilingual dictionary of terms improves the translation performance of SMT.

Identiferoai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:634907
Date January 2015
CreatorsKontonatsios, Georgios Nikolaos
PublisherUniversity of Manchester
Source SetsEthos UK
Detected LanguageEnglish
TypeElectronic Thesis or Dissertation
Sourcehttps://www.research.manchester.ac.uk/portal/en/theses/automatic-compilation-of-bilingual-terminologies-from-comparable-corpora(7d4ea62b-74ad-4dc9-8511-cfb14d43ae13).html

Page generated in 0.0019 seconds