LOCATING AND REDUCING TRANSLATION DIFFICULTY

The challenge of translation varies from one sentence to another, and even between phrases of a sentence. We investigate whether variations in difficulty can be located automatically for Statistical Machine Translation (SMT). Furthermore, we hypothesize that customizing an SMT system based on difficulty information improves translation quality.
We assume a binary categorization for phrases: easy vs. difficult. Our focus is on the Difficult-to-Translate Phrases (DTPs). Our experiments show that, for a sentence, improving the translation of the DTP also improves the translation of the surrounding non-difficult phrases. To locate the most difficult phrase of each sentence, we use machine learning to construct a difficulty classifier. To improve the translation of DTPs, we introduce customization methods for three components of the SMT system: I. language model; II. translation model; III. decoding weights. With each method, we construct a new component that is dedicated to the translation of difficult phrases. Our experiments on Arabic-to-English translation show that DTP-specific system customization is mostly successful.
Overall, we demonstrate that translation difficulty is an important source of information for machine translation and can be used to enhance its performance.

Identifier: oai:union.ndltd.org:PITT/oai:PITTETD:etd-07262010-165608
Date: 30 September 2010
Creators: Mohit, Behrang
Contributors: Daqing He, Alon Lavie, Rebecca Hwa, Janyce Wiebe
Publisher: University of Pittsburgh
Source Sets: University of Pittsburgh
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.library.pitt.edu/ETD/available/etd-07262010-165608/
Rights: unrestricted, I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to University of Pittsburgh or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.
