Global ETD Search

The enhancement of machine translation for low-density languages using Web-gathered parallel texts.

The majority of the world's languages are poorly represented in informational media such as radio, television, newspapers, and the Internet. Translation into and out of these languages may offer their speakers a way to interact with the wider world, but current statistical machine translation models are effective only with a large corpus of parallel texts (texts in two languages that are translations of one another), which most languages lack. This thesis describes the Babylon project, which attempts to alleviate this shortage by supplementing existing parallel texts with texts gathered automatically from the Web, specifically targeting pages that contain text in a pair of languages. Results indicate that parallel texts gathered from the Web can be used effectively as training data for machine translation and can significantly improve translation quality for text in a similar domain. However, the small quantity of high-quality low-density language parallel texts on the Web remains a significant obstacle.

low-density languages

Machine translating.

Identifier	oai:union.ndltd.org:unt.edu/info:ark/67531/metadc5140
Date	12 1900
Creators	Mohler, Michael Augustine Gaylord
Contributors	Mihalcea, Rada, 1974-, Tarau, Paul, Chen, Jiangping
Publisher	University of North Texas
Source Sets	University of North Texas
Language	English
Detected Language	English
Type	Thesis or Dissertation
Format	Text
Rights	Public, Copyright, Mohler, Michael Augustine Gaylord, Copyright is held by the author, unless otherwise noted. All rights reserved.
