On improving natural language processing through phrase-based and one-to-one syntactic algorithms

Master of Science / Department of Computing and Information Sciences / William H. Hsu

Machine Translation (MT) is the practice of using computational methods to convert text from one natural language to another. Several approaches have been created since MT's inception in the 1950s and, with the vast increase in computational resources since then, have continued to evolve and improve. In this thesis I summarize several branches of MT theory and introduce newly developed software applications, several parsing techniques to improve Japanese-to-English text translation, and a new key algorithm to correct translation errors when converting from Japanese kanji to English. The overall translation improvement is measured using the BLEU metric (an objective, numerical standard in Machine Translation quality analysis). The baseline translation system was built by combining GIZA++, the Thot Phrase-Based SMT toolkit, the SRILM toolkit, and the Pharaoh decoder. The input and output parsing applications were created as intermediaries that improve the baseline MT system, so that the reported improvement is not artificially inflated. This baseline was measured with and without the additional parsing provided by the thesis software applications, and also with and without the thesis kanji correction utility.
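For readers unfamiliar with BLEU, the following is a minimal sketch of a simplified sentence-level variant: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes a single reference translation, uniform weights, and no smoothing. The function names are illustrative, and this is not the scoring setup or the normalization used in the thesis.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: one reference, uniform weights, no smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat".split()
candidate = "the cat sat on mat".split()
score = sentence_bleu(candidate, reference)
```

Corpus-level BLEU, which is what MT evaluations normally report, pools the n-gram counts over all sentences before taking precisions, so it is not the average of per-sentence scores like the one above.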
The new algorithm corrected many contextual definition mistakes that are common when converting from Japanese to English text. By training the new kanji correction utility on an existing dictionary, identifying source text in Japanese with a high number of possible translations, and checking the baseline translation against other translation possibilities, I was able to increase the translation performance of the baseline system from minimum normalized BLEU scores of 0.0273 to maximum normalized scores of 0.081.
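The abstract does not give the algorithm's details, so the following is only a hypothetical sketch of the three steps it names: look up each source token in a dictionary, treat tokens with many candidate translations as ambiguous, and compare the baseline word against the alternatives. The one-to-one token alignment, the `min_candidates` threshold, the `score` callback, and the toy data are all assumptions made for illustration, not the thesis implementation.

```python
from typing import Callable, Dict, List, Sequence


def correct_ambiguous_words(
    source_tokens: Sequence[str],
    baseline_words: Sequence[str],
    dictionary: Dict[str, List[str]],
    score: Callable[[List[str]], float],
    min_candidates: int = 3,
) -> List[str]:
    """Swap in a dictionary alternative wherever it scores higher in context."""
    if len(source_tokens) != len(baseline_words):
        raise ValueError("this sketch assumes a one-to-one token alignment")
    output = list(baseline_words)
    for i, token in enumerate(source_tokens):
        candidates = dictionary.get(token, [])
        # Only revisit tokens with many possible translations.
        if len(candidates) < min_candidates:
            continue
        best_word, best_score = output[i], score(output)
        for candidate in candidates:
            trial = output[:i] + [candidate] + output[i + 1:]
            trial_score = score(trial)
            if trial_score > best_score:
                best_word, best_score = candidate, trial_score
        output[i] = best_word
    return output


# Toy data (hypothetical): a tiny dictionary and a bigram-count scorer.
KNOWN_BIGRAMS = {("hot", "tea")}


def toy_score(words: List[str]) -> float:
    """Count adjacent word pairs that appear in KNOWN_BIGRAMS."""
    return sum((a, b) in KNOWN_BIGRAMS for a, b in zip(words, words[1:]))


corrected = correct_ambiguous_words(
    ["熱い", "お茶"],
    ["passionate", "tea"],
    {"熱い": ["hot", "passionate", "heated"]},
    toy_score,
)
```

In a real system the scorer would more plausibly be a trained language model, such as one built with SRILM, rather than a hand-written bigram set.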
The preliminary phase of improving Japanese-to-English translation focused on correcting segmentation mistakes that occur when parsing Japanese text into meaningful tokens. The resulting initial increase is artificially high, because the baseline score was so low to begin with, and is not indicative of future potential; it was nonetheless needed to establish a reasonable baseline score.
The final test results confirmed a significant, measurable improvement from two changes: better initial segmentation of the Japanese text, achieved by parsing the input corpora, and correction of kanji translations after the Pharaoh decoding process had completed.

http://hdl.handle.net/2097/1096

Artificial Intelligence

Natural language processing

Japanese

Machine translation

Contextual syntax

Phrase-based translation

Artificial Intelligence (0800)

Computer Science (0984)

Language, Modern (0291)

Identifier	oai:union.ndltd.org:KSU/oai:krex.k-state.edu:2097/1096
Date	January 1900
Creators	Meyer, Christopher Henry
Publisher	Kansas State University
Source Sets	K-State Research Exchange
Language	en_US
Detected Language	English
Type	Thesis