Phoneme-based statistical transliteration of foreign names for OOV problem.

Gao Wei.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2004.
Includes bibliographical references (leaves 79-82).
Abstracts in English and Chinese.

Contents:
Abstract --- p.i
Acknowledgement --- p.iii
Bibliographic Notes --- p.v
Chapter 1 --- Introduction --- p.1
  1.1 --- What is Transliteration? --- p.1
  1.2 --- Existing Problems --- p.2
  1.3 --- Objectives --- p.4
  1.4 --- Outline --- p.4
Chapter 2 --- Background --- p.6
  2.1 --- Source-channel Model --- p.6
  2.2 --- Transliteration for English-Chinese --- p.8
    2.2.1 --- Rule-based Approach --- p.8
    2.2.2 --- Similarity-based Framework --- p.8
    2.2.3 --- Direct Semi-Statistical Approach --- p.9
    2.2.4 --- Source-channel-based Approach --- p.11
  2.3 --- Chapter Summary --- p.14
Chapter 3 --- Transliteration Baseline --- p.15
  3.1 --- Transliteration Using IBM SMT --- p.15
    3.1.1 --- Introduction --- p.15
    3.1.2 --- GIZA++ for Transliteration Modeling --- p.16
    3.1.3 --- CMU-Cambridge Toolkits for Language Modeling --- p.21
    3.1.4 --- ReWrite Decoder for Decoding --- p.21
  3.2 --- Limitations of IBM SMT --- p.22
  3.3 --- Experiments Using IBM SMT --- p.25
    3.3.1 --- Data Preparation --- p.25
    3.3.2 --- Performance Measurement --- p.27
    3.3.3 --- Experimental Results --- p.27
  3.4 --- Chapter Summary --- p.28
Chapter 4 --- Direct Transliteration Modeling --- p.29
  4.1 --- Soundness of the Direct Model: Direct-1 --- p.30
  4.2 --- Alignment of Phoneme Chunks --- p.31
  4.3 --- Transliteration Model Training --- p.33
    4.3.1 --- EM Training for Symbol-mappings --- p.33
    4.3.2 --- WFST for Phonetic Transition --- p.36
    4.3.3 --- Issues for Incorrect Syllables --- p.36
  4.4 --- Language Model Training --- p.36
  4.5 --- Search Algorithm --- p.39
  4.6 --- Experimental Results --- p.41
    4.6.1 --- Experiment I: C.A. Distribution --- p.41
    4.6.2 --- Experiment II: Top-n Accuracy --- p.41
    4.6.3 --- Experiment III: Comparisons with the Baseline --- p.43
    4.6.4 --- Experiment IV: Influence of m Candidates --- p.43
  4.7 --- Discussions --- p.43
  4.8 --- Chapter Summary --- p.46
Chapter 5 --- Improving Direct Transliteration --- p.47
  5.1 --- Improved Direct Model: Direct-2 --- p.47
    5.1.1 --- Enlightenment from Source-Channel --- p.47
    5.1.2 --- Using Contextual Features --- p.48
    5.1.3 --- Estimation Based on MaxEnt --- p.49
    5.1.4 --- Features for Transliteration --- p.51
  5.2 --- Direct-2 Model Training --- p.53
    5.2.1 --- Procedure and Results --- p.53
    5.2.2 --- Discussions --- p.53
  5.3 --- Refining the Model Direct-2 --- p.55
    5.3.1 --- Refinement Solutions --- p.55
    5.3.2 --- Direct-2R Model Training --- p.56
  5.4 --- Evaluation --- p.57
    5.4.1 --- Search Algorithm --- p.57
    5.4.2 --- Direct Transliteration Models vs. Baseline --- p.59
    5.4.3 --- Direct-2 vs. Direct-2R --- p.63
    5.4.4 --- Experiments on Direct-2R --- p.65
  5.5 --- Chapter Summary --- p.71
Chapter 6 --- Conclusions --- p.72
  6.1 --- Thesis Summary --- p.72
  6.2 --- Cross Language Applications --- p.73
  6.3 --- Future Work and Directions --- p.74
Appendix A --- IPA-ARPABET Symbol Mapping Table --- p.77
Bibliography --- p.82

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_324956
Date: January 2004
Contributors: Gao, Wei; Chinese University of Hong Kong Graduate School. Division of Systems Engineering and Engineering Management.
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography
Format: print, xii, 82 leaves : ill. ; 30 cm.
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Page generated in 0.0022 seconds