Code-mixing is a common phenomenon in bilingual societies. It refers to the intra-sentential switching between two languages within a spoken utterance. This thesis addresses the problem of automatically recognizing Cantonese-English code-mixing speech, which is widely used in Hong Kong.

While automatic speech recognition (ASR) of either Cantonese or English alone has achieved considerable success, recognition of Cantonese-English code-mixing speech is far from trivial. Several factors make the ASR of code-mixing utterances much more than a simple integration of two monolingual speech recognition systems: unknown language boundaries, accents in code-switched English words, phonetic and phonological differences between Cantonese and English, the absence of a regulated grammatical structure, and the lack of speech and text data. In addition, we have little understanding of this highly dynamic language phenomenon. Unlike monolingual speech recognition research, code-mixing has very few linguistic studies to refer to.

This study starts with an investigation of the linguistic properties of Cantonese-English code-mixing, based on a large number of real code-mixing text corpora collected from the internet and other sources. The effects of language mixing on the automatic recognition of Cantonese-English code-mixing utterances are analyzed systematically. The problems of pronunciation dictionary construction, acoustic modeling, and language modeling are investigated. Subsequently, a large-vocabulary code-mixing speech recognition system is developed and implemented.

A data-driven computational approach is adopted to reveal significant pronunciation variations in Cantonese-English code-mixing speech. The findings are successfully applied to two tasks:
- constructing a more relevant bilingual pronunciation dictionary
- selecting effective training materials for code-mixing ASR

For acoustic modeling, cross-lingual acoustic models are shown to be more appropriate than language-dependent models. Various cross-lingual inventories are derived from different combination schemes and similarity measures. We show that the proposed data-driven approach, based on K-L divergence and a phonetic confusion matrix, outperforms an IPA-based approach that relies on phonetic knowledge alone. It is also found that initials and finals are more appropriate than phonemes as the basic Cantonese units in code-mixing speech recognition.

A text database of more than 9 million characters is compiled for the language modeling of code-mixing ASR. Class-based language models with automatically clustered classes are shown to be ineffective for code-mixing speech recognition. A semantics-based n-gram mapping approach is proposed to increase the counts of code-mixing n-grams at language boundaries. The proposed semantics-based language models significantly improve both language model perplexity and recognition performance. The final code-mixing speech recognition system achieves 75.0% overall accuracy on Cantonese-English code-mixing speech, with 76.1% accuracy for Cantonese characters and 65.5% for English words. It also attains a reasonable character accuracy of 75.3% on monolingual Cantonese speech.

Cross-lingual speaker adaptation is also investigated in the thesis. Speaker-independent (SI) model mappings between Cantonese and English are established at different levels of acoustic units, namely phones, states, and Gaussian mixture components. A novel approach to cross-lingual speaker adaptation via Gaussian component mapping is proposed and is shown to be effective in most of the speech recognition tasks tested.

Cao, Houwei.
Adviser: P.C. Ching.
Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: .
Thesis (Ph.D.)--Chinese University of Hong Kong, 2011.
Includes bibliographical references (leaves 129-140).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.
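The abstract reports two ideas: deriving cross-lingual unit inventories from a K-L divergence similarity measure, and mapping SI models between Cantonese and English down to the Gaussian-component level. It gives no implementation details. The Python sketch below is only a rough illustration under stated assumptions:
- Each unit is represented by a single diagonal-covariance Gaussian.
- The symmetrised K-L divergence serves as the distance between units.
- A Cantonese unit is merged with its nearest English unit when that distance falls below a threshold.

The unit labels (`yue_s`, `eng_ae`, and so on), the toy parameters, the threshold, and all function names are hypothetical. The thesis also uses a phonetic confusion matrix, which is not modelled here.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: a unit (phone, state, or single mixture
# component) is one diagonal-covariance Gaussian given as (means, variances).
# Real HMM acoustic models use multi-state, multi-component GMMs.
Gaussian = Tuple[List[float], List[float]]


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) between two diagonal-covariance Gaussians."""
    mu_p, var_p = p
    mu_q, var_q = q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """K-L divergence summed in both directions, used here as a distance."""
    return kl_diag(p, q) + kl_diag(q, p)


def map_units(
    source: Dict[str, Gaussian],
    target: Dict[str, Gaussian],
    threshold: float,
) -> Dict[str, str]:
    """Map each source unit to its nearest target unit (hypothetical scheme).

    A source unit is merged with (mapped to) its nearest target unit when the
    distance is within `threshold`; otherwise it keeps its own label and stays
    language-dependent.
    """
    mapping: Dict[str, str] = {}
    for name, g in source.items():
        best = min(target, key=lambda t: symmetric_kl(g, target[t]))
        if symmetric_kl(g, target[best]) <= threshold:
            mapping[name] = best
        else:
            mapping[name] = name
    return mapping


# Toy 2-dimensional "acoustic models" with made-up parameters.
cantonese = {
    "yue_s": ([1.0, 2.0], [1.0, 1.0]),
    "yue_oe": ([5.0, -3.0], [0.5, 0.5]),
}
english = {
    "eng_s": ([1.1, 2.0], [1.0, 1.0]),
    "eng_ae": ([-4.0, 0.0], [1.0, 1.0]),
}

mapping = map_units(cantonese, english, threshold=1.0)
print(mapping)  # {'yue_s': 'eng_s', 'yue_oe': 'yue_oe'}
```

A lower `threshold` keeps more units language-dependent, while a higher one yields a smaller, more shared inventory. The same nearest-neighbour mapping could in principle be applied to individual mixture components rather than whole units.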
Identifier | oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_344823 |
Date | January 2011 |
Contributors | Cao, Houwei., Chinese University of Hong Kong Graduate School. Division of Electronic Engineering. |
Source Sets | The Chinese University of Hong Kong |
Language | English, Chinese |
Detected Language | English |
Type | Text, theses |
Format | electronic resource, microform, microfiche, 1 online resource (xv, 140 leaves : ill. (some col.)) |
Rights | Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/) |