181 |
Use of tone information in Cantonese LVCSR based on generalized character posterior probability decoding. / CUHK electronic theses & dissertations collection. January 2005
Automatic recognition of Cantonese tones has long been regarded as a difficult task. Cantonese has one of the most complicated tone systems among all languages in the world. This thesis presents a novel approach to modeling Cantonese tones. We propose the use of supra-tone models. Each supra-tone unit covers a number of syllables in succession. The supra-tone model characterizes not only the tone contours of individual syllables but also the transitions among them. By including multiple tone contours in one modeling unit, the relative heights of the tones are captured explicitly. This is especially important for the discrimination among the level tones of Cantonese. / The decoding in conventional LVCSR systems aims at finding the sentence hypothesis, i.e. the string of words, which has the maximum a posteriori (MAP) probability in comparison with other hypotheses. However, in most applications, the recognition performance is measured in terms of word error rate (or word accuracy). In Chinese languages, given that "word" is a rather ambiguous concept, speech recognition performance is usually measured in terms of the character error rate. In this thesis, we develop a decoding algorithm that can minimize the character error rate. The algorithm is applied to a reduced search space, e.g. a word graph or the N-best sentence list, which results from the 1st pass of search, and the generalized character posterior probability (GCPP) is maximized. (Abstract shortened by UMI.) / This thesis addresses two major problems of the existing large vocabulary continuous speech recognition (LVCSR) technology: (1) inadequate exploitation of alternative linguistic and acoustic information; and (2) the mismatch between the decoding (recognition) criterion and the performance evaluation. The study is focused on Cantonese, one of the major Chinese dialects, which is also monosyllabic and tonal. Tone is somewhat indispensable for lexical access and disambiguation of homonyms in Cantonese. 
However, integrating tone information into Cantonese LVCSR requires effective tone recognition as well as a seamless integration algorithm. / Qian Yao. / "July 2005." / Adviser: Tan Lee. / Source: Dissertation Abstracts International, Volume: 67-07, Section: B, page: 4009. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (p. 100-110). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract in English and Chinese. / School code: 1307.
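The contrast drawn in the abstract above, between picking the MAP sentence and picking the most probable character at each position, can be illustrated with a minimal sketch. It is not the thesis's GCPP algorithm: it assumes an N-best list whose hypotheses all have the same number of characters, so that characters can be compared position by position without time alignment, and the names `sentence_posteriors`, `map_decode`, `char_posterior_decode` and the toy list `example_nbest` are hypothetical.

```python
import math
from collections import defaultdict


def sentence_posteriors(nbest, scale=1.0):
    """Turn log scores of N-best hypotheses into normalized posteriors.

    nbest: list of (character_string, log_score) pairs.
    scale: assumed score-scaling factor (1.0 leaves scores unchanged).
    """
    scaled = [scale * score for _, score in nbest]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def map_decode(nbest):
    """Conventional decoding: return the single best-scoring sentence."""
    return max(nbest, key=lambda hyp: hyp[1])[0]


def char_posterior_decode(nbest, scale=1.0):
    """Pick, at every position, the character with the largest summed posterior.

    Simplifying assumption: all hypotheses have equal length.
    """
    length = len(nbest[0][0])
    if any(len(chars) != length for chars, _ in nbest):
        raise ValueError("this sketch needs equal-length hypotheses")
    posteriors = sentence_posteriors(nbest, scale)
    decoded = []
    for position in range(length):
        votes = defaultdict(float)
        for (chars, _), post in zip(nbest, posteriors):
            votes[chars[position]] += post
        decoded.append(max(votes, key=votes.get))
    return "".join(decoded)


# Toy N-best list; letters stand in for Chinese characters.
example_nbest = [
    ("ABD", math.log(0.4)),
    ("ABC", math.log(0.3)),
    ("EBC", math.log(0.3)),
]
```

On this toy list `map_decode` returns the top-scoring sentence, while `char_posterior_decode` can return a different string because two lower-ranked hypotheses agree on the final character. A real system would work on a word graph with time information rather than on position-aligned strings.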
|
182 |
GMM-based speaker recognition for mobile embedded systems. / CUHK electronic theses & dissertations collection. January 2004
Leung Cheung-chi. / "July 2004." / Thesis (Ph.D.)--Chinese University of Hong Kong, 2004. / Includes bibliographical references (p. 77-81). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Mode of access: World Wide Web. / Abstracts in English and Chinese.
|
183 |
Multi-transputer based isolated word speech recognition system. January 1996
by Francis Cho-yiu Chik. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references (leaves 129-135). / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Automatic speech recognition and its applications --- p.1 / Chapter 1.1.1 --- Artificial Neural Network (ANN) approach --- p.3 / Chapter 1.2 --- Motivation --- p.5 / Chapter 1.3 --- Background --- p.6 / Chapter 1.3.1 --- Speech recognition --- p.6 / Chapter 1.3.2 --- Parallel processing --- p.7 / Chapter 1.3.3 --- Parallel architectures --- p.10 / Chapter 1.3.4 --- Transputer --- p.12 / Chapter 1.4 --- Thesis outline --- p.13 / Chapter 2 --- Speech Signal Pre-processing --- p.14 / Chapter 2.1 --- Determine useful signal --- p.14 / Chapter 2.1.1 --- End point detection using energy --- p.15 / Chapter 2.1.2 --- End point detection enhancement using zero crossing rate --- p.18 / Chapter 2.2 --- Pre-emphasis filter --- p.19 / Chapter 2.3 --- Feature extraction --- p.20 / Chapter 2.3.1 --- Filter-bank spectrum analysis model --- p.22 / Chapter 2.3.2 --- Linear Predictive Coding (LPC) coefficients --- p.25 / Chapter 2.3.3 --- Cepstral coefficients --- p.27 / Chapter 2.3.4 --- Zero crossing rate and energy --- p.27 / Chapter 2.3.5 --- Pitch (fundamental frequency) detection --- p.28 / Chapter 2.4 --- Discussions --- p.30 / Chapter 3 --- Speech Recognition Methods --- p.32 / Chapter 3.1 --- Template matching using Dynamic Time Warping (DTW) --- p.32 / Chapter 3.2 --- Hidden Markov Model (HMM) --- p.37 / Chapter 3.2.1 --- Vector Quantization (VQ) --- p.38 / Chapter 3.2.2 --- Description of a discrete HMM --- p.41 / Chapter 3.2.3 --- Probability evaluation --- p.42 / Chapter 3.2.4 --- Estimation technique for model parameters --- p.46 / Chapter 3.2.5 --- State sequence for the observation sequence --- p.48 / Chapter 3.3 --- 2-dimensional Hidden Markov Model (2dHMM) --- p.49 / Chapter 3.3.1 --- Calculation for a 2dHMM --- p.50 / Chapter 3.4 --- Discussions --- p.56 / Chapter 4 --- 
Implementation --- p.59 / Chapter 4.1 --- Transputer based multiprocessor system --- p.59 / Chapter 4.1.1 --- Transputer Development System (TDS) --- p.60 / Chapter 4.1.2 --- System architecture --- p.61 / Chapter 4.1.3 --- Transtech TMB16 mother board --- p.62 / Chapter 4.1.4 --- Farming technique --- p.64 / Chapter 4.2 --- Farming technique on extracting spectral amplitude feature --- p.68 / Chapter 4.3 --- Feature extraction for LPC --- p.73 / Chapter 4.4 --- DTW based recognition --- p.77 / Chapter 4.4.1 --- Feature extraction --- p.77 / Chapter 4.4.2 --- Training and matching --- p.78 / Chapter 4.5 --- HMM based recognition --- p.80 / Chapter 4.5.1 --- Feature extraction --- p.80 / Chapter 4.5.2 --- Model training and matching --- p.81 / Chapter 4.6 --- 2dHMM based recognition --- p.83 / Chapter 4.6.1 --- Feature extraction --- p.83 / Chapter 4.6.2 --- Training --- p.83 / Chapter 4.6.3 --- Recognition --- p.87 / Chapter 4.7 --- Training convergence in HMM and 2dHMM --- p.88 / Chapter 4.8 --- Discussions --- p.91 / Chapter 5 --- Experimental Results --- p.92 / Chapter 5.1 --- "Comparison of DTW, HMM and 2dHMM" --- p.93 / Chapter 5.2 --- Comparison between HMM and 2dHMM --- p.98 / Chapter 5.2.1 --- Recognition test on 20 English words --- p.98 / Chapter 5.2.2 --- Recognition test on 10 Cantonese syllables --- p.102 / Chapter 5.3 --- Recognition test on 80 Cantonese syllables --- p.113 / Chapter 5.4 --- Speed matching --- p.118 / Chapter 5.5 --- Computational performance --- p.119 / Chapter 5.5.1 --- Training performance --- p.119 / Chapter 5.5.2 --- Recognition performance --- p.120 / Chapter 6 --- Discussions and Conclusions --- p.126 / Bibliography --- p.129 / Chapter A --- An ANN Model for Speech Recognition --- p.136 / Chapter B --- A Speech Signal Represented in Frequency Domain (Spectrogram) --- p.138 / Chapter C --- Dynamic Programming --- p.144 / Chapter D --- Markov Process --- p.145 / Chapter E --- Maximum Likelihood (ML) --- p.146 / Chapter F --- 
Multiple Training --- p.149 / Chapter F.1 --- HMM --- p.150 / Chapter F.2 --- 2dHMM --- p.150 / Chapter G --- IMS T800 Transputer --- p.152 / Chapter G.1 --- IMS T800 architecture --- p.152 / Chapter G.2 --- Instruction encoding --- p.153 / Chapter G.3 --- Floating point instructions --- p.155 / Chapter G.4 --- Optimizing use of the stack --- p.157 / Chapter G.5 --- Concurrent operation of FPU and CPU --- p.158
|
184 |
A frequency-based BSS technique for speech source separation. January 2003
Ngan Lai Yin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 95-100). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Blind Signal Separation (BSS) Methods --- p.4 / Chapter 1.2 --- Objectives of the Thesis --- p.6 / Chapter 1.3 --- Thesis Outline --- p.8 / Chapter 2 --- Blind Adaptive Frequency-Shift (BA-FRESH) Filter --- p.9 / Chapter 2.1 --- Cyclostationarity Properties --- p.10 / Chapter 2.2 --- Frequency-Shift (FRESH) Filter --- p.11 / Chapter 2.3 --- Blind Adaptive FRESH Filter --- p.12 / Chapter 2.4 --- Reduced-Rank BA-FRESH Filter --- p.14 / Chapter 2.4.1 --- CSP Method --- p.14 / Chapter 2.4.2 --- PCA Method --- p.14 / Chapter 2.4.3 --- Appropriate Choice of Rank --- p.14 / Chapter 2.5 --- Signal Extraction of Spectrally Overlapped Signals --- p.16 / Chapter 2.5.1 --- Simulation 1: A Fixed Rank --- p.17 / Chapter 2.5.2 --- Simulation 2: A Variable Rank --- p.18 / Chapter 2.6 --- Signal Separation of Speech Signals --- p.20 / Chapter 2.7 --- Chapter Summary --- p.22 / Chapter 3 --- Reverberant Environment --- p.23 / Chapter 3.1 --- Small Room Acoustics Model --- p.23 / Chapter 3.2 --- Effects of Reverberation to Speech Recognition --- p.27 / Chapter 3.2.1 --- Short Impulse Response --- p.27 / Chapter 3.2.2 --- Small Room Impulse Response Modelled by Image Method --- p.32 / Chapter 3.3 --- Chapter Summary --- p.34 / Chapter 4 --- Information Theoretic Approach for Signal Separation --- p.35 / Chapter 4.1 --- Independent Component Analysis (ICA) --- p.35 / Chapter 4.1.1 --- Kullback-Leibler (K-L) Divergence --- p.37 / Chapter 4.2 --- Information Maximization (Infomax) --- p.39 / Chapter 4.2.1 --- Stochastic Gradient Descent and Stability Problem --- p.41 / Chapter 4.2.2 --- Infomax and ICA --- p.41 / Chapter 4.2.3 --- Infomax and Maximum Likelihood --- p.42 / Chapter 4.3 --- Signal Separation by Infomax --- p.43 / Chapter 4.4 --- Chapter Summary --- p.45 / 
Chapter 5 --- Blind Signal Separation (BSS) in Frequency Domain --- p.47 / Chapter 5.1 --- Convolutive Mixing System --- p.48 / Chapter 5.2 --- Infomax in Frequency Domain --- p.52 / Chapter 5.3 --- Adaptation Algorithms --- p.54 / Chapter 5.3.1 --- Standard Gradient Method --- p.54 / Chapter 5.3.2 --- Natural Gradient Method --- p.55 / Chapter 5.3.3 --- Convergence Performance --- p.56 / Chapter 5.4 --- Subband Adaptation --- p.57 / Chapter 5.5 --- Energy Weighting --- p.59 / Chapter 5.6 --- The Permutation Problem --- p.61 / Chapter 5.7 --- Performance Evaluation --- p.63 / Chapter 5.7.1 --- De-reverberation Performance Factor --- p.63 / Chapter 5.7.2 --- De-Noise Performance Factor --- p.63 / Chapter 5.7.3 --- Spectral Signal-to-noise Ratio (SNR) --- p.65 / Chapter 5.8 --- Chapter Summary --- p.65 / Chapter 6 --- Simulation Results and Performance Analysis --- p.67 / Chapter 6.1 --- Small Room Acoustics Modelled by Image Method --- p.67 / Chapter 6.2 --- Signal Sources --- p.68 / Chapter 6.2.1 --- Cantonese Speech --- p.69 / Chapter 6.2.2 --- Noise --- p.69 / Chapter 6.3 --- De-Noise and De-Reverberation Performance Analysis --- p.69 / Chapter 6.3.1 --- Speech and White Noise --- p.73 / Chapter 6.3.2 --- Speech and Voice Babble Noise --- p.76 / Chapter 6.3.3 --- Two Female Speeches --- p.79 / Chapter 6.4 --- Recognition Accuracy Performance Analysis --- p.83 / Chapter 6.4.1 --- Speech and White Noise --- p.83 / Chapter 6.4.2 --- Speech and Voice Babble Noise --- p.84 / Chapter 6.4.3 --- Two Cantonese Speeches --- p.85 / Chapter 6.5 --- Chapter Summary --- p.87 / Chapter 7 --- Conclusions and Suggestions for Future Research --- p.88 / Chapter 7.1 --- Conclusions --- p.88 / Chapter 7.2 --- Suggestions for Future Research --- p.91 / Appendices --- p.92 / A The Proof of Stability Conditions for Stochastic Gradient Descent Algorithm (Ref. (4.15)) --- p.92 / Bibliography --- p.95
|
185 |
The use of multiple speech recognition hypotheses for natural language understanding. January 2003
Wang Ying. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 102-104). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Overview --- p.1 / Chapter 1.2 --- Thesis Goals --- p.3 / Chapter 1.3 --- Thesis Outline --- p.3 / Chapter 2 --- Background --- p.4 / Chapter 2.1 --- Speech Recognition --- p.4 / Chapter 2.2 --- Natural Language Understanding --- p.6 / Chapter 2.2.1 --- Rule-based Approach --- p.7 / Chapter 2.2.2 --- Corpus-based Approach --- p.7 / Chapter 2.3 --- Integration of Speech Recognition with NLU --- p.8 / Chapter 2.3.1 --- Word Graph --- p.9 / Chapter 2.3.2 --- N-best List --- p.9 / Chapter 2.4 --- The ATIS Domain --- p.10 / Chapter 2.5 --- Chapter Summary --- p.14 / Chapter 3 --- Generation of Speech Recognition Hypotheses --- p.15 / Chapter 3.1 --- Grammar Development for the OpenSpeech Recognizer --- p.16 / Chapter 3.2 --- Generation of Speech Recognition Hypotheses --- p.22 / Chapter 3.3 --- Evaluation of Speech Recognition Hypotheses --- p.24 / Chapter 3.3.1 --- Recognition Accuracy --- p.24 / Chapter 3.3.2 --- Concept Accuracy --- p.28 / Chapter 3.4 --- Results and Analysis --- p.33 / Chapter 3.5 --- Chapter Summary --- p.38 / Chapter 4 --- Belief Networks for NLU --- p.40 / Chapter 4.1 --- Problem Formulation --- p.40 / Chapter 4.2 --- The Original NLU Framework --- p.41 / Chapter 4.2.1 --- Semantic Tagging --- p.41 / Chapter 4.2.2 --- Concept Selection --- p.42 / Chapter 4.2.3 --- Bayesian Inference --- p.43 / Chapter 4.2.4 --- Thresholding --- p.44 / Chapter 4.2.5 --- Goal Identification --- p.45 / Chapter 4.3 --- Evaluation Method of Goal Identification Performance --- p.45 / Chapter 4.4 --- Baseline Result --- p.48 / Chapter 4.5 --- Chapter Summary --- p.50 / Chapter 5 --- The Effects of Recognition Errors on NLU --- p.51 / Chapter 5.1 --- Experiments --- p.51 / Chapter 5.1.1 --- Perfect Case - The Use of Transcripts --- p.53 / Chapter 5.1.2 --- 
Train on Recognition Hypotheses --- p.53 / Chapter 5.1.3 --- Test on Recognition Hypotheses --- p.55 / Chapter 5.1.4 --- Train and Test on Recognition Hypotheses --- p.56 / Chapter 5.2 --- Analysis of Results --- p.60 / Chapter 5.3 --- Chapter Summary --- p.67 / Chapter 6 --- The Use of Multiple Speech Recognition Hypotheses for NLU --- p.69 / Chapter 6.1 --- The Extended NLU Framework --- p.76 / Chapter 6.1.1 --- Semantic Tagging --- p.76 / Chapter 6.1.2 --- Recognition Confidence Score Normalization --- p.77 / Chapter 6.1.3 --- Concept Selection --- p.79 / Chapter 6.1.4 --- Bayesian Inference --- p.80 / Chapter 6.1.5 --- Combination with Confidence Scores --- p.81 / Chapter 6.1.6 --- Thresholding --- p.84 / Chapter 6.1.7 --- Goal Identification --- p.84 / Chapter 6.2 --- Experiments --- p.86 / Chapter 6.2.1 --- The Use of First Best Recognition Hypothesis --- p.86 / Chapter 6.2.2 --- Train on Multiple Recognition Hypotheses --- p.86 / Chapter 6.2.3 --- Test on Multiple Recognition Hypotheses --- p.87 / Chapter 6.2.4 --- Train and Test on Multiple Recognition Hypotheses --- p.88 / Chapter 6.3 --- Significance Testing --- p.90 / Chapter 6.4 --- Result Analysis --- p.91 / Chapter 6.5 --- Chapter Summary --- p.97 / Chapter 7 --- Conclusions and Future Work --- p.98 / Chapter 7.1 --- Conclusions --- p.98 / Chapter 7.2 --- Contribution --- p.99 / Chapter 7.3 --- Future Work --- p.100 / Bibliography --- p.102 / Chapter A --- Speech Recognition Hypotheses Distribution --- p.105 / Chapter B --- Recognition Errors in Three Kinds of Queries --- p.107 / Chapter C --- The Effects of Recognition Errors in N-Best list on NLU --- p.114 / Chapter D --- Training on Multiple Recognition Hypotheses --- p.117 / Chapter E --- Testing on Multiple Recognition Hypotheses --- p.132 / Chapter F --- Hand-designed Grammar For ATIS --- p.139
|
186 |
Text-independent bilingual speaker verification system. January 2003
Ma Bin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 96-102). / Abstracts in English and Chinese. / Abstract --- p.i / Acknowledgement --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Biometrics --- p.2 / Chapter 1.2 --- Speaker Verification --- p.3 / Chapter 1.3 --- Overview of Speaker Verification Systems --- p.4 / Chapter 1.4 --- Text Dependency --- p.4 / Chapter 1.4.1 --- Text-Dependent Speaker Verification --- p.5 / Chapter 1.4.2 --- GMM-based Speaker Verification --- p.6 / Chapter 1.5 --- Language Dependency --- p.6 / Chapter 1.6 --- Normalization Techniques --- p.7 / Chapter 1.7 --- Objectives of the Thesis --- p.8 / Chapter 1.8 --- Thesis Organization --- p.8 / Chapter 2 --- Background --- p.10 / Chapter 2.1 --- Background Information --- p.11 / Chapter 2.1.1 --- Speech Signal Acquisition --- p.11 / Chapter 2.1.2 --- Speech Processing --- p.11 / Chapter 2.1.3 --- Engineering Model of Speech Signal --- p.13 / Chapter 2.1.4 --- Speaker Information in the Speech Signal --- p.14 / Chapter 2.1.5 --- Feature Parameters --- p.15 / Chapter 2.1.5.1 --- Mel-Frequency Cepstral Coefficients --- p.16 / Chapter 2.1.5.2 --- Linear Predictive Coding Derived Cepstral Coefficients --- p.18 / Chapter 2.1.5.3 --- Energy Measures --- p.20 / Chapter 2.1.5.4 --- Derivatives of Cepstral Coefficients --- p.21 / Chapter 2.1.6 --- Evaluating Speaker Verification Systems --- p.22 / Chapter 2.2 --- Common Techniques --- p.24 / Chapter 2.2.1 --- Template Model Matching Methods --- p.25 / Chapter 2.2.2 --- Statistical Model Methods --- p.26 / Chapter 2.2.2.1 --- HMM Modeling Technique --- p.27 / Chapter 2.2.2.2 --- GMM Modeling Techniques --- p.30 / Chapter 2.2.2.3 --- Gaussian Mixture Model --- p.31 / Chapter 2.2.2.4 --- The Advantages of GMM --- p.32 / Chapter 2.2.3 --- Likelihood Scoring --- p.32 / Chapter 2.2.4 --- General Approach to Decision Making --- p.35 / Chapter 2.2.5 --- Cohort Normalization --- p.35 
/ Chapter 2.2.5.1 --- Probability Score Normalization --- p.36 / Chapter 2.2.5.2 --- Cohort Selection --- p.37 / Chapter 2.3 --- Chapter Summary --- p.38 / Chapter 3 --- Experimental Corpora --- p.39 / Chapter 3.1 --- The YOHO Corpus --- p.39 / Chapter 3.1.1 --- Design of the YOHO Corpus --- p.39 / Chapter 3.1.2 --- Data Collection Process of the YOHO Corpus --- p.40 / Chapter 3.1.3 --- Experimentation with the YOHO Corpus --- p.41 / Chapter 3.2 --- CUHK Bilingual Speaker Verification Corpus --- p.42 / Chapter 3.2.1 --- Design of the CUBS Corpus --- p.42 / Chapter 3.2.2 --- Data Collection Process for the CUBS Corpus --- p.44 / Chapter 3.3 --- Chapter Summary --- p.46 / Chapter 4 --- Text-Dependent Speaker Verification --- p.47 / Chapter 4.1 --- Front-End Processing on the YOHO Corpus --- p.48 / Chapter 4.2 --- Cohort Normalization Setup --- p.50 / Chapter 4.3 --- HMM-based Speaker Verification Experiments --- p.53 / Chapter 4.3.1 --- Subword HMM Models --- p.53 / Chapter 4.3.2 --- Experimental Results --- p.55 / Chapter 4.3.2.1 --- Comparison of Feature Representations --- p.55 / Chapter 4.3.2.2 --- Effect of Cohort Normalization --- p.58 / Chapter 4.4 --- Experiments on GMM-based Speaker Verification --- p.61 / Chapter 4.4.1 --- Experimental Setup --- p.61 / Chapter 4.4.2 --- The number of Gaussian Mixture Components --- p.62 / Chapter 4.4.3 --- The Effect of Cohort Normalization --- p.64 / Chapter 4.4.4 --- Comparison of HMM and GMM --- p.65 / Chapter 4.5 --- Comparison with Previous Systems --- p.67 / Chapter 4.6 --- Chapter Summary --- p.70 / Chapter 5 --- Language- and Text-Independent Speaker Verification --- p.71 / Chapter 5.1 --- Front-End Processing of the CUBS --- p.72 / Chapter 5.2 --- Language- and Text-Independent Speaker Modeling --- p.73 / Chapter 5.3 --- Cohort Normalization --- p.74 / Chapter 5.4 --- Experimental Results and Analysis --- p.75 / Chapter 5.4.1 --- Number of Gaussian Mixture Components --- p.78 / Chapter 5.4.2 --- The Cohort 
Normalization Effect --- p.79 / Chapter 5.4.3 --- Language Dependency --- p.80 / Chapter 5.4.4 --- Language-Independency --- p.83 / Chapter 5.5 --- Chapter Summary --- p.88 / Chapter 6 --- Conclusions and Future Work --- p.90 / Chapter 6.1 --- Summary --- p.90 / Chapter 6.1.1 --- Feature Comparison --- p.91 / Chapter 6.1.2 --- HMM Modeling --- p.91 / Chapter 6.1.3 --- GMM Modeling --- p.91 / Chapter 6.1.4 --- Cohort Normalization --- p.92 / Chapter 6.1.5 --- Language Dependency --- p.92 / Chapter 6.2 --- Future Work --- p.93 / Chapter 6.2.1 --- Feature Parameters --- p.93 / Chapter 6.2.2 --- Model Quality --- p.93 / Chapter 6.2.2.1 --- Variance Flooring --- p.93 / Chapter 6.2.2.2 --- Silence Detection --- p.94 / Chapter 6.2.3 --- Conversational Speaker Verification --- p.95 / Bibliography --- p.102
|
187 |
Pronunciation modeling for Cantonese speech recognition. January 2003
Kam Patgi. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaf 103). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Automatic Speech Recognition --- p.1 / Chapter 1.2 --- Pronunciation Modeling in ASR --- p.2 / Chapter 1.3 --- Objectives of the Thesis --- p.5 / Chapter 1.4 --- Thesis Outline --- p.5 / Reference --- p.7 / Chapter 2 --- The Cantonese Dialect --- p.9 / Chapter 2.1 --- Cantonese - A Typical Chinese Dialect --- p.10 / Chapter 2.1.1 --- Cantonese Phonology --- p.11 / Chapter 2.1.2 --- Cantonese Phonetics --- p.12 / Chapter 2.2 --- Pronunciation Variation in Cantonese --- p.13 / Chapter 2.2.1 --- Phone Change and Sound Change --- p.14 / Chapter 2.2.2 --- Notation for Different Sound Units --- p.16 / Chapter 2.3 --- Summary --- p.17 / Reference --- p.18 / Chapter 3 --- Large-Vocabulary Continuous Speech Recognition for Cantonese --- p.19 / Chapter 3.1 --- Feature Representation of the Speech Signal --- p.20 / Chapter 3.2 --- Probabilistic Framework of ASR --- p.20 / Chapter 3.3 --- Hidden Markov Model for Acoustic Modeling --- p.21 / Chapter 3.4 --- Pronunciation Lexicon --- p.25 / Chapter 3.5 --- Statistical Language Model --- p.25 / Chapter 3.6 --- Decoding --- p.26 / Chapter 3.7 --- The Baseline Cantonese LVCSR System --- p.26 / Chapter 3.7.1 --- System Architecture --- p.26 / Chapter 3.7.2 --- Speech Databases --- p.28 / Chapter 3.8 --- Summary --- p.29 / Reference --- p.30 / Chapter 4 --- Pronunciation Model --- p.32 / Chapter 4.1 --- Pronunciation Modeling at Different Levels --- p.33 / Chapter 4.2 --- Phone-level pronunciation model and its Application --- p.35 / Chapter 4.2.1 --- IF Confusion Matrix (CM) --- p.35 / Chapter 4.2.2 --- Decision Tree Pronunciation Model (DTPM) --- p.38 / Chapter 4.2.3 --- Refinement of Confusion Matrix --- p.41 / Chapter 4.3 --- Summary --- p.43 / References --- p.44 / Chapter 5 
--- Pronunciation Modeling at Lexical Level --- p.45 / Chapter 5.1 --- Construction of PVD --- p.46 / Chapter 5.2 --- PVD Pruning by Word Unigram --- p.48 / Chapter 5.3 --- Recognition Experiments --- p.49 / Chapter 5.3.1 --- Experiment 1 - Pronunciation Modeling in LVCSR --- p.49 / Chapter 5.3.2 --- Experiment 2 - Pronunciation Modeling in Domain Specific task --- p.58 / Chapter 5.3.3 --- Experiment 3 - PVD Pruning by Word Unigram --- p.62 / Chapter 5.4 --- Summary --- p.63 / Reference --- p.64 / Chapter 6 --- Pronunciation Modeling at Acoustic Model Level --- p.66 / Chapter 6.1 --- Hierarchy of HMM --- p.67 / Chapter 6.2 --- Sharing of Mixture Components --- p.68 / Chapter 6.3 --- Adaptation of Mixture Components --- p.70 / Chapter 6.4 --- Combination of Mixture Component Sharing and Adaptation --- p.74 / Chapter 6.5 --- Recognition Experiments --- p.78 / Chapter 6.6 --- Result Analysis --- p.80 / Chapter 6.6.1 --- Performance of Sharing Mixture Components --- p.81 / Chapter 6.6.2 --- Performance of Mixture Component Adaptation --- p.84 / Chapter 6.7 --- Summary --- p.85 / Reference --- p.87 / Chapter 7 --- Pronunciation Modeling at Decoding Level --- p.88 / Chapter 7.1 --- Search Process in Cantonese LVCSR --- p.88 / Chapter 7.2 --- Model-Level Search Space Expansion --- p.90 / Chapter 7.3 --- State-Level Output Probability Modification --- p.92 / Chapter 7.4 --- Recognition Experiments --- p.93 / Chapter 7.4.1 --- Experiment 1 - Model-Level Search Space Expansion --- p.93 / Chapter 7.4.2 --- Experiment 2 - State-Level Output Probability Modification --- p.94 / Chapter 7.5 --- Summary --- p.96 / Reference --- p.97 / Chapter 8 
--- Conclusions and Suggestions for Future Work --- p.98 / Chapter 8.1 --- Conclusions --- p.98 / Chapter 8.2 --- Suggestions for Future Work --- p.100 / Reference --- p.103 / Appendix I Base Syllable Table --- p.104 / Appendix II Cantonese Initials and Finals --- p.105 / Appendix III IF confusion matrix --- p.106 / Appendix IV Phonetic Question Set --- p.112 / Appendix V CDDT and PCDT --- p.114
|
188 |
Language modeling for speech recognition of spoken Cantonese. January 2009
Yeung, Yu Ting. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves 84-93). / Abstracts in English and Chinese. / Acknowledgement --- p.iii / Abstract --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Cantonese Speech Recognition --- p.3 / Chapter 1.2 --- Objectives --- p.4 / Chapter 1.3 --- Thesis Outline --- p.5 / Chapter 2 --- Fundamentals of Large Vocabulary Continuous Speech Recognition --- p.7 / Chapter 2.1 --- Problem Formulation --- p.7 / Chapter 2.2 --- Feature Extraction --- p.8 / Chapter 2.3 --- Acoustic Models --- p.9 / Chapter 2.4 --- Decoding --- p.10 / Chapter 2.5 --- Statistical Language Modeling --- p.12 / Chapter 2.5.1 --- N-gram Language Models --- p.12 / Chapter 2.5.2 --- N-gram Smoothing --- p.13 / Chapter 2.5.3 --- Complexity of Language Model --- p.15 / Chapter 2.5.4 --- Class-based Language Model --- p.16 / Chapter 2.5.5 --- Language Model Pruning --- p.17 / Chapter 2.6 --- Performance Evaluation --- p.18 / Chapter 3 --- The Cantonese Dialect --- p.19 / Chapter 3.1 --- Phonology of Cantonese --- p.19 / Chapter 3.2 --- Orthographic Representation of Cantonese --- p.22 / Chapter 3.3 --- Classification of Cantonese speech --- p.25 / Chapter 3.4 --- Cantonese-English Code-mixing --- p.27 / Chapter 4 --- Rule-based Translation Method --- p.29 / Chapter 4.1 --- Motivations --- p.29 / Chapter 4.2 --- Transformation-based Learning --- p.30 / Chapter 4.2.1 --- Algorithm Overview --- p.30 / Chapter 4.2.2 --- Learning of Translation Rules --- p.32 / Chapter 4.3 --- Performance Evaluation --- p.35 / Chapter 4.3.1 --- The Learnt Translation Rules --- p.35 / Chapter 4.3.2 --- Evaluation of the Rules --- p.37 / Chapter 4.3.3 --- Analysis of the Rules --- p.37 / Chapter 4.4 --- Preparation of Training Data for Language Modeling --- p.41 / Chapter 4.5 --- Discussion --- p.43 / Chapter 5 --- Language Modeling for Cantonese --- p.44 / Chapter 5.1 --- Training Data --- p.44 / Chapter 5.1.1 --- 
Text Corpora --- p.44 / Chapter 5.1.2 --- Preparation of Formal Cantonese Text Data --- p.45 / Chapter 5.2 --- Training of Language Models --- p.46 / Chapter 5.2.1 --- Language Models for Standard Chinese --- p.46 / Chapter 5.2.2 --- Language Models for Formal Cantonese --- p.46 / Chapter 5.2.3 --- Language models for Colloquial Cantonese --- p.47 / Chapter 5.3 --- Evaluation of Language Models --- p.48 / Chapter 5.3.1 --- Speech Corpora for Evaluation --- p.48 / Chapter 5.3.2 --- Perplexities of Formal Cantonese Language Models --- p.49 / Chapter 5.3.3 --- Perplexities of Colloquial Cantonese Language Models --- p.51 / Chapter 5.4 --- Speech Recognition Experiments --- p.53 / Chapter 5.4.1 --- Speech Corpora --- p.53 / Chapter 5.4.2 --- Experimental Setup --- p.54 / Chapter 5.4.3 --- Results on Formal Cantonese Models --- p.55 / Chapter 5.4.4 --- Results on Colloquial Cantonese Models --- p.56 / Chapter 5.5 --- Analysis of Results --- p.58 / Chapter 5.6 --- Discussion --- p.59 / Chapter 5.6.1 --- Cantonese Language Modeling --- p.59 / Chapter 5.6.2 --- Interpolated Language Models --- p.59 / Chapter 5.6.3 --- Class-based Language Models --- p.60 / Chapter 6 --- Towards Language Modeling of Code-mixing Speech --- p.61 / Chapter 6.1 --- Data Collection --- p.61 / Chapter 6.1.1 --- Data Collection --- p.62 / Chapter 6.1.2 --- Filtering of Collected Data --- p.63 / Chapter 6.1.3 --- Processing of Collected Data --- p.63 / Chapter 6.2 --- Clustering of Chinese and English Words --- p.64 / Chapter 6.3 --- Language Modeling for Code-mixing Speech --- p.64 / Chapter 6.3.1 --- Language Models from Collected Data --- p.64 / Chapter 6.3.2 --- Class-based Language Models --- p.66 / Chapter 6.3.3 --- Performance Evaluation of Code-mixing Language Models --- p.67 / Chapter 6.4 --- Speech Recognition Experiments with Code-mixing Language Models --- p.69 / Chapter 6.4.1 --- Experimental Setup --- p.69 / Chapter 6.4.2 --- Monolingual Cantonese Recognition --- p.70 / Chapter 6.4.3 
--- Code-mixing Speech Recognition --- p.72 / Chapter 6.5 --- Discussion --- p.74 / Chapter 6.5.1 --- Data Collection from the Internet --- p.74 / Chapter 6.5.2 --- Speech Recognition of Code-mixing Speech --- p.75 / Chapter 7 --- Conclusions and Future Work --- p.77 / Chapter 7.1 --- Conclusions --- p.77 / Chapter 7.1.1 --- Rule-based Translation Method --- p.77 / Chapter 7.1.2 --- Cantonese Language Modeling --- p.78 / Chapter 7.1.3 --- Code-mixing Language Modeling --- p.78 / Chapter 7.2 --- Future Works --- p.79 / Chapter 7.2.1 --- Rule-based Translation --- p.79 / Chapter 7.2.2 --- Training data --- p.80 / Chapter 7.2.3 --- Code-mixing speech --- p.80 / Chapter A --- Equation Derivation --- p.82 / Chapter A.1 --- Relationship between Average Mutual Information and Perplexity --- p.82 / Bibliography --- p.83
|
189 |
Creation of a pronunciation dictionary for automatic speech recognition : a morphological approach. Nkosi, Mpho Caselinah. January 2012
Thesis (M.Sc. (Computer Science)) --University of Limpopo, 2012 / Pronunciation dictionaries or lexicons play an important role in guiding the predictive powers of an Automatic Speech Recognition (ASR) system. As the use of automatic speech recognition systems increases, there is a need for the development of dictionaries that cover a large number of inflected word forms to enhance the performance of ASR systems. The main purpose of this study is to investigate the contribution of the morphological approach to creating a more comprehensive and broadly representative Northern Sotho pronunciation dictionary for Automatic Speech Recognition systems.
The Northern Sotho verbs together with morphological rules are used to generate more valid inflected word forms in the Northern Sotho language for the creation of a pronunciation dictionary. The pronunciation dictionary is developed using the Dictionary Maker tool. The Hidden Markov Model Toolkit is used to develop a simple ASR system in order to evaluate the performance of the ASR system when using the created pronunciation dictionary.
|
190 |
Prosodic features for a maximum entropy language model. Chan, Oscar. January 2008
A statistical language model attempts to characterise the patterns present in a natural language as a probability distribution defined over word sequences. Typically, they are trained using word co-occurrence statistics from a large sample of text. In some language modelling applications, such as automatic speech recognition (ASR), the availability of acoustic data provides an additional source of knowledge. This contains, amongst other things, the melodic and rhythmic aspects of speech referred to as prosody. Although prosody has been found to be an important factor in human speech recognition, its use in ASR has been limited. The goal of this research is to investigate how prosodic information can be employed to improve the language modelling component of a continuous speech recognition system. Because prosodic features are largely suprasegmental, operating over units larger than the phonetic segment, the language model is an appropriate place to incorporate such information. The prosodic features and standard language model features are combined under the maximum entropy framework, which provides an elegant solution to modelling information obtained from multiple, differing knowledge sources. We derive features for the model based on perceptually transcribed Tones and Break Indices (ToBI) labels, and analyse their contribution to the word recognition task. While ToBI has a solid foundation in linguistic theory, the need for human transcribers conflicts with the statistical model's requirement for a large quantity of training data. We therefore also examine the applicability of features which can be automatically extracted from the speech signal. We develop representations of an utterance's prosodic context using fundamental frequency, energy and duration features, which can be directly incorporated into the model without the need for manual labelling. 
Dimensionality reduction techniques are also explored with the aim of reducing the computational costs associated with training a maximum entropy model. Experiments on a prosodically transcribed corpus show that small but statistically significant reductions to perplexity and word error rates can be obtained by using both manually transcribed and automatically extracted features.
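For readers unfamiliar with the maximum entropy framework mentioned in the abstract above, the standard conditional form of such a model is sketched below. The notation is an assumption for illustration and is not taken from the thesis: $w$ is the predicted word, $h$ is its context (which here would cover both the word history and any prosodic observations), $f_i$ are feature functions and $\lambda_i$ their learned weights.

```latex
P_{\Lambda}(w \mid h) = \frac{1}{Z_{\Lambda}(h)} \exp\!\Big( \sum_{i} \lambda_i \, f_i(h, w) \Big),
\qquad
Z_{\Lambda}(h) = \sum_{w'} \exp\!\Big( \sum_{i} \lambda_i \, f_i(h, w') \Big)
```

Under this form, adding a new knowledge source such as prosody amounts to adding further feature functions $f_i$. The normalizer $Z_{\Lambda}(h)$ sums over the whole vocabulary, which is one reason training can be computationally costly.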
|