171

An integration of hidden Markov model and neural network for phoneme recognition.

January 1993
by Patrick Shu Pui Ko. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1993. / Includes bibliographical references (leaves 77-78). / Chapter 1. --- Introduction --- p.1 / Chapter 1.1 --- Introduction to Speech Recognition --- p.1 / Chapter 1.2 --- Classifications and Constraints of Speech Recognition Systems --- p.1 / Chapter 1.2.1 --- Isolated Subword Unit Recognition --- p.1 / Chapter 1.2.2 --- Isolated Word Recognition --- p.2 / Chapter 1.2.3 --- Continuous Speech Recognition --- p.2 / Chapter 1.3 --- Objective of the Thesis --- p.3 / Chapter 1.3.1 --- What is the Problem --- p.3 / Chapter 1.3.2 --- How the Problem is Approached --- p.3 / Chapter 1.3.3 --- The Organization of this Thesis --- p.3 / Chapter 2. --- Literature Review --- p.5 / Chapter 2.1 --- Approaches to the Problem of Speech Recognition --- p.5 / Chapter 2.1.1 --- Template-Based Approaches --- p.6 / Chapter 2.1.2 --- Knowledge-Based Approaches --- p.9 / Chapter 2.1.3 --- Stochastic Approaches --- p.10 / Chapter 2.1.4 --- Connectionist Approaches --- p.14 / Chapter 3. --- Discrimination Issues of HMM --- p.16 / Chapter 3.1 --- Maximum Likelihood Estimation (MLE) --- p.16 / Chapter 3.2 --- Maximum Mutual Information (MMI) --- p.17 / Chapter 4. --- Neural Networks --- p.19 / Chapter 4.1 --- History --- p.19 / Chapter 4.2 --- Basic Concepts --- p.20 / Chapter 4.3 --- Learning --- p.21 / Chapter 4.3.1 --- Supervised Training --- p.21 / Chapter 4.3.2 --- Reinforcement Training --- p.22 / Chapter 4.3.3 --- Self-Organization --- p.22 / Chapter 4.4 --- Error Back-propagation --- p.22 / Chapter 5. --- Proposal of a Discriminative Neural Network Layer --- p.25 / Chapter 5.1 --- Rationale --- p.25 / Chapter 5.2 --- HMM Parameters --- p.27 / Chapter 5.3 --- Neural Network Layer --- p.28 / Chapter 5.4 --- Decision Rules --- p.29 / Chapter 6. --- Data Preparation --- p.31 / Chapter 6.1 --- TIMIT --- p.31 / Chapter 6.2 --- Feature Extraction --- p.34 / Chapter 6.3 --- Training --- p.43 / Chapter 7. --- Experiments and Results --- p.52 / Chapter 7.1 --- Experiments --- p.52 / Chapter 7.2 --- Experiment I --- p.52 / Chapter 7.3 --- Experiment II --- p.55 / Chapter 7.4 --- Experiment III --- p.57 / Chapter 7.5 --- Experiment IV --- p.58 / Chapter 7.6 --- Experiment V --- p.60 / Chapter 7.7 --- Computational Issues --- p.62 / Chapter 7.8 --- Limitations --- p.63 / Chapter 8. --- Conclusion --- p.64 / Chapter 9. --- Future Directions --- p.67 / Appendix / Chapter A. --- Linear Predictive Coding --- p.69 / Chapter B. --- Implementation of a Vector Quantizer --- p.70 / Chapter C. --- Implementation of HMM --- p.73 / Chapter C.1 --- Calculations Underflow --- p.73 / Chapter C.2 --- Zero-lising Effect --- p.75 / Chapter C.3 --- Training With Multiple Observation Sequences --- p.76 / References --- p.77
172

Continuous speech phoneme recognition using neural networks and grammar correction.

January 1995
by Wai-Tat Fu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1995. / Includes bibliographical references (leaves 104-[109]). / Chapter 1 --- INTRODUCTION --- p.1 / Chapter 1.1 --- Problem of Speech Recognition --- p.1 / Chapter 1.2 --- Why continuous speech recognition? --- p.5 / Chapter 1.3 --- Current status of continuous speech recognition --- p.6 / Chapter 1.4 --- Research Goal --- p.10 / Chapter 1.5 --- Thesis outline --- p.10 / Chapter 2 --- Current Approaches to Continuous Speech Recognition --- p.12 / Chapter 2.1 --- BASIC STEPS FOR CONTINUOUS SPEECH RECOGNITION --- p.12 / Chapter 2.2 --- THE HIDDEN MARKOV MODEL APPROACH --- p.16 / Chapter 2.2.1 --- Introduction --- p.16 / Chapter 2.2.2 --- Segmentation and Pattern Matching --- p.18 / Chapter 2.2.3 --- Word Formation and Syntactic Processing --- p.22 / Chapter 2.2.4 --- Discussion --- p.23 / Chapter 2.3 --- NEURAL NETWORK APPROACH --- p.24 / Chapter 2.3.1 --- Introduction --- p.24 / Chapter 2.3.2 --- Segmentation and Pattern Matching --- p.25 / Chapter 2.3.3 --- Discussion --- p.27 / Chapter 2.4 --- MLP/HMM HYBRID APPROACH --- p.28 / Chapter 2.4.1 --- Introduction --- p.28 / Chapter 2.4.2 --- Architecture of Hybrid MLP/HMM Systems --- p.29 / Chapter 2.4.3 --- Discussions --- p.30 / Chapter 2.5 --- SYNTACTIC GRAMMAR --- p.30 / Chapter 2.5.1 --- Introduction --- p.30 / Chapter 2.5.2 --- Word formation and Syntactic Processing --- p.31 / Chapter 2.5.3 --- Discussion --- p.32 / Chapter 2.6 --- SUMMARY --- p.32 / Chapter 3 --- Neural Network As Pattern Classifier --- p.34 / Chapter 3.1 --- INTRODUCTION --- p.34 / Chapter 3.2 --- TRAINING ALGORITHMS AND TOPOLOGIES --- p.35 / Chapter 3.2.1 --- Multilayer Perceptrons --- p.35 / Chapter 3.2.2 --- Recurrent Neural Networks --- p.39 / Chapter 3.2.3 --- Self-organizing Maps --- p.41 / Chapter 3.2.4 --- Learning Vector Quantization --- p.43 / Chapter 3.3 --- EXPERIMENTS --- p.44 / Chapter 3.3.1 --- The Data Set --- p.44 / Chapter 3.3.2 --- Preprocessing of the Speech Data --- p.45 / Chapter 3.3.3 --- The Pattern Classifiers --- p.50 / Chapter 3.4 --- RESULTS AND DISCUSSIONS --- p.53 / Chapter 4 --- High Level Context Information --- p.56 / Chapter 4.1 --- INTRODUCTION --- p.56 / Chapter 4.2 --- HIDDEN MARKOV MODEL APPROACH --- p.57 / Chapter 4.3 --- THE DYNAMIC PROGRAMMING APPROACH --- p.59 / Chapter 4.4 --- THE SYNTACTIC GRAMMAR APPROACH --- p.60 / Chapter 5 --- Finite State Grammar Network --- p.62 / Chapter 5.1 --- INTRODUCTION --- p.62 / Chapter 5.2 --- THE GRAMMAR COMPILATION --- p.63 / Chapter 5.2.1 --- Introduction --- p.63 / Chapter 5.2.2 --- K-Tails Clustering Method --- p.66 / Chapter 5.2.3 --- Inference of finite state grammar --- p.67 / Chapter 5.2.4 --- Error Correcting Parsing --- p.69 / Chapter 5.3 --- EXPERIMENT --- p.71 / Chapter 5.4 --- RESULTS AND DISCUSSIONS --- p.73 / Chapter 6 --- The Integrated System --- p.81 / Chapter 6.1 --- INTRODUCTION --- p.81 / Chapter 6.2 --- POSTPROCESSING OF NEURAL NETWORK OUTPUT --- p.82 / Chapter 6.2.1 --- Activation Threshold --- p.82 / Chapter 6.2.2 --- Duration Threshold --- p.85 / Chapter 6.2.3 --- Merging of Phoneme boundaries --- p.88 / Chapter 6.3 --- THE ERROR CORRECTING PARSER --- p.90 / Chapter 6.4 --- RESULTS AND DISCUSSIONS --- p.96 / Chapter 7 --- Conclusions --- p.101 / Bibliography --- p.105
173

Linguistic constraints for large vocabulary speech recognition.

January 1999
by Roger H.Y. Leung. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. / Includes bibliographical references (leaves 79-84). / Abstracts in English and Chinese. / ABSTRACT --- p.I / Keywords: --- p.I / ACKNOWLEDGEMENTS --- p.III / TABLE OF CONTENTS: --- p.IV / Table of Figures: --- p.VI / Table of Tables: --- p.VII / Chapter CHAPTER 1 --- INTRODUCTION --- p.1 / Chapter 1.1 --- Languages in the World --- p.2 / Chapter 1.2 --- Problems of Chinese Speech Recognition --- p.3 / Chapter 1.2.1 --- Unlimited word size: --- p.3 / Chapter 1.2.2 --- Too many Homophones: --- p.3 / Chapter 1.2.3 --- Difference between spoken and written Chinese: --- p.3 / Chapter 1.2.4 --- Word Segmentation Problem: --- p.4 / Chapter 1.3 --- Different types of knowledge --- p.5 / Chapter 1.4 --- Chapter Conclusion --- p.6 / Chapter CHAPTER 2 --- FOUNDATIONS --- p.7 / Chapter 2.1 --- Chinese Phonology and Language Properties --- p.7 / Chapter 2.1.1 --- Basic Syllable Structure --- p.7 / Chapter 2.2 --- Acoustic Models --- p.9 / Chapter 2.2.1 --- Acoustic Unit --- p.9 / Chapter 2.2.2 --- Hidden Markov Model (HMM) --- p.9 / Chapter 2.3 --- Search Algorithm --- p.11 / Chapter 2.4 --- Statistical Language Models --- p.12 / Chapter 2.4.1 --- Context-Independent Language Model --- p.12 / Chapter 2.4.2 --- Word-Pair Language Model --- p.13 / Chapter 2.4.3 --- N-gram Language Model --- p.13 / Chapter 2.4.4 --- Backoff n-gram --- p.14 / Chapter 2.5 --- Smoothing for Language Model --- p.16 / Chapter CHAPTER 3 --- LEXICAL ACCESS --- p.18 / Chapter 3.1 --- Introduction --- p.18 / Chapter 3.2 --- Motivation: Phonological and lexical constraints --- p.20 / Chapter 3.3 --- Broad Classes Representation --- p.22 / Chapter 3.4 --- Broad Classes Statistic Measures --- p.25 / Chapter 3.5 --- Broad Classes Frequency Normalization --- p.26 / Chapter 3.6 --- Broad Classes Analysis --- p.27 / Chapter 3.7 --- Isolated Word Speech Recognizer using Broad Classes --- p.33 / Chapter 3.8 --- Chapter Conclusion --- p.34 / Chapter CHAPTER 4 --- CHARACTER AND WORD LANGUAGE MODEL --- p.35 / Chapter 4.1 --- Introduction --- p.35 / Chapter 4.2 --- Motivation --- p.36 / Chapter 4.2.1 --- Perplexity --- p.36 / Chapter 4.3 --- Call Home Mandarin corpus --- p.38 / Chapter 4.3.1 --- Acoustic Data --- p.38 / Chapter 4.3.2 --- Transcription Texts --- p.39 / Chapter 4.4 --- Methodology: Building Language Model --- p.41 / Chapter 4.5 --- Character Level Language Model --- p.45 / Chapter 4.6 --- Word Level Language Model --- p.48 / Chapter 4.7 --- Comparison of Character level and Word level Language Model --- p.50 / Chapter 4.8 --- Interpolated Language Model --- p.54 / Chapter 4.8.1 --- Methodology --- p.54 / Chapter 4.8.2 --- Experiment Results --- p.55 / Chapter 4.9 --- Chapter Conclusion --- p.56 / Chapter CHAPTER 5 --- N-GRAM SMOOTHING --- p.57 / Chapter 5.1 --- Introduction --- p.57 / Chapter 5.2 --- Motivation --- p.58 / Chapter 5.3 --- Mathematical Representation --- p.59 / Chapter 5.4 --- Methodology: Smoothing techniques --- p.61 / Chapter 5.4.1 --- Add-one Smoothing --- p.62 / Chapter 5.4.2 --- Witten-Bell Discounting --- p.64 / Chapter 5.4.3 --- Good Turing Discounting --- p.66 / Chapter 5.4.4 --- Absolute and Linear Discounting --- p.68 / Chapter 5.5 --- Comparison of Different Discount Methods --- p.70 / Chapter 5.6 --- Continuous Word Speech Recognizer --- p.71 / Chapter 5.6.1 --- Experiment Setup --- p.71 / Chapter 5.6.2 --- Experiment Results: --- p.72 / Chapter 5.7 --- Chapter Conclusion --- p.74 / Chapter CHAPTER 6 --- SUMMARY AND CONCLUSIONS --- p.75 / Chapter 6.1 --- Summary --- p.75 / Chapter 6.2 --- Further Work --- p.77 / Chapter 6.3 --- Conclusion --- p.78 / REFERENCE --- p.79
174

Unsupervised model adaptation for continuous speech recognition using model-level confidence measures.

January 2002
Kwan Ka Yan. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references. / Abstracts in English and Chinese. / Chapter 1. --- Introduction --- p.1 / Chapter 1.1. --- Automatic Speech Recognition --- p.1 / Chapter 1.2. --- Robustness of ASR Systems --- p.3 / Chapter 1.3. --- Model Adaptation for Robust ASR --- p.4 / Chapter 1.4. --- Thesis outline --- p.6 / References --- p.8 / Chapter 2. --- Fundamentals of Continuous Speech Recognition --- p.10 / Chapter 2.1. --- Acoustic Front-End --- p.10 / Chapter 2.2. --- Recognition Module --- p.11 / Chapter 2.2.1. --- Acoustic Modeling with HMM --- p.12 / Chapter 2.2.2. --- Basic Phonology of Cantonese --- p.14 / Chapter 2.2.3. --- Acoustic Modeling for Cantonese --- p.15 / Chapter 2.2.4. --- Language Modeling --- p.16 / References --- p.17 / Chapter 3. --- Unsupervised Model Adaptation --- p.18 / Chapter 3.1. --- A General Review of Model Adaptation --- p.18 / Chapter 3.1.1. --- Supervised and Unsupervised Adaptation --- p.20 / Chapter 3.1.2. --- N-Best Adaptation --- p.22 / Chapter 3.2. --- MAP --- p.23 / Chapter 3.3. --- MLLR --- p.25 / Chapter 3.3.1. --- Adaptation Approach --- p.26 / Chapter 3.3.2. --- Estimation of MLLR regression matrices --- p.27 / Chapter 3.3.3. --- Least Mean Squares Regression --- p.29 / Chapter 3.3.4. --- Number of Transformations --- p.30 / Chapter 3.4. --- Experiment Results --- p.32 / Chapter 3.4.1. --- Standard MLLR versus LMS MLLR --- p.36 / Chapter 3.4.2. --- Effect of the Number of Transformations --- p.43 / Chapter 3.4.3. --- MAP Vs. MLLR --- p.46 / Chapter 3.5. --- Conclusions --- p.48 / References --- p.xlix / Chapter 4. --- Use of Confidence Measure for MLLR based Adaptation --- p.50 / Chapter 4.1. --- Introduction to Confidence Measure --- p.50 / Chapter 4.2. --- Confidence Measure Based on Word Density --- p.51 / Chapter 4.3. --- Model-level confidence measure --- p.53 / Chapter 4.4. --- Integrating Confusion Information into Confidence Measure --- p.55 / Chapter 4.5. --- Adaptation Data Distributions in Different Confidence Measures --- p.57 / References --- p.65 / Chapter 5. --- Experimental Results and Analysis --- p.66 / Chapter 5.1. --- Supervised Adaptation --- p.67 / Chapter 5.2. --- Cheated Confidence Measure --- p.69 / Chapter 5.3. --- Confidence Measures of Different Levels --- p.71 / Chapter 5.4. --- Incorporation of Confusion Matrix --- p.81 / Chapter 5.5. --- Conclusions --- p.83 / Chapter 6. --- Conclusions --- p.85 / Chapter 6.1. --- Future Works --- p.88
175

Natural language understanding across application domains and languages.

January 2002
Tsui Wai-Ching. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (leaves 115-122). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Overview --- p.1 / Chapter 1.2 --- Natural Language Understanding Using Belief Networks --- p.5 / Chapter 1.3 --- Integrating Speech Recognition with Natural Language Understanding --- p.7 / Chapter 1.4 --- Thesis Goals --- p.9 / Chapter 1.5 --- Thesis Organization --- p.10 / Chapter 2 --- Background --- p.12 / Chapter 2.1 --- Natural Language Understanding Approaches --- p.13 / Chapter 2.1.1 --- Rule-based Approaches --- p.15 / Chapter 2.1.2 --- Stochastic Approaches --- p.16 / Chapter 2.1.3 --- Mixed Approaches --- p.18 / Chapter 2.2 --- Portability of Natural Language Understanding Frameworks --- p.19 / Chapter 2.2.1 --- Portability across Domains --- p.19 / Chapter 2.2.2 --- Portability across Languages --- p.20 / Chapter 2.2.3 --- Portability across both Domains and Languages --- p.21 / Chapter 2.3 --- Spoken Language Understanding --- p.21 / Chapter 2.3.1 --- Integration of Speech Recognition Confidence into Natural Language Understanding --- p.22 / Chapter 2.3.2 --- Integration of Other Potential Confidence Features into Natural Language Understanding --- p.24 / Chapter 2.4 --- Belief Networks --- p.24 / Chapter 2.4.1 --- Overview --- p.24 / Chapter 2.4.2 --- Bayesian Inference --- p.26 / Chapter 2.5 --- Transformation-based Parsing Technique --- p.27 / Chapter 2.6 --- Chapter Summary --- p.28 / Chapter 3 --- Portability of the Natural Language Understanding Framework across Application Domains and Languages --- p.31 / Chapter 3.1 --- Natural Language Understanding Framework --- p.32 / Chapter 3.1.1 --- Semantic Tagging --- p.33 / Chapter 3.1.2 --- Informational Goal Inference with Belief Networks --- p.34 / Chapter 3.2 --- The ISIS Stocks Domain --- p.36 / Chapter 3.3 --- A Unified Framework for English and Chinese --- p.38 / Chapter 3.3.1 --- Semantic Tagging for the ISIS domain --- p.39 / Chapter 3.3.2 --- Transformation-based Parsing --- p.40 / Chapter 3.3.3 --- Informational Goal Inference with Belief Networks for the ISIS domain --- p.43 / Chapter 3.4 --- Experiments --- p.45 / Chapter 3.4.1 --- Goal Identification Experiments --- p.45 / Chapter 3.4.2 --- A Cross-language Experiment --- p.49 / Chapter 3.5 --- Chapter Summary --- p.55 / Chapter 4 --- Enhancement in the Belief Networks for Informational Goal Inference --- p.57 / Chapter 4.1 --- Semantic Concept Selection in Belief Networks --- p.58 / Chapter 4.1.1 --- Selection of Positive Evidence --- p.58 / Chapter 4.1.2 --- Selection of Negative Evidence --- p.62 / Chapter 4.2 --- Estimation of Statistical Probabilities in the Enhanced Belief Networks --- p.64 / Chapter 4.2.1 --- Estimation of Prior Probabilities --- p.65 / Chapter 4.2.2 --- Estimation of Posterior Probabilities --- p.66 / Chapter 4.3 --- Experiments --- p.73 / Chapter 4.3.1 --- Belief Networks Developed with Positive Evidence --- p.74 / Chapter 4.3.2 --- Belief Networks with the Injection of Negative Evidence --- p.76 / Chapter 4.4 --- Chapter Summary --- p.82 / Chapter 5 --- Integration between Speech Recognition and Natural Language Understanding --- p.84 / Chapter 5.1 --- The Speech Corpus for the Chinese ISIS Stocks Domain --- p.86 / Chapter 5.2 --- Our Extended Natural Language Understanding Framework for Spoken Language Understanding --- p.90 / Chapter 5.2.1 --- Integrated Scoring for Chinese Speech Recognition and Natural Language Understanding --- p.92 / Chapter 5.3 --- Experiments --- p.92 / Chapter 5.3.1 --- Training and Testing on the Perfect Reference Data Sets --- p.93 / Chapter 5.3.2 --- Mismatched Training and Testing Conditions - Perfect Reference versus Imperfect Hypotheses --- p.93 / Chapter 5.3.3 --- Comparing Goal Identification between the Use of Single-best versus N-best Recognition Hypotheses --- p.95 / Chapter 5.3.4 --- Integration of Speech Recognition Confidence Scores into Natural Language Understanding --- p.97 / Chapter 5.3.5 --- Feasibility of Our Approach for Spoken Language Understanding --- p.99 / Chapter 5.3.6 --- Justification of Using Max-of-max Classifier in Our Single Goal Identification Scheme --- p.107 / Chapter 5.4 --- Chapter Summary --- p.109 / Chapter 6 --- Conclusions and Future Work --- p.110 / Chapter 6.1 --- Conclusions --- p.110 / Chapter 6.2 --- Contributions --- p.112 / Chapter 6.3 --- Future Work --- p.113 / Bibliography --- p.115 / Chapter A --- Semantic Frames for Chinese --- p.123 / Chapter B --- Semantic Frames for English --- p.127 / Chapter C --- The Concept Set of Positive Evidence for the Nine Goals in English --- p.131 / Chapter D --- The Concept Set of Positive Evidence for the Ten Goals in Chinese --- p.133 / Chapter E --- The Complete Concept Set including Both the Positive and Negative Evidence for the Ten Goals in English --- p.135 / Chapter F --- The Complete Concept Set including Both the Positive and Negative Evidence for the Ten Goals in Chinese --- p.138 / Chapter G --- The Assignment of Statistical Probabilities for Each Selected Concept under the Corresponding Goals in Chinese --- p.141 / Chapter H --- The Assignment of Statistical Probabilities for Each Selected Concept under the Corresponding Goals in English --- p.146
176

An HMM based connected speech recognition system for Cantonese =: 建基於隱馬爾可夫模型的粤語連續語音識別系統 (Jian ji yu Yin Ma'erkefu mo xing de Yue yu lian xu yu yin shi bie xi tong).

January 1998
by Chow Ka Fai. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (leaves [124-132]). / Text in English; abstract also in Chinese. / Chapter 1 --- INTRODUCTION --- p.1 / Chapter 1.1 --- Speech Recognition Technology --- p.4 / Chapter 1.2 --- Automatic Recognition of Cantonese Speech --- p.6 / Chapter 1.3 --- Objectives of the thesis --- p.8 / Chapter 1.4 --- Thesis Outline --- p.11 / Chapter 2 --- FUNDAMENTALS OF HMM BASED RECOGNITION SYSTEM --- p.13 / Chapter 2.1 --- Introduction --- p.13 / Chapter 2.2 --- HMM Fundamentals --- p.13 / Chapter 2.2.1 --- HMM Structure and Behavior --- p.13 / Chapter 2.2.2 --- HMM-based Speech Modeling --- p.15 / Chapter 2.2.3 --- Mathematics --- p.18 / Chapter 2.3 --- HMM Based Speech Recognition System --- p.22 / Chapter 2.3.1 --- Isolated Speech Recognition --- p.23 / Chapter 2.3.2 --- Connected Speech Recognition --- p.25 / Chapter 2.4 --- Algorithms for Finding Hidden State Sequence --- p.28 / Chapter 2.4.1 --- Forward-backward algorithm --- p.29 / Chapter 2.4.2 --- Viterbi Decoder Algorithm --- p.31 / Chapter 2.5 --- Parameter Estimation --- p.32 / Chapter 2.5.1 --- Basic Ideas for Estimation --- p.32 / Chapter 2.5.2 --- Single Model Re-estimation Using Best State-Time Alignment (HINIT) --- p.36 / Chapter 2.5.3 --- Single Model Re-estimation Using Baum-Welch Method (HREST) --- p.39 / Chapter 2.5.4 --- HMM Embedded Re-estimation (HEREST) --- p.41 / Chapter 2.6 --- Feature Extraction --- p.42 / Chapter 2.7 --- Summary --- p.47 / Chapter 3 --- CANTONESE PHONOLOGY AND LANGUAGE PROPERTIES --- p.48 / Chapter 3.1 --- Introduction --- p.48 / Chapter 3.2 --- Cantonese and Chinese Language --- p.48 / Chapter 3.2.1 --- Chinese Words and Characters --- p.48 / Chapter 3.2.2 --- The Relationship between Cantonese and Chinese Characters --- p.50 / Chapter 3.3 --- Basic Syllable structure --- p.51 / Chapter 3.3.1 --- CVC structure --- p.51 / Chapter 3.3.2 --- Cantonese Phonemes --- p.52 / Chapter 3.3.3 --- The Initial-Final structure --- p.55 / Chapter 3.3.4 --- Cantonese Nine Tone System --- p.57 / Chapter 3.4 --- Acoustic Properties of Cantonese --- p.58 / Chapter 3.5 --- Cantonese Phonology for Speech Recognition --- p.60 / Chapter 3.6 --- Summary --- p.62 / Chapter 4 --- CANTONESE SPEECH DATABASES --- p.64 / Chapter 4.1 --- Introduction --- p.64 / Chapter 4.2 --- The Importance of Speech Data --- p.64 / Chapter 4.3 --- The Demands of Cantonese Speech Databases --- p.67 / Chapter 4.4 --- Principles in Cantonese Database Development --- p.67 / Chapter 4.5 --- Resources and Limitations for Database Designs --- p.69 / Chapter 4.6 --- Details of Speech Databases --- p.69 / Chapter 4.6.1 --- Multiple speakers' Speech Database (CUWORD) --- p.70 / Chapter 4.6.2 --- Single Speaker's Speech Database (MYVOICE) --- p.72 / Chapter 4.7 --- Difficulties and Solutions in Recording Process --- p.76 / Chapter 4.8 --- Verification of Phonetic Transcription --- p.78 / Chapter 4.9 --- Summary --- p.79 / Chapter 5 --- TRAINING OF AN HMM BASED CANTONESE SPEECH RECOGNITION SYSTEM --- p.80 / Chapter 5.1 --- Introduction --- p.80 / Chapter 5.2 --- Objectives of HMM Development --- p.81 / Chapter 5.3 --- The Design of Initial-Final Models --- p.83 / Chapter 5.4 --- Initialization of Basic Initial-Final Models --- p.84 / Chapter 5.4.1 --- The Initialization Training with HEREST --- p.85 / Chapter 5.4.2 --- Refinement of Initialized Models --- p.88 / Chapter 5.4.3 --- Evaluation of the Models --- p.90 / Chapter 5.5 --- Training of Connected Speech Speaker Dependent Models --- p.93 / Chapter 5.5.1 --- Training Strategy --- p.93 / Chapter 5.5.2 --- Preliminary Result --- p.94 / Chapter 5.6 --- Design and Training of Context Dependent Initial Final Models --- p.95 / Chapter 5.6.1 --- Intra-syllable Context Dependent Units --- p.96 / Chapter 5.6.2 --- The Inter-syllable Context Dependent Units --- p.97 / Chapter 5.6.3 --- Model Refinement by Using Mixture Incrementing --- p.98 / Chapter 5.7 --- Training of Speaker Independent Models --- p.99 / Chapter 5.8 --- Discussions --- p.100 / Chapter 5.9 --- Summary --- p.101 / Chapter 6 --- PERFORMANCE ANALYSIS --- p.102 / Chapter 6.1 --- Substitution Errors --- p.102 / Chapter 6.1.1 --- Confusion of Long Vowels and Short Vowels for Initial Stop Consonants --- p.102 / Chapter 6.1.2 --- Confusion of Nasal Endings --- p.103 / Chapter 6.1.3 --- Confusion of Final Stop Consonants --- p.104 / Chapter 6.2 --- Insertion Errors and Deletion Errors --- p.105 / Chapter 6.3 --- Accuracy of Individual Models --- p.106 / Chapter 6.4 --- The Impact of Individual Models --- p.107 / Chapter 6.4.1 --- The Expected Error Rate of Initial Models --- p.110 / Chapter 6.4.2 --- The Expected Error Rate of Final Models --- p.111 / Chapter 6.5 --- Suggested Solutions for Error Reduction --- p.113 / Chapter 6.5.1 --- Duration Constraints --- p.113 / Chapter 6.5.2 --- The Use of Language Model --- p.113 / Chapter 6.6 --- Summary --- p.114 / Chapter 7 --- APPLICATION EXAMPLES OF THE HMM RECOGNITION SYSTEM --- p.115 / Chapter 7.1 --- Introduction --- p.115 / Chapter 7.2 --- Application 1: A Hong Kong Stock Market Inquiry System --- p.116 / Chapter 7.3 --- Application 2: A Navigating System for Hong Kong Street Map --- p.117 / Chapter 7.4 --- Automatic Character-to-Phonetic Conversion --- p.118 / Chapter 7.5 --- Summary --- p.119 / Chapter 8 --- CONCLUSIONS AND SUGGESTIONS FOR FURTHER WORK --- p.120 / Chapter 8.1 --- Conclusions --- p.120 / Chapter 8.2 --- Suggestions for Future Work --- p.122 / Chapter 8.2.1 --- Development of Continuous Speech Recognition System --- p.122 / Chapter 8.2.2 --- Implementation of Statistical Language Models --- p.122 / Chapter 8.2.3 --- Tones for Continuous Speech --- p.123 / BIBLIOGRAPHY / APPENDIX
177

Kernel Approximation Methods for Speech Recognition

May, Avner. January 2018
Over the past five years or so, deep learning methods have dramatically improved state-of-the-art performance in a variety of domains, including speech recognition, computer vision, and natural language processing. Importantly, however, they suffer from a number of drawbacks: 1. Training these models is a non-convex optimization problem, and thus it is difficult to guarantee that a trained model minimizes the desired loss function. 2. These models are difficult to interpret. In particular, it is difficult to explain, for a given model, why the computations it performs make accurate predictions. In contrast, kernel methods are straightforward to interpret, and training them is a convex optimization problem. Unfortunately, solving these optimization problems exactly is typically prohibitively expensive, though one can use approximation methods to circumvent this problem. In this thesis, we explore to what extent kernel approximation methods can compete with deep learning, in the context of large-scale prediction tasks. Our contributions are as follows: 1. We perform the most extensive set of experiments to date using kernel approximation methods in the context of large-scale speech recognition tasks, and compare performance with deep neural networks. 2. We propose a feature selection algorithm which significantly improves the performance of the kernel models, making their performance competitive with fully-connected feedforward neural networks. 3. We perform an in-depth comparison between two leading kernel approximation strategies — random Fourier features [Rahimi and Recht, 2007] and the Nyström method [Williams and Seeger, 2001] — showing that although the Nyström method is better at approximating the kernel, it performs worse than random Fourier features when used for learning. We believe this work opens the door for future research to continue to push the boundary of what is possible with kernel methods. This research direction will also shed light on the question of when, if ever, deep models are needed for attaining strong performance.
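To make the cited construction concrete, the following is a minimal sketch of random Fourier features for a Gaussian (RBF) kernel, assuming NumPy and toy data with made-up hyperparameters; it illustrates the Rahimi and Recht map that the abstract compares against the Nyström method, not the thesis's actual large-scale experimental setup.

```python
# Minimal sketch of random Fourier features (Rahimi and Recht, 2007) for the
# RBF kernel k(x, y) = exp(-gamma * ||x - y||^2); toy data, assumed settings.
import numpy as np

def random_fourier_features(X, num_features, gamma, rng):
    """Map rows of X into a space whose inner products approximate the RBF kernel."""
    d = X.shape[1]
    # Sample frequencies from the Fourier transform of the kernel (a Gaussian).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
gamma = 0.5
X = rng.normal(size=(5, 10))                       # 5 toy examples, 10 dimensions
Z = random_fourier_features(X, 4000, gamma, rng)
approx = Z @ Z.T                                   # approximate kernel matrix
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
exact = np.exp(-gamma * sq_dists)                  # exact kernel matrix
print(np.abs(approx - exact).max())                # shrinks as num_features grows
```

The Nyström method instead builds its feature map from kernel columns evaluated at sampled training points, which is why it is data-dependent and can approximate the kernel more accurately at the same feature budget, the trade-off the abstract's third contribution examines.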
178

Application-specific instruction set processor for speech recognition.

January 2005
Cheung Man Ting. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 69-71). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- The Emergence of ASIP --- p.1 / Chapter 1.1.1 --- Related Work --- p.3 / Chapter 1.2 --- Motivation --- p.6 / Chapter 1.3 --- ASIP Design Methodologies --- p.7 / Chapter 1.4 --- Fundamentals of Speech Recognition --- p.8 / Chapter 1.5 --- Thesis outline --- p.10 / Chapter 2 --- Automatic Speech Recognition --- p.11 / Chapter 2.1 --- Overview of ASR system --- p.11 / Chapter 2.2 --- Theory of Front-end Feature Extraction --- p.12 / Chapter 2.3 --- Theory of HMM-based Speech Recognition --- p.14 / Chapter 2.3.1 --- Hidden Markov Model (HMM) --- p.14 / Chapter 2.3.2 --- The Typical Structure of the HMM --- p.14 / Chapter 2.3.3 --- Discrete HMMs and Continuous HMMs --- p.15 / Chapter 2.3.4 --- The Three Basic Problems for HMMs --- p.17 / Chapter 2.3.5 --- Probability Evaluation --- p.18 / Chapter 2.4 --- The Viterbi Search Engine --- p.19 / Chapter 2.5 --- Isolated Word Recognition (IWR) --- p.22 / Chapter 3 --- Design of ASIP Platform --- p.24 / Chapter 3.1 --- Instruction Fetch --- p.25 / Chapter 3.2 --- Instruction Decode --- p.26 / Chapter 3.3 --- Datapath --- p.29 / Chapter 3.4 --- Register File Systems --- p.30 / Chapter 3.4.1 --- Memory Hierarchy --- p.30 / Chapter 3.4.2 --- Register File Organization --- p.31 / Chapter 3.4.3 --- Special Registers --- p.34 / Chapter 3.4.4 --- Address Generation --- p.34 / Chapter 3.4.5 --- Load and Store --- p.36 / Chapter 4 --- Implementation of Speech Recognition on ASIP --- p.37 / Chapter 4.1 --- Hardware Architecture Exploration --- p.37 / Chapter 4.1.1 --- Floating Point and Fixed Point --- p.37 / Chapter 4.1.2 --- Multiplication and Accumulation --- p.38 / Chapter 4.1.3 --- Pipelining --- p.41 / Chapter 4.1.4 --- Memory Architecture --- p.43 / Chapter 4.1.5 --- Saturation Logic --- p.44 / Chapter 4.1.6 --- Specialized Addressing Modes --- p.44 / Chapter 4.1.7 --- Repetitive Operation --- p.47 / Chapter 4.2 --- Software Algorithm Implementation --- p.49 / Chapter 4.2.1 --- Implementation Using Base Instruction Set --- p.49 / Chapter 4.2.2 --- Implementation Using Refined Instruction Set --- p.54 / Chapter 5 --- Simulation Results --- p.56 / Chapter 6 --- Conclusions and Future Work --- p.60 / Appendices --- p.62 / Chapter A --- Base Instruction Set --- p.62 / Chapter B --- Special Registers --- p.65 / Chapter C --- Chip Microphotograph of ASIP --- p.67 / Chapter D --- The Testing Board of ASIP --- p.68 / Bibliography --- p.69
179

An evaluation paradigm for spoken dialog systems based on crowdsourcing and collaborative filtering.

January 2011
Yang, Zhaojun. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (p. 92-99). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- SDS Architecture --- p.1 / Chapter 1.2 --- Dialog Model --- p.3 / Chapter 1.3 --- SDS Evaluation --- p.4 / Chapter 1.4 --- Thesis Outline --- p.7 / Chapter 2 --- Previous Work --- p.9 / Chapter 2.1 --- Approaches to Dialog Modeling --- p.9 / Chapter 2.1.1 --- Handcrafted Dialog Modeling --- p.9 / Chapter 2.1.2 --- Statistical Dialog Modeling --- p.12 / Chapter 2.2 --- Evaluation Metrics --- p.16 / Chapter 2.2.1 --- Subjective User Judgments --- p.17 / Chapter 2.2.2 --- Interaction Metrics --- p.18 / Chapter 2.3 --- The PARADISE Framework --- p.19 / Chapter 2.4 --- Chapter Summary --- p.22 / Chapter 3 --- Implementation of a Dialog System based on POMDP --- p.23 / Chapter 3.1 --- Partially Observable Markov Decision Processes (POMDPs) --- p.24 / Chapter 3.1.1 --- Formal Definition --- p.24 / Chapter 3.1.2 --- Value Iteration --- p.26 / Chapter 3.1.3 --- Point-based Value Iteration --- p.27 / Chapter 3.1.4 --- A Toy Example of POMDP: The NaiveBusInfo System --- p.27 / Chapter 3.2 --- The SDS-POMDP Model --- p.31 / Chapter 3.3 --- Composite Summary Point-based Value Iteration (CSPBVI) --- p.33 / Chapter 3.4 --- Application of SDS-POMDP Model: The BusInfo System --- p.35 / Chapter 3.4.1 --- System Description --- p.35 / Chapter 3.4.2 --- Demonstration Description --- p.39 / Chapter 3.5 --- Chapter Summary --- p.42 / Chapter 4 --- Collecting User Judgments on Spoken Dialogs with Crowdsourcing --- p.46 / Chapter 4.1 --- Dialog Corpus and Automatic Dialog Classification --- p.47 / Chapter 4.2 --- User Judgments Collection with Crowdsourcing --- p.50 / Chapter 4.2.1 --- HITs on Dialog Evaluation --- p.51 / Chapter 4.2.2 --- HITs on Inter-rater Agreement --- p.53 / Chapter 4.2.3 --- Approval of Ratings --- p.54 / Chapter 4.3 --- Collected Results and Analysis --- p.55 / Chapter 4.3.1 --- Approval Rates and Comments from Mturk Workers --- p.55 / Chapter 4.3.2 --- Consistency between Automatic Dialog Classification and Manual Ratings --- p.57 / Chapter 4.3.3 --- Inter-rater Agreement Among Workers --- p.60 / Chapter 4.4 --- Comparing Experts to Non-experts --- p.64 / Chapter 4.4.1 --- Inter-rater Agreement on the Let's Go! System --- p.65 / Chapter 4.4.2 --- Consistency Between Expert and Non-expert Annotations on SDC Systems --- p.66 / Chapter 4.5 --- Chapter Summary --- p.68 / Chapter 5 --- Collaborative Filtering for Performance Prediction --- p.70 / Chapter 5.1 --- Item-Based Collaborative Filtering --- p.71 / Chapter 5.2 --- CF Model for User Satisfaction Prediction --- p.72 / Chapter 5.2.1 --- ICFM for User Satisfaction Prediction --- p.72 / Chapter 5.2.2 --- Extended ICFM for User Satisfaction Prediction --- p.73 / Chapter 5.3 --- Extraction of Interaction Features --- p.74 / Chapter 5.4 --- Experimental Results and Analysis --- p.76 / Chapter 5.4.1 --- Prediction of User Satisfaction --- p.76 / Chapter 5.4.2 --- Analysis of Prediction Results --- p.79 / Chapter 5.5 --- Verifying the Generalizability of CF Model --- p.81 / Chapter 5.6 --- Evaluation of The BusInfo System --- p.86 / Chapter 5.7 --- Chapter Summary --- p.87 / Chapter 6 --- Conclusions and Future Work --- p.89 / Chapter 6.1 --- Thesis Summary --- p.89 / Chapter 6.2 --- Future Work --- p.90 / Bibliography --- p.92
180

Use of tone information in Cantonese LVCSR based on generalized character posterior probability decoding. / CUHK electronic theses & dissertations collection

January 2005
Automatic recognition of Cantonese tones has long been regarded as a difficult task. Cantonese has one of the most complicated tone systems among all languages in the world. This thesis presents a novel approach to modeling Cantonese tones. We propose the use of supra-tone models. Each supra-tone unit covers a number of syllables in succession. The supra-tone model characterizes not only the tone contours of individual syllables but also the transitions among them. By including multiple tone contours in one modeling unit, the relative heights of the tones are captured explicitly. This is especially important for the discrimination among the level tones of Cantonese. / The decoding in conventional LVCSR systems aims at finding the sentence hypothesis, i.e. the string of words, which has the maximum a posteriori (MAP) probability in comparison with other hypotheses. However, in most applications, the recognition performance is measured in terms of word error rate (or word accuracy). In Chinese languages, given that "word" is a rather ambiguous concept, speech recognition performance is usually measured in terms of the character error rate. In this thesis, we develop a decoding algorithm that can minimize the character error rate. The algorithm is applied to a reduced search space, e.g. a word graph or the N-best sentence list, which results from the 1st pass of search, and the generalized character posterior probability (GCPP) is maximized. (Abstract shortened by UMI.) / This thesis addresses two major problems of the existing large vocabulary continuous speech recognition (LVCSR) technology: (1) inadequate exploitation of alternative linguistic and acoustic information; and (2) the mismatch between the decoding (recognition) criterion and the performance evaluation. The study is focused on Cantonese, one of the major Chinese dialects, which is also monosyllabic and tonal. Tone is indispensable for lexical access and disambiguation of homonyms in Cantonese. However, incorporating tone information into Cantonese LVCSR requires effective tone recognition as well as a seamless integration algorithm. / Qian Yao. / "July 2005." / Adviser: Tan Lee. / Source: Dissertation Abstracts International, Volume: 67-07, Section: B, page: 4009. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (p. 100-110). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract in English and Chinese. / School code: 1307.
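As a rough illustration of the decoding criterion described above (a toy sketch with made-up scores and a fixed position-by-position alignment; the thesis works on word graphs and N-best lists from a real recognizer), sentence scores can be normalized into posteriors, accumulated per character position, and the highest-posterior character emitted at each position:

```python
# Toy sketch of posterior-based character decoding in the spirit of GCPP:
# pick, at each position, the character with the maximum accumulated posterior.
import numpy as np
from collections import defaultdict

# Hypothetical N-best list: (character string, total log score).
nbest = [("ABC", -10.0), ("ABD", -10.5), ("AEC", -12.0)]

# Turn sentence log scores into posterior probabilities (softmax).
log_scores = np.array([s for _, s in nbest])
post = np.exp(log_scores - log_scores.max())
post /= post.sum()

# Accumulate a posterior for each character at each position. A real system
# aligns competing characters through the word graph's time structure; this
# toy assumes the hypotheses line up position by position.
char_post = defaultdict(float)
for (hyp, _), p in zip(nbest, post):
    for pos, ch in enumerate(hyp):
        char_post[(pos, ch)] += p

# Decode: keep the maximum-posterior character at every position, which
# targets character accuracy rather than whole-sentence accuracy.
decoded = ""
for pos in range(len(nbest[0][0])):
    cands = {ch: p for (q, ch), p in char_post.items() if q == pos}
    decoded += max(cands, key=cands.get)
print(decoded)  # "ABC" for this toy list
```

The per-position winner can differ from the single best sentence hypothesis when lower-ranked hypotheses agree on a character, which is how a GCPP-style criterion reduces character errors that sentence-level MAP decoding tolerates.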
