281 |
The use of subword-based audio indexing in Chinese spoken document retrieval. January 2001
Li Yuk Chi. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves [112]-119). / Abstracts in English and Chinese. / Abstract --- p.2 / List of Figures --- p.8 / List of Tables --- p.12 / Chapter 1 --- Introduction --- p.17 / Chapter 1.1 --- Information Retrieval --- p.18 / Chapter 1.1.1 --- Information Retrieval Models --- p.19 / Chapter 1.1.2 --- Information Retrieval in English --- p.20 / Chapter 1.1.3 --- Information Retrieval in Chinese --- p.22 / Chapter 1.2 --- Spoken Document Retrieval --- p.24 / Chapter 1.2.1 --- Spoken Document Retrieval in English --- p.25 / Chapter 1.2.2 --- Spoken Document Retrieval in Chinese --- p.25 / Chapter 1.3 --- Previous Work --- p.28 / Chapter 1.4 --- Motivation --- p.32 / Chapter 1.5 --- Goals --- p.33 / Chapter 1.6 --- Thesis Organization --- p.34 / Chapter 2 --- Investigation Framework --- p.35 / Chapter 2.1 --- Indexing the Spoken Document Collection --- p.36 / Chapter 2.2 --- Query Processing --- p.37 / Chapter 2.3 --- Subword Indexing --- p.37 / Chapter 2.4 --- Robustness in Chinese Spoken Document Retrieval --- p.40 / Chapter 2.5 --- Retrieval --- p.40 / Chapter 2.6 --- Evaluation --- p.43 / Chapter 2.6.1 --- Average Inverse Rank --- p.43 / Chapter 2.6.2 --- Mean Average Precision --- p.44 / Chapter 3 --- Subword-based Chinese Spoken Document Retrieval --- p.46 / Chapter 3.1 --- The Cantonese Corpus --- p.48 / Chapter 3.2 --- Known-Item Retrieval --- p.49 / Chapter 3.3 --- Subword Formulation for Cantonese Spoken Document Retrieval --- p.50 / Chapter 3.4 --- Audio Indexing by Cantonese Speech Recognition --- p.52 / Chapter 3.4.1 --- Seed Models from Adapted Data --- p.52 / Chapter 3.4.2 --- Retraining Acoustic Models --- p.53 / Chapter 3.5 --- The Retrieval Model --- p.55 / Chapter 3.6 --- Experiments --- p.56 / Chapter 3.6.1 --- Setup and Observations --- p.57 / Chapter 3.6.2 --- Results Analysis --- p.58 / Chapter 3.7 --- Chapter Summary --- p.63 /
Chapter 4 --- Robust Indexing and Retrieval Methods --- p.64 / Chapter 4.1 --- Query Expansion using Phonetic Confusion --- p.65 / Chapter 4.1.1 --- Syllable-Syllable Confusions from Recognition --- p.66 / Chapter 4.1.2 --- Experimental Setup and Observation --- p.67 / Chapter 4.2 --- Document Expansion --- p.71 / Chapter 4.2.1 --- The Side Collection for Expansion --- p.72 / Chapter 4.2.2 --- Detailed Procedures in Document Expansion --- p.72 / Chapter 4.2.3 --- Improvements due to Document Expansion --- p.73 / Chapter 4.3 --- Using both Query and Document Expansion --- p.75 / Chapter 4.4 --- Chapter Summary --- p.76 / Chapter 5 --- Cross-Language Spoken Document Retrieval --- p.78 / Chapter 5.1 --- The Topic Detection and Tracking Collection --- p.80 / Chapter 5.1.1 --- The Spoken Document Collection --- p.81 / Chapter 5.1.2 --- The Translingual Query --- p.82 / Chapter 5.1.3 --- The Side Collection --- p.82 / Chapter 5.1.4 --- Subword-based Indexing --- p.83 / Chapter 5.2 --- The Translingual Retrieval Task --- p.83 / Chapter 5.3 --- Machine Translated Query --- p.85 / Chapter 5.3.1 --- The Unbalanced Query --- p.85 / Chapter 5.3.2 --- The Balanced Query --- p.87 / Chapter 5.3.3 --- Results on the Weight Balancing Scheme --- p.88 / Chapter 5.4 --- Document Expansion from a Side Collection --- p.89 / Chapter 5.5 --- Performance Evaluation and Analysis --- p.91 / Chapter 5.6 --- Chapter Summary --- p.93 / Chapter 6 --- Summary and Future Work --- p.95 / Chapter 6.1 --- Future Directions --- p.97 / Chapter A --- Input format for the IR engine --- p.101 / Chapter B --- Preliminary Results on the Two Normalization Schemes --- p.102 / Chapter C --- Significance Tests --- p.103 / Chapter C.1 --- Query Expansions for Cantonese Spoken Document Retrieval --- p.103 / Chapter C.2 --- Document Expansion for Cantonese Spoken Document Retrieval --- p.105 / Chapter C.3 --- Balanced Query for Cross-Language Spoken Document Retrieval --- p.107 /
Chapter C.4 --- Document Expansion for Cross-Language Spoken Document Retrieval --- p.107 / Chapter D --- The Use of an Unrelated Source for Expanding Spoken Documents in Cantonese --- p.110 / Bibliography --- p.110
|
282 |
A Novel Non-Acoustic Voiced Speech Sensor: Experimental Results and Characterization. Keenaghan, Kevin Michael. 14 January 2004
Recovering clean speech from an audio signal with additive noise is a problem that has plagued the signal processing community for decades. One promising technique currently used in speech-coding applications is a multi-sensor approach, in which a microphone is used together with optical, mechanical, and electrical non-acoustic speech sensors to give signal processing algorithms greater versatility. One such non-acoustic glottal waveform sensor is the Tuned Electromagnetic Resonator Collar (TERC) sensor, first developed in [BLP+02]. The sensor is based on Magnetic Resonance Imaging (MRI) concepts and is designed to detect the small changes in capacitance caused by changes in the state of the vocal cords, that is, the glottal waveform. Although preliminary simulations in [BLP+02] validated the basic theory governing the TERC sensor's operation, results from human-subject testing are needed to characterize the sensor's performance in practice. To this end, a system was designed and developed to provide real-time audio recordings from the sensor while it is attached to a human test subject. These recordings, made in a variety of acoustic noise environments, demonstrated the practical functionality of the TERC sensor. In its current form, the sensor detects a periodic waveform during voiced speech, with two clear harmonics and a fundamental frequency equal to that of the speech being detected. This waveform is representative of the glottal waveform and, as initially hypothesized, shows little or no articulation. Although statistically significant conclusions about the sensor's immunity to environmental noise are difficult to draw, the results suggest that the TERC sensor is considerably more resistant to noise than typical acoustic sensors, making it a valuable addition to the multi-sensor speech processing approach.
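The reported property that the sensor output shares the fundamental frequency of the accompanying speech can be checked by estimating F0 on both channels. The Python sketch below uses plain autocorrelation pitch estimation; the function name `estimate_f0`, the 60-400 Hz search range, and the synthetic two-harmonic test frame are illustrative assumptions, not the procedure used in the thesis.

```python
import numpy as np


def estimate_f0(frame: np.ndarray, sample_rate: float,
                f0_min: float = 60.0, f0_max: float = 400.0) -> float:
    """Estimate the fundamental frequency (Hz) of one voiced frame by autocorrelation.

    Hypothetical helper for illustration; the search range is an assumption.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()  # remove DC so it does not inflate every lag
    # Autocorrelation for non-negative lags 0..len(x)-1.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), len(ac) - 1)
    # The pitch period is the lag, within the plausible range, of maximum self-similarity.
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / best_lag


# Synthetic "sensor" frame: a 120 Hz fundamental plus its second harmonic, 50 ms at 8 kHz.
fs = 8000
t = np.arange(400) / fs
sensor = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)
print(round(estimate_f0(sensor, fs), 1))
```

Running the same function on time-aligned frames from the microphone and the sensor and comparing the two estimates is one simple way to test the "same fundamental frequency" observation; a practical pitch tracker would also need voicing decisions and checks for octave errors.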
|
283 |
Methods of endpoint detection for isolated word recognition. Lamel, Lori Faith. January 1980
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1980. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Includes bibliographical references. / by Lori F. Lamel. / M.S.
|
284 |
Deception in Spoken Dialogue: Classification and Individual Differences. Levitan, Sarah Ita. January 2019
Automatic deception detection is an important problem with far-reaching implications in many areas, including law enforcement, military and intelligence agencies, social services, and politics. Despite extensive efforts to develop automated deception detection technologies, there have been few objective successes. This is likely due to the many challenges involved, including the lack of large, cleanly recorded corpora; the difficulty of acquiring ground truth labels; and major differences in incentives for lying in the laboratory vs. lying in real life. Another well-recognized issue is that there are individual and cultural differences in deception production and detection, although little has been done to identify them. Human performance at deception detection is at the level of chance, making it an uncommon problem where machines can potentially outperform humans.
This thesis addresses these challenges in research on deceptive speech. We created the Columbia X-Cultural Deception (CXD) Corpus, a large-scale collection of deceptive and non-deceptive dialogues between native speakers of Standard American English and Mandarin Chinese. This corpus enabled a comprehensive study of deceptive speech on a large scale.
In the first part of the thesis, we introduce the CXD corpus and present an empirical analysis of acoustic-prosodic and linguistic cues to deception. We also describe machine learning classification experiments that automatically identify deceptive speech using those features. Our best classifier achieves an accuracy of almost 70%, well above human performance.
The second part of this thesis addresses individual differences in deceptive speech. We present a comprehensive analysis of individual differences in verbal cues to deception, and several methods for leveraging these speaker differences to improve automatic deception classification. We identify many differences in cues to deception across gender, native language, and personality. Our comparison of approaches for leveraging these differences shows that speaker-dependent features that capture a speaker's deviation from their natural speaking style can improve deception classification performance. We also develop neural network models that accurately model speaker-specific patterns of deceptive speech.
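One common way to build features that capture a speaker's deviation from their natural speaking style is to normalize each acoustic-prosodic feature against that speaker's own statistics. The sketch below applies per-speaker z-score normalization; the function name `speaker_normalize` and the two example features (mean pitch and intensity) are illustrative assumptions, and the speaker-dependent features used in the thesis may be defined differently.

```python
from collections import defaultdict

import numpy as np


def speaker_normalize(features, speakers):
    """Z-score every feature column within each speaker's own segments.

    features: (n_segments, n_features) array, e.g. mean pitch and intensity (assumed).
    speakers: speaker ID for each row.
    Returns how far each segment departs from that speaker's typical value,
    in units of the speaker's own standard deviation.
    """
    features = np.asarray(features, dtype=float)
    normalized = np.empty_like(features)
    rows_by_speaker = defaultdict(list)
    for row, speaker in enumerate(speakers):
        rows_by_speaker[speaker].append(row)
    for rows in rows_by_speaker.values():
        block = features[rows]
        mean = block.mean(axis=0)
        std = block.std(axis=0)
        std[std == 0] = 1.0  # constant feature for this speaker: avoid division by zero
        normalized[rows] = (block - mean) / std
    return normalized


# Two speakers with very different baseline pitch (Hz) and intensity (dB).
X = [[100.0, 60.0], [120.0, 62.0], [200.0, 70.0], [240.0, 70.0]]
print(speaker_normalize(X, ["s1", "s1", "s2", "s2"]))
```

After normalization, a 240 Hz segment from the higher-pitched speaker and a 120 Hz segment from the lower-pitched one both score +1, so a classifier sees the relative shift rather than the raw value.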
The contributions of this work add substantially to our scientific understanding of deceptive speech, and have practical implications for human practitioners and automatic deception detection.
|
285 |
Robust methods for Chinese spoken document retrieval. January 2003
Hui Pui Yu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 158-169). / Abstracts in English and Chinese. / Abstract --- p.2 / Acknowledgements --- p.6 / Chapter 1 --- Introduction --- p.23 / Chapter 1.1 --- Spoken Document Retrieval --- p.24 / Chapter 1.2 --- The Chinese Language and Chinese Spoken Documents --- p.28 / Chapter 1.3 --- Motivation --- p.33 / Chapter 1.3.1 --- Assisting the User in Query Formation --- p.34 / Chapter 1.4 --- Goals --- p.34 / Chapter 1.5 --- Thesis Organization --- p.35 / Chapter 2 --- Multimedia Repository --- p.37 / Chapter 2.1 --- The Cantonese Corpus --- p.37 / Chapter 2.1.1 --- The RealMedia Collection --- p.39 / Chapter 2.1.2 --- The MPEG-1 Collection --- p.40 / Chapter 2.2 --- The Multimedia Markup Language --- p.42 / Chapter 2.3 --- Chapter Summary --- p.44 / Chapter 3 --- Monolingual Retrieval Task --- p.45 / Chapter 3.1 --- Properties of Cantonese Video Archive --- p.45 / Chapter 3.2 --- Automatic Speech Transcription --- p.46 / Chapter 3.2.1 --- Transcription of Cantonese Spoken Documents --- p.47 / Chapter 3.2.2 --- Indexing Units --- p.48 / Chapter 3.3 --- Known-Item Retrieval Task --- p.49 / Chapter 3.3.1 --- Evaluation: Average Inverse Rank --- p.50 / Chapter 3.4 --- Retrieval Model --- p.51 / Chapter 3.5 --- Experimental Results --- p.52 / Chapter 3.6 --- Chapter Summary --- p.53 / Chapter 4 --- The Use of Audio and Video Information for Monolingual Spoken Document Retrieval --- p.55 / Chapter 4.1 --- Video-based Segmentation --- p.56 / Chapter 4.1.1 --- Metric Computation --- p.57 / Chapter 4.1.2 --- Shot Boundary Detection --- p.58 / Chapter 4.1.3 --- Shot Transition Detection --- p.67 / Chapter 4.2 --- Audio-based Segmentation --- p.69 / Chapter 4.2.1 --- Gaussian Mixture Models --- p.69 / Chapter 4.2.2 --- Transition Detection --- p.70 / Chapter 4.3 --- Performance Evaluation --- p.72 / Chapter 4.3.1 --- Automatic Story Segmentation --- p.72 /
Chapter 4.3.2 --- Video-based Segmentation Algorithm --- p.73 / Chapter 4.3.3 --- Audio-based Segmentation Algorithm --- p.74 / Chapter 4.4 --- Fusion of Video- and Audio-based Segmentation --- p.75 / Chapter 4.5 --- Retrieval Performance --- p.76 / Chapter 4.6 --- Chapter Summary --- p.78 / Chapter 5 --- Document Expansion for Monolingual Spoken Document Retrieval --- p.79 / Chapter 5.1 --- Document Expansion using Selected Field Speech Segments --- p.81 / Chapter 5.1.1 --- Annotations from MmML --- p.81 / Chapter 5.1.2 --- Selection of Cantonese Field Speech --- p.83 / Chapter 5.1.3 --- Re-weighting Different Retrieval Units --- p.84 / Chapter 5.1.4 --- Retrieval Performance with Document Expansion using Selected Field Speech --- p.84 / Chapter 5.2 --- Document Expansion using N-best Recognition Hypotheses --- p.87 / Chapter 5.2.1 --- Re-weighting Different Retrieval Units --- p.90 / Chapter 5.2.2 --- Retrieval Performance with Document Expansion using N-best Recognition Hypotheses --- p.90 / Chapter 5.3 --- Document Expansion using Selected Field Speech and N-best Recognition Hypotheses --- p.92 / Chapter 5.3.1 --- Re-weighting Different Retrieval Units --- p.92 / Chapter 5.3.2 --- Retrieval Performance with Different Indexed Units --- p.93 / Chapter 5.4 --- Chapter Summary --- p.94 / Chapter 6 --- Query Expansion for Cross-language Spoken Document Retrieval --- p.97 / Chapter 6.1 --- The TDT-2 Corpus --- p.99 / Chapter 6.1.1 --- English Textual Queries --- p.100 / Chapter 6.1.2 --- Mandarin Spoken Documents --- p.101 / Chapter 6.2 --- Query Processing --- p.101 / Chapter 6.2.1 --- Query Weighting --- p.101 / Chapter 6.2.2 --- Bigram Formation --- p.102 / Chapter 6.3 --- Cross-language Retrieval Task --- p.103 / Chapter 6.3.1 --- Indexing Units --- p.104 / Chapter 6.3.2 --- Retrieval Model --- p.104 / Chapter 6.3.3 --- Performance Measure --- p.105 / Chapter 6.4 --- Relevance Feedback --- p.106 / Chapter 6.4.1 --- Pseudo-Relevance Feedback --- p.107 /
Chapter 6.5 --- Retrieval Performance --- p.107 / Chapter 6.6 --- Chapter Summary --- p.109 / Chapter 7 --- Conclusions and Future Work --- p.111 / Chapter 7.1 --- Future Work --- p.114 / Chapter A --- XML Schema for Multimedia Markup Language --- p.117 / Chapter B --- Example of Multimedia Markup Language --- p.128 / Chapter C --- Significance Tests --- p.135 / Chapter C.1 --- Selection of Cantonese Field Speech Segments --- p.135 / Chapter C.2 --- Fusion of Video- and Audio-based Segmentation --- p.137 / Chapter C.3 --- Document Expansion with Reporter Speech --- p.137 / Chapter C.4 --- Document Expansion with N-best Recognition Hypotheses --- p.140 / Chapter C.5 --- Document Expansion with Reporter Speech and N-best Recognition Hypotheses --- p.140 / Chapter C.6 --- Query Expansion with Pseudo Relevance Feedback --- p.142 / Chapter D --- Topic Descriptions of TDT-2 Corpus --- p.145 / Chapter E --- Speech Recognition Output from Dragon in CLSDR Task --- p.148 / Chapter F --- Parameters Estimation --- p.152 / Chapter F.1 --- Estimating the Number of Relevant Documents, Nr --- p.152 / Chapter F.2 --- Estimating the Number of Terms Added from Relevant Documents, Nrt, to Original Query --- p.153 / Chapter F.3 --- Estimating the Number of Non-relevant Documents, Nn, from the Bottom-scoring Retrieval List --- p.153 / Chapter F.4 --- Estimating the Number of Terms, Selected from Non-relevant Documents (Nnt), to be Removed from Original Query --- p.154 / Chapter G --- Abbreviations --- p.155 / Bibliography --- p.158
|
286 |
Automatic speech recognition of Cantonese-English code-mixing utterances. January 2005
Chan Yeuk Chi Joyce. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references. / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Background --- p.1 / Chapter 1.2 --- Previous Work on Code-switching Speech Recognition --- p.2 / Chapter 1.2.1 --- Keyword Spotting Approach --- p.3 / Chapter 1.2.2 --- Translation Approach --- p.4 / Chapter 1.2.3 --- Language Boundary Detection --- p.6 / Chapter 1.3 --- Motivations of Our Work --- p.7 / Chapter 1.4 --- Methodology --- p.8 / Chapter 1.5 --- Thesis Outline --- p.10 / Chapter 1.6 --- References --- p.11 / Chapter 2 --- Fundamentals of Large Vocabulary Continuous Speech Recognition for Cantonese and English --- p.14 / Chapter 2.1 --- Basic Theory of Speech Recognition --- p.14 / Chapter 2.1.1 --- Feature Extraction --- p.14 / Chapter 2.1.2 --- Maximum a Posteriori (MAP) Probability --- p.15 / Chapter 2.1.3 --- Hidden Markov Model (HMM) --- p.16 / Chapter 2.1.4 --- Statistical Language Modeling --- p.17 / Chapter 2.1.5 --- Search Algorithm --- p.18 / Chapter 2.2 --- Word Posterior Probability (WPP) --- p.19 / Chapter 2.3 --- Generalized Word Posterior Probability (GWPP) --- p.23 / Chapter 2.4 --- Characteristics of Cantonese --- p.24 / Chapter 2.4.1 --- Cantonese Phonology --- p.24 / Chapter 2.4.2 --- Variation and Change in Pronunciation --- p.27 / Chapter 2.4.3 --- Syllables and Characters in Cantonese --- p.28 /
Chapter 2.4.4 --- Spoken Cantonese vs. Written Chinese --- p.28 / Chapter 2.5 --- Characteristics of English --- p.30 / Chapter 2.5.1 --- English Phonology --- p.30 / Chapter 2.5.2 --- English with Cantonese Accents --- p.31 / Chapter 2.6 --- References --- p.32 / Chapter 3 --- Code-mixing and Code-switching Speech Recognition --- p.35 / Chapter 3.1 --- Introduction --- p.35 / Chapter 3.2 --- Definition --- p.35 / Chapter 3.2.1 --- Monolingual Speech Recognition --- p.35 / Chapter 3.2.2 --- Multilingual Speech Recognition --- p.35 / Chapter 3.2.3 --- Code-mixing and Code-switching --- p.36 / Chapter 3.3 --- Conversation in Hong Kong --- p.38 / Chapter 3.3.1 --- Language Choice of Hong Kong People --- p.38 / Chapter 3.3.2 --- Reasons for Code-mixing in Hong Kong --- p.40 / Chapter 3.3.3 --- How Does Code-mixing Occur? --- p.41 / Chapter 3.4 --- Difficulties for Code-mixing - Specific to Cantonese-English --- p.44 / Chapter 3.4.1 --- Phonetic Differences --- p.45 / Chapter 3.4.2 --- Phonology difference --- p.48 / Chapter 3.4.3 --- Accent and Borrowing --- p.49 / Chapter 3.4.4 --- Lexicon and Grammar --- p.49 / Chapter 3.4.5 --- Lack of Appropriate Speech Corpus --- p.50 / Chapter 3.5 --- References --- p.50 / Chapter 4 --- Data Collection --- p.53 / Chapter 4.1 --- Data Collection --- p.53 / Chapter 4.1.1 --- Corpus Design --- p.53 / Chapter 4.1.2 --- Recording Setup --- p.59 / Chapter 4.1.3 --- Post-processing of Speech Data --- p.60 / Chapter 4.2 --- A Baseline Database --- p.61 / Chapter 4.2.1 --- Monolingual Spoken Cantonese Speech Data (CUMIX) --- p.61 / Chapter 4.3 --- References --- p.61 / Chapter 5 --- System Design and Experimental Setup --- p.63 / Chapter 5.1 --- Overview of the Code-mixing Speech Recognizer --- p.63 / Chapter 5.1.1 --- Bilingual Syllable/Word-based Speech Recognizer --- p.63 / Chapter 5.1.2 --- Language Boundary Detection --- p.64 / Chapter 5.1.3 --- Generalized Word Posterior Probability (GWPP) --- p.65 / Chapter 5.2 --- Acoustic Modeling --- p.66 /
Chapter 5.2.1 --- Speech Corpus for Training of Acoustic Models --- p.67 / Chapter 5.2.2 --- Features Extraction --- p.69 / Chapter 5.2.3 --- Variability in the Speech Signal --- p.69 / Chapter 5.2.4 --- Language Dependency of the Acoustic Models --- p.71 / Chapter 5.2.5 --- Pronunciation Dictionary --- p.80 / Chapter 5.2.6 --- The Training Process of Acoustic Models --- p.83 / Chapter 5.2.7 --- Decoding and Evaluation --- p.88 / Chapter 5.3 --- Language Modeling --- p.90 / Chapter 5.3.1 --- N-gram Language Model --- p.91 / Chapter 5.3.2 --- Difficulties in Data Collection --- p.91 / Chapter 5.3.3 --- Text Data for Training Language Model --- p.92 / Chapter 5.3.4 --- Training Tools --- p.95 / Chapter 5.3.5 --- Training Procedure --- p.95 / Chapter 5.3.6 --- Evaluation of the Language Models --- p.98 / Chapter 5.4 --- Language Boundary Detection --- p.99 / Chapter 5.4.1 --- Phone-based LBD --- p.100 / Chapter 5.4.2 --- Syllable-based LBD --- p.104 / Chapter 5.4.3 --- LBD Based on Syllable Lattice --- p.106 / Chapter 5.5 --- Integration of the Acoustic Model Scores, Language Model Scores and Language Boundary Information --- p.107 /
Chapter 5.5.1 --- Integration of Acoustic Model Scores and Language Boundary Information --- p.107 / Chapter 5.5.2 --- Integration of Modified Acoustic Model Scores and Language Model Scores --- p.109 / Chapter 5.5.3 --- Evaluation Criterion --- p.111 / Chapter 5.6 --- References --- p.112 / Chapter 6 --- Results and Analysis --- p.118 / Chapter 6.1 --- Speech Data for Development and Evaluation --- p.118 / Chapter 6.1.1 --- Development Data --- p.118 / Chapter 6.1.2 --- Testing Data --- p.118 / Chapter 6.2 --- Performance of Different Acoustic Units --- p.119 / Chapter 6.2.1 --- Analysis of Results --- p.120 / Chapter 6.3 --- Language Boundary Detection --- p.122 / Chapter 6.3.1 --- Phone-based Language Boundary Detection --- p.123 / Chapter 6.3.2 --- Syllable-based Language Boundary Detection (SYL-LBD) --- p.127 / Chapter 6.3.3 --- Language Boundary Detection Based on Syllable Lattice (BILINGUAL-LBD) --- p.129 / Chapter 6.3.4 --- Observations --- p.129 / Chapter 6.4 --- Evaluation of the Language Models --- p.130 / Chapter 6.4.1 --- Character Perplexity --- p.130 / Chapter 6.4.2 --- Phonetic-to-text Conversion Rate --- p.131 / Chapter 6.4.3 --- Observations --- p.131 / Chapter 6.5 --- Character Error Rate --- p.132 / Chapter 6.5.1 --- Without Language Boundary Information --- p.133 / Chapter 6.5.2 --- With Language Boundary Detector SYL-LBD --- p.134 / Chapter 6.5.3 --- With Language Boundary Detector BILINGUAL-LBD --- p.136 / Chapter 6.5.4 --- Observations --- p.138 / Chapter 6.6 --- References --- p.141 / Chapter 7 --- Conclusions and Suggestions for Future Work --- p.143 / Chapter 7.1 --- Conclusion --- p.143 / Chapter 7.1.1 --- Difficulties and Solutions --- p.144 / Chapter 7.2 --- Suggestions for Future Work --- p.149 / Chapter 7.2.1 --- Acoustic Modeling --- p.149 / Chapter 7.2.2 --- Pronunciation Modeling --- p.149 / Chapter 7.2.3 --- Language Modeling --- p.150 / Chapter 7.2.4 --- Speech Data --- p.150 / Chapter 7.2.5 --- Language Boundary Detection --- p.151 / Chapter 7.3 --- References --- p.151 /
Appendix A --- Code-mixing Utterances in Training Set of CUMIX --- p.152 / Appendix B --- Code-mixing Utterances in Testing Set of CUMIX --- p.175 / Appendix C --- Usage of Speech Data in CUMIX --- p.202
|
287 |
Using duration information in HMM-based automatic speech recognition. January 2005
Zhu Yu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 100-104). / Abstracts in English and Chinese. / Chapter 1 --- INTRODUCTION --- p.1 / Chapter 1.1. --- Speech and its temporal structure --- p.1 / Chapter 1.2. --- Previous work on the modeling of temporal structure --- p.1 / Chapter 1.3. --- Integrating explicit duration modeling in HMM-based ASR system --- p.3 / Chapter 1.4. --- Thesis outline --- p.3 / Chapter 2 --- BACKGROUND --- p.5 / Chapter 2.1. --- Automatic speech recognition process --- p.5 / Chapter 2.2. --- HMM for ASR --- p.6 / Chapter 2.2.1. --- HMM for ASR --- p.6 / Chapter 2.2.2. --- HMM-based ASR system --- p.7 / Chapter 2.3. --- General approaches to explicit duration modeling --- p.12 / Chapter 2.3.1. --- Explicit duration modeling --- p.13 / Chapter 2.3.2. --- Training of duration model --- p.16 / Chapter 2.3.3. --- Incorporation of duration model in decoding --- p.18 / Chapter 3 --- CANTONESE CONNECTED-DIGIT RECOGNITION --- p.21 / Chapter 3.1. --- Cantonese connected digit recognition --- p.21 / Chapter 3.1.1. --- Phonetics of Cantonese and Cantonese digit --- p.21 / Chapter 3.2. --- The baseline system --- p.24 / Chapter 3.2.1. --- Speech corpus --- p.24 / Chapter 3.2.2. --- Feature extraction --- p.25 / Chapter 3.2.3. --- HMM models --- p.26 / Chapter 3.2.4. --- HMM decoding --- p.27 / Chapter 3.3. --- Baseline performance and error analysis --- p.27 / Chapter 3.3.1. --- Recognition performance --- p.27 / Chapter 3.3.2. --- Performance for different speaking rates --- p.28 / Chapter 3.3.3. --- Confusion matrix --- p.30 / Chapter 4 --- DURATION MODELING FOR CANTONESE DIGITS --- p.41 / Chapter 4.1. --- Duration features --- p.41 / Chapter 4.1.1. --- Absolute duration feature --- p.41 / Chapter 4.1.2. --- Relative duration feature --- p.44 / Chapter 4.2. --- Parametric distribution for duration modeling --- p.47 /
Chapter 4.3. --- Estimation of the model parameters --- p.51 / Chapter 4.4. --- Speaking-rate-dependent duration model --- p.52 / Chapter 5 --- USING DURATION MODELING FOR CANTONESE DIGIT RECOGNITION --- p.57 / Chapter 5.1. --- Baseline decoder --- p.57 / Chapter 5.2. --- Incorporation of state-level duration model --- p.59 / Chapter 5.3. --- Incorporation of word-level duration model --- p.62 / Chapter 5.4. --- Weighted use of duration model --- p.65 / Chapter 6 --- EXPERIMENT RESULT AND ANALYSIS --- p.66 / Chapter 6.1. --- Experiments with speaking-rate-independent duration models --- p.66 / Chapter 6.1.1. --- Discussion --- p.68 / Chapter 6.1.2. --- Analysis of the error patterns --- p.71 / Chapter 6.1.3. --- Reduction of deletion, substitution and insertion --- p.72 / Chapter 6.1.4. --- Recognition performance at different speaking rates --- p.75 / Chapter 6.2. --- Experiments with speaking-rate-dependent duration models --- p.77 / Chapter 6.2.1. --- Using true speaking rate --- p.77 / Chapter 6.2.2. --- Using estimated speaking rate --- p.79 / Chapter 6.3. --- Evaluation on another speech database --- p.80 / Chapter 6.3.1. --- Experimental setup --- p.80 / Chapter 6.3.2. --- Experiment results and analysis --- p.82 / Chapter 7 --- CONCLUSIONS AND FUTURE WORK --- p.87 / Chapter 7.1. --- Conclusion and understanding of current work --- p.87 / Chapter 7.2. --- Future work --- p.89 / Chapter A --- APPENDIX --- p.90 / BIBLIOGRAPHY --- p.100
|
288 |
An error detection and correction framework to improve large vocabulary continuous speech recognition. / CUHK electronic theses & dissertations collection. January 2009
This thesis proposes an error detection and correction (ED-EC) framework to incorporate advanced linguistic knowledge sources into large vocabulary continuous speech recognition. Previous efforts to apply sophisticated language models (LMs) in speech recognition have typically faced a serious efficiency problem because of the heavy computation these models require. The ED-EC framework aims to obtain the full benefit of complex linguistic sources while maximizing efficiency, by applying computationally expensive LMs only where they are needed in the input speech. First, the framework detects recognition errors in the output of an efficient state-of-the-art decoding procedure. Then, it corrects the detected errors with the aid of sophisticated LMs by (1) creating alternatives for each detected error and (2) applying advanced models to distinguish among the alternatives. In this thesis, we implement a prototype of the ED-EC framework on the task of Mandarin dictation. This prototype detects recognition errors based on generalized word posterior probabilities, selects alternatives for errors from recognition lattices generated during decoding, and adopts an advanced LM that combines mutual information, word trigrams and POS trigrams. The experimental results indicate that the ED-EC framework is practically feasible: the optimal gain of the focused LM is theoretically achievable at low computational cost.
On a general-domain test set, the prototype obtains a 6.0% relative reduction in character error rate (CER) over a state-of-the-art baseline recognizer. In terms of efficiency, error detection and the creation of alternatives are both inexpensive, and the computationally expensive LM is applied to fewer than 50% of the utterances. We further show that the framework's potential for improving recognition performance is large: if error detection were perfect and the alternatives for each error were guaranteed to include the correct one, the relative CER reduction over the baseline would increase to 36.0%. We also show that the ED-EC framework is robust to unseen data and can be conveniently extended to other recognition systems. / In addition to the ED-EC framework, this thesis proposes a discriminative lattice rescoring (DLR) algorithm to investigate how far the framework can be extended. The DLR method recasts a discriminative n-gram model as a pseudo-conventional n-gram model and then uses this recast model to perform lattice rescoring. DLR improves the efficiency of discriminative n-gram modeling and allows it to be combined with other post-processing techniques such as the ED-EC framework. / Zhou, Zhengyu. / Adviser: Helen Mei-Ling Meng. / Source: Dissertation Abstracts International, Volume: 72-11, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves 142-155). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
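The detect-then-correct control flow described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's implementation: the `Slot` structure, the confidence threshold, and the toy bigram scorer standing in for the combined mutual-information/word-trigram/POS-trigram LM are all hypothetical, and English words stand in for Mandarin characters for readability.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Slot:
    word: str                  # 1-best word from the efficient baseline decoder
    gwpp: float                # its generalized word posterior probability (confidence)
    alternatives: List[str] = field(default_factory=list)  # competitors from the lattice


def ed_ec(slots: Sequence[Slot], threshold: float,
          expensive_lm: Callable[[List[str]], float]) -> List[str]:
    """Error detection (low GWPP) followed by error correction (expensive LM rescoring)."""
    output = [slot.word for slot in slots]
    for i, slot in enumerate(slots):
        if slot.gwpp >= threshold:
            continue  # judged correct: the expensive LM is never called for this slot
        candidates = [slot.word] + [w for w in slot.alternatives if w != slot.word]
        # Score each full-sentence variant with the expensive LM and keep the best one.
        output[i] = max(candidates,
                        key=lambda w: expensive_lm(output[:i] + [w] + output[i + 1:]))
    return output


# Hypothetical stand-in for the sophisticated LM: count bigrams found in a small "known" set.
KNOWN = {("I", "read"), ("read", "a"), ("a", "book")}
lm_calls = []


def toy_lm(words: List[str]) -> float:
    lm_calls.append(words)
    return sum((a, b) in KNOWN for a, b in zip(words, words[1:]))


slots = [Slot("I", 0.95), Slot("red", 0.40, ["read"]), Slot("a", 0.90), Slot("book", 0.92)]
print(ed_ec(slots, threshold=0.5, expensive_lm=toy_lm), len(lm_calls))
# ['I', 'read', 'a', 'book'] 2
```

Only the single low-confidence slot triggers LM calls, which mirrors the abstract's point that the expensive model is concentrated on part of the input; raising or lowering the threshold changes how many slots are flagged, trading detected errors against LM cost.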
|
289 |
A speech recognition IC with an efficient MFCC extraction algorithm and multi-mixture models. / CUHK electronic theses & dissertations collection. January 2006
Automatic speech recognition (ASR) has received a great deal of attention in past decades. Speech recognition algorithms based on Mel-frequency cepstral coefficients (MFCCs) and the hidden Markov model (HMM) generally achieve better recognition performance than other speech recognition algorithms and are widely used in many applications. In this thesis, a speech recognition system with an efficient MFCC extraction algorithm and multi-mixture models is presented. It is composed of two parts: an MFCC feature extractor and an HMM-based speech decoder. / In the conventional MFCC feature extraction algorithm, speech is divided into short, overlapping frames. The existing extraction algorithm is computationally demanding and not well suited to hardware implementation. We have developed a hardware-efficient MFCC feature extraction algorithm that reduces the computational load by 54% compared with the conventional algorithm, with only a 1.7% reduction in recognition accuracy. / For the HMM-based decoder, it is advantageous to use models with multiple mixtures, but more mixtures make the calculation more complex. Using a table look-up method proposed in this thesis, the new design can handle up to 16 states and 8 mixtures, and it can easily be extended to models with more states and mixtures. We have implemented the new algorithm on an Altera FPGA chip using fixed-point arithmetic and tested it with speech data from the AURORA 2 database, a well-known database designed to evaluate the performance of speech recognition algorithms in noisy conditions [27]. The recognition accuracy of the new system is 91.01%, compared with 94.65% for a conventional software recognition system running on a PC with 32-bit floating-point arithmetic. / Han Wei. / "September 2006." / Adviser: Cheong Fat Chan.
/ Source: Dissertation Abstracts International, Volume: 68-03, Section: B, page: 1823. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2006. / Includes bibliographical references (p. 108-111). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307.
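For reference, the conventional floating-point MFCC chain that the thesis's hardware-efficient algorithm is compared against can be sketched in NumPy as follows. The abstract does not describe the new algorithm's internals, so this shows only the standard steps (framing, windowing, FFT, mel filterbank, log, DCT); the 8 kHz sampling rate, 25 ms frames with a 10 ms hop, 256-point FFT, 23 filters and 13 coefficients are assumed typical values, not parameters taken from the thesis.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, sample_rate=8000, frame_len=200, hop=80,
         n_fft=256, n_filters=23, n_ceps=13):
    """Conventional floating-point MFCC: frame, window, FFT, mel filterbank, log, DCT."""
    signal = np.asarray(signal, dtype=float)
    # 1. Split into short overlapping frames (25 ms frames, 10 ms hop at 8 kHz).
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Power spectrum of each (zero-padded) frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # 3. Triangular filters spaced uniformly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    # 4. Log filterbank energies (the floor avoids log(0) for empty bands).
    log_energy = np.log(np.maximum(power @ fbank.T, 1e-10))
    # 5. Unnormalized DCT-II to decorrelate; keep the first n_ceps coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energy @ dct.T


fs = 8000
t = np.arange(fs) / fs
feats = mfcc(np.sin(2 * np.pi * 440 * t), fs)
print(feats.shape)  # (98, 13)
```

In this floating-point reference, the per-frame FFT and the filterbank multiply-accumulate account for most of the arithmetic, which makes them the natural targets for a hardware-oriented redesign such as the one the thesis reports.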
|
290 |
An efficient decoding method for continuous speech recognition based on a tree-structured lexicon = 基於樹狀詞彙表示方法的有效率連續語音識別系統 (Ji yu shu zhuang ci hui biao shi fang fa de you xiao lü lian xu yu yin shi bie xi tong). January 2001
Choi Wing Nin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references. / Text in English; abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Development of search algorithms for Chinese LVCSR --- p.3 / Chapter 1.2 --- Objectives of the thesis --- p.4 / Chapter 1.3 --- Thesis outline --- p.5 / Reference --- p.7 / Chapter 2 --- Fundamentals of Continuous Speech Recognition --- p.9 / Chapter 2.1 --- The Bayesian decision rule --- p.9 / Chapter 2.2 --- Acoustic front-end processor --- p.11 / Chapter 2.3 --- Phonological constraint --- p.12 / Chapter 2.3.1 --- Characteristics of Cantonese --- p.12 / Chapter 2.3.2 --- Homophones and homographs --- p.13 / Chapter 2.4 --- Acoustic modeling --- p.13 / Chapter 2.5 --- Statistical language model --- p.15 / Chapter 2.5.1 --- Word-based language model --- p.15 / Chapter 2.5.2 --- Class-based language model --- p.16 / Chapter 2.6 --- Search algorithms --- p.17 / Chapter 2.6.1 --- Time-synchronous Viterbi search --- p.18 / Chapter 2.6.2 --- Time-asynchronous stack decoding --- p.18 / Chapter 2.6.3 --- One-pass versus multi-pass search strategies --- p.19 / Chapter 2.7 --- Summary --- p.20 / Reference --- p.21 / Chapter 3 --- Search Space Organization --- p.23 / Chapter 3.1 --- Lexicon representation --- p.24 / Chapter 3.1.1 --- Linear lexicon --- p.25 / Chapter 3.1.2 --- Tree lexicon --- p.27 / Chapter 3.2 --- Factorization of language model --- p.31 / Chapter 3.3 --- Lexical tree incorporated with context-dependent acoustic models --- p.36 / Chapter 3.4 --- Summary --- p.39 / Reference --- p.40 / Chapter 4 --- One-Pass Dynamic Programming Based Search Algorithm --- p.42 / Chapter 4.1 --- Token Passing Algorithm --- p.43 / Chapter 4.2 --- Techniques for speeding up the search --- p.48 / Chapter 4.2.1 --- Different layers of beam in the search hierarchy --- p.48 / Chapter 4.2.2 --- Efficient recombination of tokens --- p.51 /
Chapter 4.2.3 --- Fast likelihood computation methods for continuous mixture densities --- p.52 / Chapter 4.2.4 --- Lexical tree with class-based language model --- p.54 / Chapter 4.3 --- Experimental results and discussions --- p.57 / Chapter 4.3.1 --- The Hong Kong stock inquiry task --- p.57 / Chapter 4.3.2 --- General domain continuous speech recognition --- p.59 / Reference --- p.62 / Chapter 5 --- Extension of the One-Pass Search --- p.64 / Chapter 5.1 --- Overview of the extended framework --- p.65 / Chapter 5.2 --- Word lattice construction by modified word-conditioned search --- p.66 / Chapter 5.2.1 --- Exact N-best algorithm --- p.66 / Chapter 5.2.2 --- Word-pair approximation --- p.67 / Chapter 5.2.3 --- Word lattice algorithm --- p.68 / Chapter 5.3 --- Computation of heuristic score --- p.70 / Chapter 5.4 --- Backward A* heuristic search --- p.72 / Chapter 5.4.1 --- Recovering the missing piece --- p.74 / Chapter 5.4.2 --- Generation of N-best list --- p.74 / Chapter 5.5 --- Experimental results --- p.75 / Chapter 5.5.1 --- Simple back-tracking vs A* heuristic search --- p.75 / Chapter 5.5.2 --- N-best list evaluation using class bigram re-scoring --- p.76 / Chapter 5.5.3 --- N-best list evaluation using class trigram re-scoring --- p.77 / Chapter 5.6 --- Summary --- p.78 / Reference --- p.79 / Chapter 6 --- Conclusions and Suggestions for Future Development --- p.80 / Chapter 6.1 --- Conclusions --- p.80 / Chapter 6.2 --- Suggestions for future development --- p.82 / Chapter 6.2.1 --- Incorporation of tone information --- p.82 / Chapter 6.2.2 --- Fast match strategy for acoustic models --- p.82 / Reference --- p.84 / Appendix --- Cantonese Initials and Finals --- p.85
|