431 |
Development of a Field-Deployable Voice-Controlled Ultrasound Scanner System. Sebastian, Dalys. 25 June 2004.
"Modern ultrasound scanners are portable and have become very useful for clinical diagnosis. However, they have limitations for field use purposes, primarily because they occupy both hands of the physician who performs the scanning. The goal of this thesis is to develop a wearable voice-controlled ultrasound scanner system that would enable the physician to provide a fast and efficient diagnosis. This is expected to become very useful for emergency and trauma applications. A commercially available ultrasound scanner system, Terason 2000, was chosen as the basis for development. This system consists of a laptop, a hardware unit containing the RF beamforming and signal processing chips and the ultrasound transducer. In its commercial version, the control of the ultrasound system is performed via a Graphical User Interface with a Windows-application look and feel. In the system we developed, a command and control speech recognition engine and a noise-canceling microphone are selected to control the scanner using voice commands. A mini-joystick is attached to the top of the ultrasound transducer for distance and area measurements and to perform zooming of the ultrasound images. An eye-wear viewer connected to the laptop enables the user to view the ultrasound images directly. Power management features are incorporated into the ultrasound system in order to conserve the battery power. A wireless connection is set up with a remote laptop to enable real-time transmission of wireless images. The result is a truly untethered, voice-controlled, ultrasound system enclosed in a backpack and monitored by the eye-wear viewer. (In the second generation of this system, the laptop is replaced by an embedded PC and is incorporated into a photographer’s vest). The voice-controlled system has to be made reliable under various forms of background noise. Three command and control speech recognition systems were selected and their recognition performances were determined under different types and levels of ambient noise. The variation of recognition rates was also analyzed over 6 different speakers. A detailed testing was also conducted to identify the ideal combination of a microphone and speech recognition engine suitable for the ultrasound scanner system. Six different microphones, each with their own unique methods of implementing noise cancellation features, were chosen as candidates for this analysis. The testing was conducted by making recordings inside a highly reverberant acoustic noisy chamber, and the recordings were fed to the automatic speech recognition engines offline for performance evaluation. The speech recognition engine and microphone selected as a result of this extensive testing were then incorporated into the wearable ultrasound scanner system. This thesis also discusses the implementation of the human-speech interface, which also plays a major role in the effectiveness of the voice-controlled ultrasound scanner system."
|
432 |
Methods of endpoint detection for isolated word recognition. Lamel, Lori Faith. January 1980.
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1980. Microfiche copy available in Archives and Engineering. Includes bibliographical references. By Lori F. Lamel.
|
433 |
Deception in Spoken Dialogue: Classification and Individual Differences. Levitan, Sarah Ita. January 2019.
Automatic deception detection is an important problem with far-reaching implications in many areas, including law enforcement, military and intelligence agencies, social services, and politics. Despite extensive efforts to develop automated deception detection technologies, there have been few objective successes. This is likely due to the many challenges involved, including the lack of large, cleanly recorded corpora; the difficulty of acquiring ground truth labels; and major differences in incentives for lying in the laboratory vs. lying in real life. Another well-recognized issue is that there are individual and cultural differences in deception production and detection, although little has been done to identify them. Human performance at deception detection is at the level of chance, making it an uncommon problem where machines can potentially outperform humans.
This thesis addresses these challenges associated with research of deceptive speech. We created the Columbia X-Cultural Deception (CXD) Corpus, a large-scale collection of deceptive and non-deceptive dialogues between native speakers of Standard American English and Mandarin Chinese. This corpus enabled a comprehensive study of deceptive speech on a large scale.
In the first part of the thesis, we introduce the CXD corpus and present an empirical analysis of acoustic-prosodic and linguistic cues to deception. We also describe machine learning classification experiments to automatically identify deceptive speech using those features. Our best classifier achieves classification accuracy of almost 70%, well above human performance.
The second part of this thesis addresses individual differences in deceptive speech. We present a comprehensive analysis of individual differences in verbal cues to deception, and several methods for leveraging these speaker differences to improve automatic deception classification. We identify many differences in cues to deception across gender, native language, and personality. Our comparison of approaches for leveraging these differences shows that speaker-dependent features that capture a speaker's deviation from their natural speaking style can improve deception classification performance. We also develop neural network models that accurately model speaker-specific patterns of deceptive speech.
The contributions of this work add substantially to our scientific understanding of deceptive speech, and have practical implications for human practitioners and automatic deception detection.
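One common way to realize the "deviation from natural speaking style" idea above is to z-normalize each feature within each speaker, so a value measures how far an utterance departs from that speaker's own baseline rather than its raw magnitude. The sketch below is an assumption-level illustration of that normalization, not the thesis's exact feature pipeline.

```python
import numpy as np

def speaker_znorm(features, speaker_ids):
    """Z-score each feature within each speaker, so values express a
    speaker's deviation from their own baseline speaking style."""
    features = np.asarray(features, dtype=float)
    ids = np.asarray(speaker_ids)
    normed = np.empty_like(features)
    for spk in np.unique(ids):
        idx = ids == spk
        mu = features[idx].mean(axis=0)
        sigma = features[idx].std(axis=0) + 1e-8  # guard against zero variance
        normed[idx] = (features[idx] - mu) / sigma
    return normed

# Toy usage: [pitch_Hz, speaking_rate] features for two speakers.
X = [[210.0, 4.1], [225.0, 4.6], [118.0, 3.2], [110.0, 3.0]]
spk = ["s1", "s1", "s2", "s2"]
print(speaker_znorm(X, spk))
```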
|
434 |
Sistemas de adaptação ao locutor utilizando autovozes / Speaker adaptation system using eigenvoices. Borges, Liselene de Abreu. 20 December 2001.
This work describes two speaker adaptation techniques for speech recognition systems that use a reduced amount of adaptation data: Maximum Likelihood Linear Regression (MLLR) and Eigenvoices. Both update the Gaussian means of continuous-density hidden Markov models (HMMs). MLLR estimates a set of linear transformations for the Gaussian mean parameters of the system. The Eigenvoice technique is based on prior knowledge of inter-speaker variation; this prior knowledge, captured in the eigenvoices, is obtained through principal component analysis (PCA). Mean-adaptation tests were run on an isolated-word, restricted-vocabulary speech recognition system. With a large amount of adaptation data (more than 70% of the vocabulary words), the Eigenvoice technique showed no significant gains over MLLR; with a reduced amount of data (less than 15% of the vocabulary words), however, the Eigenvoice technique outperformed MLLR.
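The eigenvoice idea has a compact linear-algebra core: stack each training speaker's HMM Gaussian means into a supervector, extract principal directions with PCA, and constrain a new speaker's means to the span of the top few "eigenvoices". The sketch below uses toy random supervectors and a plain least-squares projection in place of the maximum-likelihood weight estimation used in practice; it illustrates the mechanics, not the thesis's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
S, D = 20, 50                       # 20 training speakers, 50-dim supervectors
supervectors = rng.normal(size=(S, D))   # toy stand-ins for stacked HMM means

# PCA via SVD of the mean-centered supervectors.
mean_voice = supervectors.mean(axis=0)
U, s, Vt = np.linalg.svd(supervectors - mean_voice, full_matrices=False)
K = 5
eigenvoices = Vt[:K]                # top-K principal directions ("eigenvoices")

# Adaptation: express a new speaker as mean voice + weighted eigenvoices.
# (Least-squares here; in practice the weights are ML-estimated from
# whatever adaptation data is observed.)
new_speaker = rng.normal(size=D)
w, *_ = np.linalg.lstsq(eigenvoices.T, new_speaker - mean_voice, rcond=None)
adapted_means = mean_voice + eigenvoices.T @ w
print(adapted_means.shape)          # (50,)
```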
|
435 |
Robust methods for Chinese spoken document retrieval. January 2003.
Hui Pui Yu. Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 158-169). Abstracts in English and Chinese.

Table of contents:
Abstract --- p.2
Acknowledgements --- p.6
Chapter 1 Introduction --- p.23
  1.1 Spoken Document Retrieval --- p.24
  1.2 The Chinese Language and Chinese Spoken Documents --- p.28
  1.3 Motivation --- p.33
    1.3.1 Assisting the User in Query Formation --- p.34
  1.4 Goals --- p.34
  1.5 Thesis Organization --- p.35
Chapter 2 Multimedia Repository --- p.37
  2.1 The Cantonese Corpus --- p.37
    2.1.1 The RealMedia Collection --- p.39
    2.1.2 The MPEG-1 Collection --- p.40
  2.2 The Multimedia Markup Language --- p.42
  2.3 Chapter Summary --- p.44
Chapter 3 Monolingual Retrieval Task --- p.45
  3.1 Properties of Cantonese Video Archive --- p.45
  3.2 Automatic Speech Transcription --- p.46
    3.2.1 Transcription of Cantonese Spoken Documents --- p.47
    3.2.2 Indexing Units --- p.48
  3.3 Known-Item Retrieval Task --- p.49
    3.3.1 Evaluation: Average Inverse Rank --- p.50
  3.4 Retrieval Model --- p.51
  3.5 Experimental Results --- p.52
  3.6 Chapter Summary --- p.53
Chapter 4 The Use of Audio and Video Information for Monolingual Spoken Document Retrieval --- p.55
  4.1 Video-based Segmentation --- p.56
    4.1.1 Metric Computation --- p.57
    4.1.2 Shot Boundary Detection --- p.58
    4.1.3 Shot Transition Detection --- p.67
  4.2 Audio-based Segmentation --- p.69
    4.2.1 Gaussian Mixture Models --- p.69
    4.2.2 Transition Detection --- p.70
  4.3 Performance Evaluation --- p.72
    4.3.1 Automatic Story Segmentation --- p.72
    4.3.2 Video-based Segmentation Algorithm --- p.73
    4.3.3 Audio-based Segmentation Algorithm --- p.74
  4.4 Fusion of Video- and Audio-based Segmentation --- p.75
  4.5 Retrieval Performance --- p.76
  4.6 Chapter Summary --- p.78
Chapter 5 Document Expansion for Monolingual Spoken Document Retrieval --- p.79
  5.1 Document Expansion using Selected Field Speech Segments --- p.81
    5.1.1 Annotations from MmML --- p.81
    5.1.2 Selection of Cantonese Field Speech --- p.83
    5.1.3 Re-weighting Different Retrieval Units --- p.84
    5.1.4 Retrieval Performance with Document Expansion using Selected Field Speech --- p.84
  5.2 Document Expansion using N-best Recognition Hypotheses --- p.87
    5.2.1 Re-weighting Different Retrieval Units --- p.90
    5.2.2 Retrieval Performance with Document Expansion using N-best Recognition Hypotheses --- p.90
  5.3 Document Expansion using Selected Field Speech and N-best Recognition Hypotheses --- p.92
    5.3.1 Re-weighting Different Retrieval Units --- p.92
    5.3.2 Retrieval Performance with Different Indexed Units --- p.93
  5.4 Chapter Summary --- p.94
Chapter 6 Query Expansion for Cross-language Spoken Document Retrieval --- p.97
  6.1 The TDT-2 Corpus --- p.99
    6.1.1 English Textual Queries --- p.100
    6.1.2 Mandarin Spoken Documents --- p.101
  6.2 Query Processing --- p.101
    6.2.1 Query Weighting --- p.101
    6.2.2 Bigram Formation --- p.102
  6.3 Cross-language Retrieval Task --- p.103
    6.3.1 Indexing Units --- p.104
    6.3.2 Retrieval Model --- p.104
    6.3.3 Performance Measure --- p.105
  6.4 Relevance Feedback --- p.106
    6.4.1 Pseudo-Relevance Feedback --- p.107
  6.5 Retrieval Performance --- p.107
  6.6 Chapter Summary --- p.109
Chapter 7 Conclusions and Future Work --- p.111
  7.1 Future Work --- p.114
Appendix A XML Schema for Multimedia Markup Language --- p.117
Appendix B Example of Multimedia Markup Language --- p.128
Appendix C Significance Tests --- p.135
  C.1 Selection of Cantonese Field Speech Segments --- p.135
  C.2 Fusion of Video- and Audio-based Segmentation --- p.137
  C.3 Document Expansion with Reporter Speech --- p.137
  C.4 Document Expansion with N-best Recognition Hypotheses --- p.140
  C.5 Document Expansion with Reporter Speech and N-best Recognition Hypotheses --- p.140
  C.6 Query Expansion with Pseudo Relevance Feedback --- p.142
Appendix D Topic Descriptions of TDT-2 Corpus --- p.145
Appendix E Speech Recognition Output from Dragon in CLSDR Task --- p.148
Appendix F Parameters Estimation --- p.152
  F.1 Estimating the Number of Relevant Documents, Nr --- p.152
  F.2 Estimating the Number of Terms Added from Relevant Documents, Nrt, to Original Query --- p.153
  F.3 Estimating the Number of Non-relevant Documents, Nn, from the Bottom-scoring Retrieval List --- p.153
  F.4 Estimating the Number of Terms, Selected from Non-relevant Documents (Nnt), to be Removed from Original Query --- p.154
Appendix G Abbreviations --- p.155
Bibliography --- p.158
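The "Average Inverse Rank" metric of Section 3.3.1 is the standard known-item measure: each query has exactly one relevant document, and the score is the mean of 1/rank, with misses contributing zero. A minimal sketch, assuming that standard formulation:

```python
def average_inverse_rank(ranks, not_found_score=0.0):
    """Known-item retrieval metric: mean of 1/rank of the single relevant
    document per query; queries whose item is never retrieved score 0."""
    scores = [1.0 / r if r is not None else not_found_score for r in ranks]
    return sum(scores) / len(scores)

# Ranks of the known item for four queries (None = not retrieved).
print(average_inverse_rank([1, 3, None, 2]))  # (1 + 1/3 + 0 + 1/2) / 4
```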
|
436 |
Automatic speech recognition of Cantonese-English code-mixing utterances. January 2005.
Chan Yeuk Chi Joyce. Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. Includes bibliographical references. Abstracts in English and Chinese.

Table of contents:
Chapter 1 Introduction --- p.1
  1.1 Background --- p.1
  1.2 Previous Work on Code-switching Speech Recognition --- p.2
    1.2.1 Keyword Spotting Approach --- p.3
    1.2.2 Translation Approach --- p.4
    1.2.3 Language Boundary Detection --- p.6
  1.3 Motivations of Our Work --- p.7
  1.4 Methodology --- p.8
  1.5 Thesis Outline --- p.10
  1.6 References --- p.11
Chapter 2 Fundamentals of Large Vocabulary Continuous Speech Recognition for Cantonese and English --- p.14
  2.1 Basic Theory of Speech Recognition --- p.14
    2.1.1 Feature Extraction --- p.14
    2.1.2 Maximum a Posteriori (MAP) Probability --- p.15
    2.1.3 Hidden Markov Model (HMM) --- p.16
    2.1.4 Statistical Language Modeling --- p.17
    2.1.5 Search Algorithm --- p.18
  2.2 Word Posterior Probability (WPP) --- p.19
  2.3 Generalized Word Posterior Probability (GWPP) --- p.23
  2.4 Characteristics of Cantonese --- p.24
    2.4.1 Cantonese Phonology --- p.24
    2.4.2 Variation and Change in Pronunciation --- p.27
    2.4.3 Syllables and Characters in Cantonese --- p.28
    2.4.4 Spoken Cantonese vs. Written Chinese --- p.28
  2.5 Characteristics of English --- p.30
    2.5.1 English Phonology --- p.30
    2.5.2 English with Cantonese Accents --- p.31
  2.6 References --- p.32
Chapter 3 Code-mixing and Code-switching Speech Recognition --- p.35
  3.1 Introduction --- p.35
  3.2 Definition --- p.35
    3.2.1 Monolingual Speech Recognition --- p.35
    3.2.2 Multilingual Speech Recognition --- p.35
    3.2.3 Code-mixing and Code-switching --- p.36
  3.3 Conversation in Hong Kong --- p.38
    3.3.1 Language Choice of Hong Kong People --- p.38
    3.3.2 Reasons for Code-mixing in Hong Kong --- p.40
    3.3.3 How Does Code-mixing Occur? --- p.41
  3.4 Difficulties for Code-mixing, Specific to Cantonese-English --- p.44
    3.4.1 Phonetic Differences --- p.45
    3.4.2 Phonology Difference --- p.48
    3.4.3 Accent and Borrowing --- p.49
    3.4.4 Lexicon and Grammar --- p.49
    3.4.5 Lack of Appropriate Speech Corpus --- p.50
  3.5 References --- p.50
Chapter 4 Data Collection --- p.53
  4.1 Data Collection --- p.53
    4.1.1 Corpus Design --- p.53
    4.1.2 Recording Setup --- p.59
    4.1.3 Post-processing of Speech Data --- p.60
  4.2 A Baseline Database --- p.61
    4.2.1 Monolingual Spoken Cantonese Speech Data (CUMIX) --- p.61
  4.3 References --- p.61
Chapter 5 System Design and Experimental Setup --- p.63
  5.1 Overview of the Code-mixing Speech Recognizer --- p.63
    5.1.1 Bilingual Syllable / Word-based Speech Recognizer --- p.63
    5.1.2 Language Boundary Detection --- p.64
    5.1.3 Generalized Word Posterior Probability (GWPP) --- p.65
  5.2 Acoustic Modeling --- p.66
    5.2.1 Speech Corpus for Training of Acoustic Models --- p.67
    5.2.2 Features Extraction --- p.69
    5.2.3 Variability in the Speech Signal --- p.69
    5.2.4 Language Dependency of the Acoustic Models --- p.71
    5.2.5 Pronunciation Dictionary --- p.80
    5.2.6 The Training Process of Acoustic Models --- p.83
    5.2.7 Decoding and Evaluation --- p.88
  5.3 Language Modeling --- p.90
    5.3.1 N-gram Language Model --- p.91
    5.3.2 Difficulties in Data Collection --- p.91
    5.3.3 Text Data for Training Language Model --- p.92
    5.3.4 Training Tools --- p.95
    5.3.5 Training Procedure --- p.95
    5.3.6 Evaluation of the Language Models --- p.98
  5.4 Language Boundary Detection --- p.99
    5.4.1 Phone-based LBD --- p.100
    5.4.2 Syllable-based LBD --- p.104
    5.4.3 LBD Based on Syllable Lattice --- p.106
  5.5 Integration of the Acoustic Model Scores, Language Model Scores and Language Boundary Information --- p.107
    5.5.1 Integration of Acoustic Model Scores and Language Boundary Information --- p.107
    5.5.2 Integration of Modified Acoustic Model Scores and Language Model Scores --- p.109
    5.5.3 Evaluation Criterion --- p.111
  5.6 References --- p.112
Chapter 6 Results and Analysis --- p.118
  6.1 Speech Data for Development and Evaluation --- p.118
    6.1.1 Development Data --- p.118
    6.1.2 Testing Data --- p.118
  6.2 Performance of Different Acoustic Units --- p.119
    6.2.1 Analysis of Results --- p.120
  6.3 Language Boundary Detection --- p.122
    6.3.1 Phone-based Language Boundary Detection --- p.123
    6.3.2 Syllable-based Language Boundary Detection (SYL LBD) --- p.127
    6.3.3 Language Boundary Detection Based on Syllable Lattice (BILINGUAL LBD) --- p.129
    6.3.4 Observations --- p.129
  6.4 Evaluation of the Language Models --- p.130
    6.4.1 Character Perplexity --- p.130
    6.4.2 Phonetic-to-text Conversion Rate --- p.131
    6.4.3 Observations --- p.131
  6.5 Character Error Rate --- p.132
    6.5.1 Without Language Boundary Information --- p.133
    6.5.2 With Language Boundary Detector SYL LBD --- p.134
    6.5.3 With Language Boundary Detector BILINGUAL-LBD --- p.136
    6.5.4 Observations --- p.138
  6.6 References --- p.141
Chapter 7 Conclusions and Suggestions for Future Work --- p.143
  7.1 Conclusion --- p.143
    7.1.1 Difficulties and Solutions --- p.144
  7.2 Suggestions for Future Work --- p.149
    7.2.1 Acoustic Modeling --- p.149
    7.2.2 Pronunciation Modeling --- p.149
    7.2.3 Language Modeling --- p.150
    7.2.4 Speech Data --- p.150
    7.2.5 Language Boundary Detection --- p.151
  7.3 References --- p.151
Appendix A Code-mixing Utterances in Training Set of CUMIX --- p.152
Appendix B Code-mixing Utterances in Testing Set of CUMIX --- p.175
Appendix C Usage of Speech Data in CUMIX --- p.202
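The word posterior probability machinery of Sections 2.2-2.3 rests on one normalization idea: a word's confidence is the share of total hypothesis probability carried by the hypotheses that contain it. The thesis computes this over lattices with time alignment; the sketch below approximates it from an N-best list just to show the normalization, with the scores and words purely illustrative.

```python
import math

def word_posteriors(nbest):
    """Approximate word posterior probabilities from an N-best list.
    `nbest` is a list of (log_score, [words]); each word's posterior is
    the normalized probability mass of hypotheses containing it."""
    m = max(ls for ls, _ in nbest)                  # for numerical stability
    weights = [math.exp(ls - m) for ls, _ in nbest]
    z = sum(weights)
    post = {}
    for w_hyp, (_, words) in zip(weights, nbest):
        for word in set(words):
            post[word] = post.get(word, 0.0) + w_hyp / z
    return post

# Toy code-mixed hypotheses (illustrative scores and words).
nbest = [(-12.3, ["今日", "買", "stock"]),
         (-12.9, ["今日", "賣", "stock"])]
print(word_posteriors(nbest))   # "今日" and "stock" near 1.0; 買/賣 split
```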
|
437 |
Using duration information in HMM-based automatic speech recognition. January 2005.
Zhu Yu. Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. Includes bibliographical references (leaves 100-104). Abstracts in English and Chinese.

Table of contents:
Chapter 1 Introduction --- p.1
  1.1 Speech and its temporal structure --- p.1
  1.2 Previous work on the modeling of temporal structure --- p.1
  1.3 Integrating explicit duration modeling in HMM-based ASR system --- p.3
  1.4 Thesis outline --- p.3
Chapter 2 Background --- p.5
  2.1 Automatic speech recognition process --- p.5
  2.2 HMM for ASR --- p.6
    2.2.1 HMM for ASR --- p.6
    2.2.2 HMM-based ASR system --- p.7
  2.3 General approaches to explicit duration modeling --- p.12
    2.3.1 Explicit duration modeling --- p.13
    2.3.2 Training of duration model --- p.16
    2.3.3 Incorporation of duration model in decoding --- p.18
Chapter 3 Cantonese Connected-Digit Recognition --- p.21
  3.1 Cantonese connected digit recognition --- p.21
    3.1.1 Phonetics of Cantonese and Cantonese digits --- p.21
  3.2 The baseline system --- p.24
    3.2.1 Speech corpus --- p.24
    3.2.2 Feature extraction --- p.25
    3.2.3 HMM models --- p.26
    3.2.4 HMM decoding --- p.27
  3.3 Baseline performance and error analysis --- p.27
    3.3.1 Recognition performance --- p.27
    3.3.2 Performance for different speaking rates --- p.28
    3.3.3 Confusion matrix --- p.30
Chapter 4 Duration Modeling for Cantonese Digits --- p.41
  4.1 Duration features --- p.41
    4.1.1 Absolute duration feature --- p.41
    4.1.2 Relative duration feature --- p.44
  4.2 Parametric distribution for duration modeling --- p.47
  4.3 Estimation of the model parameters --- p.51
  4.4 Speaking-rate-dependent duration model --- p.52
Chapter 5 Using Duration Modeling for Cantonese Digit Recognition --- p.57
  5.1 Baseline decoder --- p.57
  5.2 Incorporation of state-level duration model --- p.59
  5.3 Incorporation of word-level duration model --- p.62
  5.4 Weighted use of duration model --- p.65
Chapter 6 Experiment Results and Analysis --- p.66
  6.1 Experiments with speaking-rate-independent duration models --- p.66
    6.1.1 Discussion --- p.68
    6.1.2 Analysis of the error patterns --- p.71
    6.1.3 Reduction of deletion, substitution and insertion --- p.72
    6.1.4 Recognition performance at different speaking rates --- p.75
  6.2 Experiments with speaking-rate-dependent duration models --- p.77
    6.2.1 Using true speaking rate --- p.77
    6.2.2 Using estimated speaking rate --- p.79
  6.3 Evaluation on another speech database --- p.80
    6.3.1 Experimental setup --- p.80
    6.3.2 Experiment results and analysis --- p.82
Chapter 7 Conclusions and Future Work --- p.87
  7.1 Conclusion and understanding of current work --- p.87
  7.2 Future work --- p.89
Appendix --- p.90
Bibliography --- p.100
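The decoder integration of Chapter 5 amounts to adding a weighted explicit-duration term to the usual acoustic-plus-language score whenever a state or word ends. A minimal sketch, assuming Gaussian duration densities (the thesis compares several parametric choices) and invented toy parameters:

```python
import math

def duration_log_prob(d, mean, var):
    """Log-probability of staying d frames under a Gaussian duration model."""
    return -0.5 * (math.log(2 * math.pi * var) + (d - mean) ** 2 / var)

def rescore(acoustic_lm_score, durations, duration_models, weight=1.0):
    """Add a weighted explicit-duration term to a hypothesis score, in the
    spirit of the thesis's 'weighted use of duration model' (Section 5.4)."""
    dur_term = sum(duration_log_prob(d, *duration_models[unit])
                   for unit, d in durations)
    return acoustic_lm_score + weight * dur_term

# Toy (mean, variance) duration models in frames, purely illustrative.
models = {"digit_1": (22.0, 16.0), "digit_2": (30.0, 25.0)}
print(rescore(-350.0, [("digit_1", 19), ("digit_2", 33)], models, weight=2.0))
```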
|
438 |
An error detection and correction framework to improve large vocabulary continuous speech recognition. CUHK electronic theses & dissertations collection. January 2009.
This thesis proposes an error detection and correction (ED-EC) framework to incorporate advanced linguistic knowledge sources into large vocabulary continuous speech recognition. Previous efforts to apply sophisticated language models (LMs) in speech recognition typically face a serious efficiency problem, owing to the intense computation these models require. The ED-EC framework aims to achieve the full benefit of complex linguistic sources while maximizing efficiency, by applying computationally expensive LMs only where they are needed in the input speech. First, the framework detects recognition errors in the output of an efficient state-of-the-art decoding procedure. It then corrects the detected errors with the aid of sophisticated LMs by (1) creating alternatives for each detected error and (2) applying the advanced models to distinguish among the alternatives. We implement a prototype of the ED-EC framework on a Mandarin dictation task. The prototype detects recognition errors based on generalized word posterior probabilities, selects alternatives for errors from recognition lattices generated during decoding, and adopts an advanced LM that combines mutual information, word trigrams and POS trigrams. The experimental results indicate the practical feasibility of the ED-EC framework, for which the optimal gain of the focused LM is theoretically achievable at low computational cost. On a general-domain test set, a 6.0% relative reduction in character error rate (CER) over a state-of-the-art baseline recognizer is obtained. In terms of efficiency, both the detection of errors and the creation of alternatives are fast, and the application of the computationally expensive LM is concentrated on less than 50% of the utterances. We further demonstrate that the potential benefit of the ED-EC framework is substantial: if error detection were perfect and the alternatives for an error were guaranteed to include the correct one, the relative CER reduction over the baseline would rise to 36.0%. We also illustrate that the framework is robust on unseen data and can be conveniently extended to other recognition systems.

In addition to the ED-EC framework, this thesis proposes a discriminative lattice rescoring (DLR) algorithm to facilitate the investigation of the framework's extensibility. The DLR method recasts a discriminative n-gram model as a pseudo-conventional n-gram model and uses the recast model to perform lattice rescoring. DLR improves the efficiency of discriminative n-gram modeling and facilitates combining it with other post-processing techniques such as the ED-EC framework.

Zhou, Zhengyu. Adviser: Helen Mei-Ling Meng. Thesis (Ph.D.)--Chinese University of Hong Kong, 2009. Includes bibliographical references (leaves 142-155). Abstract also in Chinese.
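The control flow the abstract describes can be pictured in a few lines: trust words whose generalized posterior clears a threshold, and spend the expensive LM only on the rest. A minimal sketch of that routing logic, with the threshold, scorer, and data all illustrative stand-ins rather than the thesis's actual components:

```python
def ed_ec_pass(words, posteriors, alternatives, expensive_lm_score, threshold=0.8):
    """ED-EC sketch: keep confident words as-is; only where the (generalized)
    word posterior falls below the threshold, re-rank lattice alternatives
    with a costly language model."""
    output = []
    for i, (word, p) in enumerate(zip(words, posteriors)):
        if p >= threshold:
            output.append(word)        # confident: no expensive rescoring
        else:
            cands = alternatives.get(i, [word])
            output.append(max(cands,
                              key=lambda c: expensive_lm_score(output, c)))
    return output

# Toy stand-in for the mutual-information + trigram LM (illustrative only).
def toy_lm(context, cand):
    return 1.0 if context and context[-1] + cand == "今天" else 0.0

# Position 1 is low-confidence, so only it triggers LM rescoring.
print(ed_ec_pass(["今", "天"], [0.95, 0.40], {1: ["天", "添"]}, toy_lm))
```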
|
439 |
A speech recognition IC with an efficient MFCC extraction algorithm and multi-mixture models. CUHK electronic theses & dissertations collection. January 2006.
Automatic speech recognition (ASR) by machine has received a great deal of attention over the past decades. Speech recognition algorithms based on the Mel frequency cepstrum coefficient (MFCC) and the hidden Markov model (HMM) offer better recognition performance than other speech recognition algorithms and are widely used in many applications. This thesis presents a speech recognition system with an efficient MFCC extraction algorithm and multi-mixture models. It is composed of two parts: an MFCC feature extractor and an HMM-based speech decoder.

In the conventional MFCC feature extraction algorithm, speech is separated into short overlapping frames. The existing extraction algorithm requires a great deal of computation and is not well suited to hardware implementation. We have developed a hardware-efficient MFCC feature extraction algorithm that reduces the computational power by 54% compared to the conventional algorithm, with only a 1.7% reduction in recognition accuracy.

For the HMM-based decoder, it is advantageous to use models with multiple mixtures, but with more mixtures the calculation becomes more complicated. Using a table look-up method proposed in this thesis, the new design can handle up to 16 states and 8 mixtures, and it can easily be extended to models with more states and mixtures. We implemented the new algorithm on an Altera FPGA chip using fixed-point calculation and tested it with speech data from the AURORA 2 database, a well-known database designed to evaluate the performance of speech recognition algorithms in noisy conditions [27]. The recognition accuracy of the new system is 91.01%, compared with 94.65% for a conventional software recognition system running on a PC with 32-bit floating-point calculation.

Han Wei. September 2006. Adviser: Cheong Fat Chan. Thesis (Ph.D.)--Chinese University of Hong Kong, 2006. Includes bibliographical references (p. 108-111). Abstracts in English and Chinese.
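For reference, the conventional pipeline the thesis streamlines runs: framing and windowing, power spectrum, Mel filterbank, log compression, and a DCT. Below is a floating-point sketch of that textbook pipeline; all frame sizes and filter counts are chosen for illustration, not taken from the chip design.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, fs=8000, frame_len=200, hop=80, n_mels=23, n_ceps=13):
    """Textbook MFCC pipeline: frame + window, power spectrum,
    Mel filterbank, log compression, type-II DCT."""
    # Overlapping Hamming-windowed frames (25 ms / 10 ms at 8 kHz).
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len) + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular filters spaced evenly on the Mel scale.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = imel(np.linspace(mel(0.0), mel(fs / 2.0), n_mels + 2))
    bins = np.floor((frame_len + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_mels, power.shape[1]))
    for i in range(n_mels):
        lo, ce, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:ce] = (np.arange(lo, ce) - lo) / max(ce - lo, 1)
        fbank[i, ce:hi] = (hi - np.arange(ce, hi)) / max(hi - ce, 1)
    # Log filterbank energies, decorrelated by the DCT into cepstra.
    feats = np.log(power @ fbank.T + 1e-10)
    return dct(feats, type=2, axis=1, norm="ortho")[:, :n_ceps]

x = np.random.default_rng(1).normal(size=8000)   # 1 s of noise at 8 kHz
print(mfcc(x).shape)                             # (n_frames, 13)
```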
|
440 |
An efficient decoding method for continuous speech recognition based on a tree-structured lexicon = 基於樹狀詞彙表示方法的有效率連續語音識別系統. January 2001.
Choi Wing Nin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references. Text in English; abstracts in English and Chinese.

Table of contents:
Chapter 1 Introduction --- p.1
  1.1 Development of search algorithms for Chinese LVCSR --- p.3
  1.2 Objectives of the thesis --- p.4
  1.3 Thesis outline --- p.5
  Reference --- p.7
Chapter 2 Fundamentals of Continuous Speech Recognition --- p.9
  2.1 The Bayesian decision rule --- p.9
  2.2 Acoustic front-end processor --- p.11
  2.3 Phonological constraint --- p.12
    2.3.1 Characteristics of Cantonese --- p.12
    2.3.2 Homophones and homographs --- p.13
  2.4 Acoustic modeling --- p.13
  2.5 Statistical language model --- p.15
    2.5.1 Word-based language model --- p.15
    2.5.2 Class-based language model --- p.16
  2.6 Search algorithms --- p.17
    2.6.1 Time-synchronous Viterbi search --- p.18
    2.6.2 Time-asynchronous stack decoding --- p.18
    2.6.3 One-pass versus multi-pass search strategies --- p.19
  2.7 Summary --- p.20
  Reference --- p.21
Chapter 3 Search Space Organization --- p.23
  3.1 Lexicon representation --- p.24
    3.1.1 Linear lexicon --- p.25
    3.1.2 Tree lexicon --- p.27
  3.2 Factorization of language model --- p.31
  3.3 Lexical tree incorporated with context-dependent acoustic models --- p.36
  3.4 Summary --- p.39
  Reference --- p.40
Chapter 4 One-Pass Dynamic Programming Based Search Algorithm --- p.42
  4.1 Token Passing Algorithm --- p.43
  4.2 Techniques for speeding up the search --- p.48
    4.2.1 Different layers of beam in the search hierarchy --- p.48
    4.2.2 Efficient recombination of tokens --- p.51
    4.2.3 Fast likelihood computation methods for continuous mixture densities --- p.52
    4.2.4 Lexical tree with class-based language model --- p.54
  4.3 Experimental results and discussions --- p.57
    4.3.1 The Hong Kong stock inquiry task --- p.57
    4.3.2 General domain continuous speech recognition --- p.59
  Reference --- p.62
Chapter 5 Extension of the One-Pass Search --- p.64
  5.1 Overview of the extended framework --- p.65
  5.2 Word lattice construction by modified word-conditioned search --- p.66
    5.2.1 Exact N-best algorithm --- p.66
    5.2.2 Word-pair approximation --- p.67
    5.2.3 Word lattice algorithm --- p.68
  5.3 Computation of heuristic score --- p.70
  5.4 Backward A* heuristic search --- p.72
    5.4.1 Recovering the missing piece --- p.74
    5.4.2 Generation of N-best list --- p.74
  5.5 Experimental results --- p.75
    5.5.1 Simple back-tracking vs A* heuristic search --- p.75
    5.5.2 N-best list evaluation using class bigram re-scoring --- p.76
    5.5.3 N-best list evaluation using class trigram re-scoring --- p.77
  5.6 Summary --- p.78
  Reference --- p.79
Chapter 6 Conclusions and Suggestions for Future Development --- p.80
  6.1 Conclusions --- p.80
  6.2 Suggestions for future development --- p.82
    6.2.1 Incorporation of tone information --- p.82
    6.2.2 Fast match strategy for acoustic models --- p.82
  Reference --- p.84
Appendix: Cantonese Initials and Finals --- p.85
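The tree lexicon of Section 3.1.2 is essentially a phone-prefix trie: words sharing a pronunciation prefix share the same arcs, so their HMM state likelihoods are computed once up to the branch point. A minimal sketch (the phone sequences below are invented for illustration); in the full decoder, language-model probabilities are additionally factored along the arcs, as in Section 3.2, because the word identity is unknown until a leaf is reached.

```python
class LexTreeNode:
    def __init__(self):
        self.children = {}   # phone -> LexTreeNode
        self.word = None     # word identity, set only at a word-end node

def build_lexical_tree(lexicon):
    """Build a phone-prefix tree: words with a common pronunciation prefix
    share nodes, which is the source of the tree lexicon's efficiency."""
    root = LexTreeNode()
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node.children.setdefault(ph, LexTreeNode())
        node.word = word
    return root

# Toy lexicon with invented phone sequences; the two entries share a
# three-phone prefix and therefore share three tree nodes.
lex = {"上海": ["s", "oe", "ng", "h", "oi"],
       "上水": ["s", "oe", "ng", "s", "eoi"]}
tree = build_lexical_tree(lex)
print(sorted(tree.children))   # ['s']: a single shared root branch
```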
|