271 |
Development of an English public transport information dialogue system / Vejman, Martin. January 2015 (has links)
This thesis presents the development of an English spoken dialogue system based on the Alex dialogue system framework. The work describes the adaptation of the framework's components to a different domain and language. The system provides public transport information for New York. The work involves creating a statistical model and deploying a custom Kaldi speech recognizer, which outperformed the Google Speech API in a comparison based on subjective user satisfaction collected through crowdsourcing.
|
272 |
Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems / Gritzman, Ashley Daniel. January 2016 (has links)
A thesis submitted to the Faculty of Engineering and the Built Environment,
University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for
the degree of Doctor of Philosophy.
Johannesburg, September 2016 / Having survived the ordeal of a laryngectomy, the patient must come to terms with
the resulting loss of speech. With recent advances in portable computing power,
automatic lip-reading (ALR) may become a viable approach to voice restoration. This
thesis addresses the image processing aspect of ALR, and focuses on three contributions
to colour-based lip segmentation.
The first contribution concerns the colour transform used to enhance the contrast
between the lips and the skin. This thesis presents the most comprehensive study to
date, measuring the overlap between lip and skin histograms for 33 different
colour transforms. The hue component of HSV obtains the lowest overlap of 6.15%,
and the results show that selecting the correct transform can increase segmentation
accuracy by up to three times.
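The histogram-overlap measure used in this study can be sketched in a few lines: build a normalised histogram of one colour component (e.g. HSV hue) for lip pixels and for skin pixels, then sum the bin-wise minima. The sketch below is a toy illustration; the hue samples and the 36-bin resolution are assumptions for demonstration, not data or settings from the thesis.

```python
# Histogram overlap between two pixel populations (toy illustration).

def histogram(values, bins, lo, hi):
    """Normalised histogram of `values` over [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = float(len(values))
    return [c / total for c in counts]

def overlap(hist_a, hist_b):
    """Fraction of probability mass shared by two histograms."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))

# Toy hue samples (degrees): lips cluster near red, skin near orange.
lip_hue = [355, 358, 2, 5, 8, 350, 1, 4]
skin_hue = [18, 22, 25, 30, 15, 20, 27, 24]

h_lip = histogram([h % 360 for h in lip_hue], bins=36, lo=0, hi=360)
h_skin = histogram([h % 360 for h in skin_hue], bins=36, lo=0, hi=360)
print(overlap(h_lip, h_skin))  # 0.0 for these well-separated toy samples
```

A lower overlap means the two classes are easier to separate with a single threshold on that component, which is why the choice of colour transform matters so much.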
The second contribution is the development of a new lip segmentation algorithm
that utilises the best colour transforms from the comparative study. The algorithm
is tested on 895 images and achieves a percentage overlap (OL) of 92.23% and a
segmentation error (SE) of 7.39%.
The third contribution focuses on the impact of the histogram threshold on the
segmentation accuracy, and introduces a novel technique called Adaptive Threshold
Optimisation (ATO) to select a better threshold value. The first stage of ATO
incorporates SVR to train the lip shape model. ATO then uses feedback of shape
information to validate and optimise the threshold. After applying ATO, the SE
decreases from 7.65% to 6.50%, corresponding to an absolute improvement of 1.15 pp
or a relative improvement of 15.1%. While this thesis concerns lip segmentation in
particular, ATO is a threshold selection technique that can be used in various
segmentation applications. / MT2017
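The idea of selecting a threshold by shape feedback can be sketched in a deliberately simplified form: segment with each candidate threshold, score the resulting mask against an expected shape property, and keep the best-scoring threshold. Here a target area fraction stands in for the thesis's SVR-trained lip shape model, and the tiny image is invented for illustration.

```python
# Simplified threshold optimisation by shape feedback (toy stand-in
# for the SVR-based shape model described in the abstract).

def segment(image, threshold):
    """Binary mask: pixels whose value exceeds the threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in image]

def area_fraction(mask):
    flat = [p for row in mask for p in row]
    return sum(flat) / len(flat)

def optimise_threshold(image, target_fraction, candidates):
    """Pick the candidate threshold whose mask best matches the target."""
    def mismatch(t):
        return abs(area_fraction(segment(image, t)) - target_fraction)
    return min(candidates, key=mismatch)

image = [[10, 40, 80], [20, 90, 70], [5, 60, 30]]
best = optimise_threshold(image, target_fraction=4 / 9,
                          candidates=range(0, 100, 10))
```

The real technique replaces the fixed `target_fraction` with feedback from a trained shape model, but the feedback loop has the same structure: candidate threshold in, shape mismatch out, minimise.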
|
273 |
Automated biometrics of audio-visual multiple modals / Unknown Date (has links)
Biometrics is the science and technology of measuring and analyzing biological data for authentication purposes. Its progress has brought about a large number of civilian and government applications. Candidate modalities used in biometrics include retinas, fingerprints, signatures, audio, faces, etc. There are two types of biometric systems: single-modal systems and multiple-modal systems. Single-modal systems perform person recognition based on a single biometric modality and are affected by problems such as noisy sensor data, intra-class variation, lack of distinctiveness and non-universality. Multiple-modal systems, which consolidate evidence from several biometric modalities, can alleviate these problems. The integration of evidence obtained from multiple cues, also known as fusion, is a critical part of multiple-modal systems, and it may be carried out at several levels: the feature level, the matching-score level and the decision level. Among biometric modalities, both audio and face are easy to use and generally acceptable to users. Furthermore, the increasing availability and low cost of audio and visual equipment make it feasible to apply such Audio-Visual (AV) systems in security applications. This dissertation therefore proposes a face recognition algorithm. In addition, it develops several novel fusion algorithms at different levels for multiple-modal biometrics, which have been tested on a virtual database and shown to be more reliable and robust than systems that rely on a single modality. / by Lin Huang. / Thesis (Ph.D.)--Florida Atlantic University, 2010. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2010. Mode of access: World Wide Web.
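Matching-score-level fusion, one of the fusion levels listed in the abstract, can be sketched as normalising each modality's matcher score to a common range and combining them with a weighted sum. The score ranges, weights and decision threshold below are illustrative assumptions, not those of the dissertation.

```python
# Weighted-sum fusion of audio and face matching scores (toy values).

def min_max_normalise(score, lo, hi):
    """Map a raw matcher score from [lo, hi] onto [0, 1]."""
    return (score - lo) / (hi - lo)

def fuse(audio_score, face_score, w_audio=0.4, w_face=0.6):
    """Weighted-sum fusion of two normalised matching scores."""
    return w_audio * audio_score + w_face * face_score

# Raw scores arrive on different scales: audio in [0, 100], face in [0, 1].
audio = min_max_normalise(72.0, lo=0.0, hi=100.0)  # -> 0.72
face = 0.85
combined = fuse(audio, face)  # ~0.798
accept = combined > 0.5
```

Normalisation before fusion is the essential step: without it, the modality with the larger raw score range would silently dominate the decision.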
|
274 |
Development of a Cantonese-English code-mixing speech recognition system. / CUHK electronic theses & dissertations collection / January 2011 (has links)
A data-driven computational approach is adopted to reveal significant pronunciation variations in Cantonese-English code-mixing speech. The findings are successfully applied to constructing a more relevant bilingual pronunciation dictionary and to selecting effective training materials for code-mixing ASR. For acoustic modeling, it is shown that cross-lingual acoustic models are more appropriate than language-dependent models. Various cross-lingual inventories are derived based on different combination schemes and similarity measurements. We have shown that the proposed data-driven approach based on K-L divergence and a phonetic confusion matrix outperforms the IPA-based approach using merely phonetic knowledge. It is also found that initials and finals are more appropriate than phonemes as the basic Cantonese units in code-mixing speech recognition applications. A text database with more than 9 million characters is compiled for language modeling of code-mixing ASR. Class-based language models with automatically clustered classes have proven ineffective for code-mixing speech recognition. A semantics-based n-gram mapping approach is proposed to increase the counts of code-mixing n-grams at language boundaries. The language model perplexity and recognition performance have been significantly improved with the proposed semantics-based language models. The proposed code-mixing speech recognition system achieves 75.0% overall accuracy for Cantonese-English code-mixing speech, while the accuracy for Cantonese characters is 76.1% and the accuracy for English words is 65.5%. It also attains a reasonable character accuracy of 75.3% for monolingual Cantonese speech. / Code-mixing is a common phenomenon in bilingual societies. It refers to the intra-sentential switching of two languages in a spoken utterance. This thesis addresses the problem of the automatic recognition of Cantonese-English code-mixing speech, which is widely used in Hong Kong. 
/ Cross-lingual speaker adaptation has also been investigated in the thesis. Speaker independent (SI) model mapping between Cantonese and English is established at different levels of acoustic units, viz. phones, states, and Gaussian mixture components. A novel approach for cross-lingual speaker adaptation via Gaussian component mapping is proposed and proves effective in most speech recognition tasks. / This study starts with an investigation of the linguistic properties of Cantonese-English code-mixing, based on a large number of real code-mixing text corpora collected from the internet and other sources. The effects of language mixing on the automatic recognition of Cantonese-English code-mixing utterances are analyzed in a systematic way. The problems of pronunciation dictionary construction, acoustic modeling and language modeling are investigated. Subsequently, a large-vocabulary code-mixing speech recognition system is developed and implemented. / While automatic speech recognition (ASR) of either Cantonese or English alone has achieved a great degree of success, recognition of Cantonese-English code-mixing speech is far from trivial. Unknown language boundaries, accents in code-switched English words, phonetic and phonological differences between Cantonese and English, the absence of a regulated grammatical structure, and the scarcity of speech and text data make the ASR of code-mixing utterances much more than a simple integration of two monolingual speech recognition systems. On the other hand, we have little understanding of this highly dynamic language phenomenon. Unlike in monolingual speech recognition research, there are very few linguistic studies that can be referred to. / Cao, Houwei. / Adviser: P.C. Ching. / Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 129-140). / Electronic reproduction. 
Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
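The K-L-divergence-based similarity measurement mentioned in the abstract can be illustrated with a deliberately simplified sketch: model each phone unit as a univariate Gaussian over one acoustic feature and merge the cross-lingual pair with the smallest symmetric K-L divergence. The phone labels and Gaussian parameters below are invented for illustration; the thesis's acoustic models are of course multivariate mixtures, not single one-dimensional Gaussians.

```python
# Symmetric K-L divergence between univariate Gaussians, used to
# pick the closest cross-lingual phone pair (toy parameters).
import math

def kl_gauss(m1, s1, m2, s2):
    """KL(N(m1, s1^2) || N(m2, s2^2)) for univariate Gaussians."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def symmetric_kl(a, b):
    return kl_gauss(*a, *b) + kl_gauss(*b, *a)

# Toy units: (mean, std) of some acoustic feature per phone.
cantonese = {"aa": (1.0, 0.5), "i": (3.0, 0.4)}
english = {"ah": (1.1, 0.5), "iy": (2.9, 0.45)}

pairs = [(c, e, symmetric_kl(cv, ev))
         for c, cv in cantonese.items()
         for e, ev in english.items()]
closest = min(pairs, key=lambda p: p[2])  # ("aa", "ah", ...) here
```

Symmetrising the divergence matters because K-L is asymmetric, while a merging decision should not depend on which language's unit is treated as the reference.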
|
275 |
Large vocabulary Cantonese speech recognition using neural networks. / January 1994 (has links)
Tsik Chung Wai Benjamin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. / Includes bibliographical references (leaves 67-70). / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Automatic Speech Recognition --- p.1 / Chapter 1.2 --- Cantonese Speech Recognition --- p.3 / Chapter 1.3 --- Neural Networks --- p.4 / Chapter 1.4 --- About this Thesis --- p.5 / Chapter 2 --- The Phonology of Cantonese --- p.6 / Chapter 2.1 --- The Syllabic Structure of Cantonese Syllable --- p.7 / Chapter 2.2 --- The Tone System of Cantonese --- p.9 / Chapter 3 --- Review of Automatic Speech Recognition Systems --- p.12 / Chapter 3.1 --- Hidden Markov Model Approach --- p.12 / Chapter 3.2 --- Neural Networks Approach --- p.13 / Chapter 3.2.1 --- Multi-Layer Perceptrons (MLP) --- p.13 / Chapter 3.2.2 --- Time-Delay Neural Networks (TDNN) --- p.15 / Chapter 3.2.3 --- Recurrent Neural Networks --- p.17 / Chapter 3.3 --- Integrated Approach --- p.18 / Chapter 3.4 --- Mandarin and Cantonese Speech Recognition Systems --- p.19 / Chapter 4 --- The Speech Corpus and Database --- p.21 / Chapter 4.1 --- Design of the Speech Corpus --- p.21 / Chapter 4.2 --- Speech Database Acquisition --- p.23 / Chapter 5 --- Feature Parameters Extraction --- p.24 / Chapter 5.1 --- Endpoint Detection --- p.25 / Chapter 5.2 --- Speech Processing --- p.26 / Chapter 5.3 --- Speech Segmentation --- p.27 / Chapter 5.4 --- Phoneme Feature Extraction --- p.29 / Chapter 5.5 --- Tone Feature Extraction --- p.30 / Chapter 6 --- The Design of the System --- p.33 / Chapter 6.1 --- Towards Large Vocabulary System --- p.34 / Chapter 6.2 --- Overview of the Isolated Cantonese Syllable Recognition System --- p.36 / Chapter 6.3 --- The Primary Level: Phoneme Classifiers and Tone Classifier --- p.38 / Chapter 6.4 --- The Intermediate Level: Ending Corrector --- p.42 / Chapter 6.5 --- The Secondary Level: Syllable Classifier --- p.43 / Chapter 6.5.1 --- Concatenation with Correction Approach --- p.44 / Chapter 6.5.2 
--- Fuzzy ART Approach --- p.45 / Chapter 7 --- Computer Simulation --- p.49 / Chapter 7.1 --- Experimental Conditions --- p.49 / Chapter 7.2 --- Experimental Results of the Primary Level Classifiers --- p.50 / Chapter 7.3 --- Overall Performance of the System --- p.57 / Chapter 7.4 --- Discussions --- p.61 / Chapter 8 --- Further Works --- p.62 / Chapter 8.1 --- Enhancement on Speech Segmentation --- p.62 / Chapter 8.2 --- Towards Speaker-Independent System --- p.63 / Chapter 8.3 --- Towards Speech-to-Text System --- p.64 / Chapter 9 --- Conclusions --- p.65 / Bibliography --- p.67 / Appendix A. Cantonese Syllable Full Set List --- p.71
|
276 |
An efficient tone classifier for speech recognition of Cantonese. / January 1991 (has links)
by Cheng Yat Ho. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1991. / Bibliography: leaves 106-108. / Chapter Chapter 1 --- Introduction --- p.1 / Chapter Chapter 2 --- Preliminary Considerations --- p.8 / Chapter 2.1 --- Tone System of Cantonese --- p.8 / Chapter 2.2 --- Tone Classification Systems --- p.14 / Chapter 2.3 --- Design of a Speech Corpus --- p.17 / Chapter Chapter 3 --- Feature Parameters for Tone Classification --- p.22 / Chapter 3.1 --- Methodology --- p.22 / Chapter 3.2 --- Endpoint Detection and Time Alignment --- p.23 / Chapter 3.3 --- Pitch --- p.27 / Chapter 3.3.1 --- Pitch Profile Extraction --- p.28 / Chapter 3.3.2 --- Evaluation of Pitch Profile --- p.33 / Chapter 3.3.3 --- Feature Parameters Derived from Pitch Profile --- p.40 / Chapter 3.4 --- Duration --- p.46 / Chapter 3.5 --- Energy --- p.49 / Chapter 3.5.1 --- Energy Profile Extraction --- p.49 / Chapter 3.5.2 --- Evaluation of Energy Profile --- p.50 / Chapter 3.6 --- Summary --- p.54 / Chapter Chapter 4 --- Implementation of the Tone Classification System --- p.56 / Chapter 4.1 --- Intrinsic Pitch Estimation --- p.59 / Chapter 4.2 --- The Classifier --- p.63 / Chapter 4.2.1 --- Neural Network --- p.64 / Chapter 4.2.2 --- Post-Processing Unit --- p.74 / Chapter Chapter 5 --- Performance Evaluation on the Tone Classification System --- p.76 / Chapter 5.1 --- Single Speaker Tone Classification --- p.77 / Chapter 5.2 --- Multi-Speaker and Speaker Independent Tone Classification --- p.82 / Chapter 5.2.1 --- Classification with no Phonetic Information --- p.83 / Chapter 5.2.2 --- Classification with Known Final Consonants --- p.88 / Chapter 5.3 --- Confidence Improvement of the Recognition Results --- p.95 / Chapter 5.4 --- Summary --- p.101 / Chapter Chapter 6 --- Conclusions and Discussions --- p.102 / References --- p.106 / Chapter Appendix A --- Vocabulary of the Speech Corpus --- p.A1-A4 / Chapter Appendix B --- Statistics of the Pitch Profiles --- p.B1-B15 / Chapter Appendix C --- Statistics of the Energy Profiles --- p.C1-C11
|
277 |
Automatic recognition of isolated Cantonese syllables using neural networks =: 利用神經網絡識別粤語單音節 / Li yong shen jing wang luo shi bie yue yu dan yin jie. / January 1996 (has links)
by Tan Lee. / Thesis (Ph.D.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references. / by Tan Lee. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Conventional Pattern Recognition Approaches for Speech Recognition --- p.3 / Chapter 1.2 --- A Review on Neural Network Applications in Speech Recognition --- p.6 / Chapter 1.2.1 --- Static Pattern Classification --- p.7 / Chapter 1.2.2 --- Hybrid Approaches --- p.9 / Chapter 1.2.3 --- Dynamic Neural Networks --- p.12 / Chapter 1.3 --- Automatic Recognition of Cantonese Speech --- p.16 / Chapter 1.4 --- Organization of the Thesis --- p.18 / References --- p.20 / Chapter 2 --- Phonological and Acoustical Properties of Cantonese Syllables --- p.29 / Chapter 2.1 --- Phonology of Cantonese --- p.29 / Chapter 2.1.1 --- Basic Phonetic Units --- p.30 / Chapter 2.1.2 --- Syllabic Structure --- p.32 / Chapter 2.1.3 --- Lexical Tones --- p.33 / Chapter 2.2 --- Acoustical Properties of Cantonese Syllables --- p.35 / Chapter 2.2.1 --- Spectral Features --- p.35 / Chapter 2.2.2 --- Energy and Zero-Crossing Rate --- p.39 / Chapter 2.2.3 --- Pitch --- p.40 / Chapter 2.2.4 --- Duration --- p.41 / Chapter 2.3 --- Acoustic Feature Extraction for Speech Recognition of Cantonese --- p.42 / References --- p.46 / Chapter 3 --- Tone Recognition of Isolated Cantonese Syllables --- p.48 / Chapter 3.1 --- Acoustic Pre-processing --- p.48 / Chapter 3.1.1 --- Voiced Portion Detection --- p.48 / Chapter 3.1.2 --- Pitch Extraction --- p.51 / Chapter 3.2 --- Supra-Segmental Feature Parameters for Tone Recognition --- p.53 / Chapter 3.2.1 --- Pitch-Related Feature Parameters --- p.53 / Chapter 3.2.2 --- Duration and Energy Drop Rate --- p.55 / Chapter 3.2.3 --- Normalization of Feature Parameters --- p.57 / Chapter 3.3 --- An MLP Based Tone Classifier --- p.58 / Chapter 3.4 --- Simulation Experiments --- p.59 / Chapter 3.4.1 --- Speech Data --- p.59 / Chapter 3.4.2 --- Feature Extraction and Normalization --- p.61 / 
Chapter 3.4.3 --- Experimental Results --- p.61 / Chapter 3.5 --- Discussion and Conclusion --- p.64 / References --- p.65 / Chapter 4 --- Recurrent Neural Network Based Dynamic Speech Models --- p.67 / Chapter 4.1 --- Motivations and Rationales --- p.68 / Chapter 4.2 --- RNN Speech Model (RSM) --- p.71 / Chapter 4.2.1 --- Network Architecture and Dynamic Operation --- p.71 / Chapter 4.2.2 --- RNN for Speech Modeling --- p.72 / Chapter 4.2.3 --- Illustrative Examples --- p.75 / Chapter 4.3 --- Training of RNN Speech Models --- p.78 / Chapter 4.3.1 --- Real-Time-Recurrent-Learning (RTRL) Algorithm --- p.78 / Chapter 4.3.2 --- Iterative Re-segmentation Training of RSM --- p.80 / Chapter 4.4 --- Several Practical Issues in RSM Training --- p.85 / Chapter 4.4.1 --- Combining Adjacent Segments --- p.85 / Chapter 4.4.2 --- Hypothesizing Initial Segmentation --- p.86 / Chapter 4.4.3 --- Improving Temporal State Dependency --- p.89 / Chapter 4.5 --- Simulation Experiments --- p.90 / Chapter 4.5.1 --- Experiment 4.1 - Training with a Single Utterance --- p.91 / Chapter 4.5.2 --- Experiment 4.2 - Effect of Augmenting Recurrent Learning Rate --- p.93 / Chapter 4.5.3 --- Experiment 4.3 - Training with Multiple Utterances --- p.96 / Chapter 4.5.4 --- Experiment 4.4 - Modeling Performance of RSMs --- p.99 / Chapter 4.6 --- Conclusion --- p.104 / References --- p.106 / Chapter 5 --- Isolated Word Recognition Using RNN Speech Models --- p.107 / Chapter 5.1 --- A Baseline System --- p.107 / Chapter 5.1.1 --- System Description --- p.107 / Chapter 5.1.2 --- Simulation Experiments --- p.110 / Chapter 5.1.3 --- Discussion --- p.117 / Chapter 5.2 --- Incorporating Duration Information --- p.118 / Chapter 5.2.1 --- Duration Screening --- p.118 / Chapter 5.2.2 --- Determination of Duration Bounds --- p.120 / Chapter 5.2.3 --- Simulation Experiments --- p.120 / Chapter 5.2.4 --- Discussion --- p.124 / Chapter 5.3 --- Discriminative Training --- p.125 / Chapter 5.3.1 --- The Minimum
Classification Error Formulation --- p.126 / Chapter 5.3.2 --- Generalized Probabilistic Descent Algorithm --- p.127 / Chapter 5.3.3 --- Determination of Training Parameters --- p.128 / Chapter 5.3.4 --- Simulation Experiments --- p.129 / Chapter 5.3.5 --- Discussion --- p.133 / Chapter 5.4 --- Conclusion --- p.134 / References --- p.135 / Chapter 6 --- An Integrated Speech Recognition System for Cantonese Syllables --- p.137 / Chapter 6.1 --- System Architecture and Recognition Scheme --- p.137 / Chapter 6.2 --- Speech Corpus and Data Pre-processing --- p.140 / Chapter 6.3 --- Recognition Experiments and Results --- p.140 / Chapter 6.4 --- Discussion and Conclusion --- p.144 / References --- p.146 / Chapter 7 --- Conclusions and Suggestions for Future Work --- p.147 / Chapter 7.1 --- Conclusions --- p.147 / Chapter 7.2 --- Suggestions for Future Work --- p.151
|
278 |
Attributes and extraction of tone information for continuous Cantonese speech recognition =: 連續粤語語音辨識裏的音調提取和音調特性 / Lian xu yue yu yu yin bian shi li de yin diao ti qu he yin diao te xing. / January 2000 (has links)
Lau Wai = 劉偉 (Liu Wei). / Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references. / Text in English; abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Speech Recognition of Chinese --- p.3 / Chapter 1.2 --- Tone Recognition --- p.3 / Chapter 1.3 --- Use of Tone Information in Speech Recognition --- p.4 / Chapter 1.4 --- Thesis Objectives --- p.6 / Chapter 1.5 --- Organization of the Thesis --- p.6 / Reference --- p.8 / Chapter 2 --- Properties of Tones in Cantonese --- p.12 / Chapter 2.1 --- The Cantonese Dialect --- p.12 / Chapter 2.1.1 --- "INITIAL, FINAL & TONE" --- p.13 / Chapter 2.1.2 --- Phonological Constraints --- p.16 / Chapter 2.2 --- Tones in Cantonese --- p.17 / Chapter 2.2.1 --- Linguistic Significance --- p.17 / Chapter 2.2.2 --- Acoustic properties --- p.18 / Chapter 2.2.3 --- Discriminative Features of the Cantonese Tones --- p.20 / Chapter 2.3 --- Summary --- p.21 / Reference --- p.22 / Chapter 3 --- Extraction of Tone Features --- p.23 / Chapter 3.1 --- Feature Parameters for Tone Recognition --- p.23 / Chapter 3.1.1 --- F0 Features --- p.23 / Chapter 3.1.2 --- Energy Features --- p.24 / Chapter 3.1.3 --- Log Scale vs. 
Linear Scale --- p.25 / Chapter 3.2 --- Detection of Voiced Speech --- p.26 / Chapter 3.3 --- Robust Algorithm for Pitch Tracking --- p.27 / Chapter 3.3.1 --- Generation of Period Candidates --- p.27 / Chapter 3.3.2 --- Post-processing --- p.28 / Chapter 3.4 --- Normalization of Fundamental Frequency --- p.29 / Chapter 3.4.1 --- Derivation of the normalization factor --- p.31 / Chapter 3.4.2 --- Moving-Window Normalization --- p.32 / Chapter 3.4.3 --- Energy Normalization --- p.35 / Chapter 3.5 --- F0 Smoothing --- p.36 / Chapter 3.6 --- Generation of Tone Feature Vectors --- p.37 / Chapter 3.7 --- Summary --- p.39 / Reference --- p.40 / Chapter 4 --- Tone Recognition using Hidden Markov Models --- p.43 / Chapter 4.1 --- Two Methods of Tone Modeling --- p.43 / Chapter 4.2 --- Hidden Markov Models for Speech Recognition --- p.44 / Chapter 4.3 --- Tone Modeling by HMM --- p.47 / Chapter 4.4 --- Context-Dependent Tone Models --- p.48 / Chapter 4.5 --- Baseline Experiments --- p.49 / Chapter 4.5.1 --- The Speech Database - CUSENT --- p.49 / Chapter 4.5.2 --- Data Pre-Processing --- p.50 / Chapter 4.5.3 --- Performance of Context-Independent Models --- p.51 / Chapter 4.5.4 --- Context-Dependent Tone Modeling --- p.52 / Chapter 4.6 --- Experiments on Moving-window F0 Normalization --- p.54 / Chapter 4.6.1 --- Symmetric window --- p.54 / Chapter 4.6.2 --- Asymmetric window --- p.55 / Chapter 4.6.3 --- Energy normalization --- p.58 / Chapter 4.7 --- Incorporation of Statistical Tone Information --- p.58 / Chapter 4.8 --- Discussions --- p.59 / Chapter 4.9 --- Summary --- p.60 / Reference --- p.61 / Chapter 5 --- Integration of Tone Information into LVCSR for Cantonese --- p.63 / Chapter 5.1 --- The Goal --- p.63 / Chapter 5.2 --- N-best Based Integration --- p.64 / Chapter 5.2.1 --- Base Syllable Recognition --- p.65 / Chapter 5.2.2 --- Tone Recognition --- p.66 / Chapter 5.2.3 --- Language Models --- p.66 / Chapter 5.2.4 --- Integration and N-best Re-scoring --- p.66 / 
Chapter 5.2.5 --- Experimental Results --- p.67 / Chapter 5.2.6 --- Integration with Perfect Tone Information --- p.68 / Chapter 5.3 --- Broad Tone Classes --- p.68 / Chapter 5.3.1 --- Experimental Results --- p.70 / Chapter 5.3.2 --- Error analyses and Discussions --- p.71 / Chapter 5.4 --- Lattice Based Integration --- p.73 / Chapter 5.4.1 --- Lattice Expansion --- p.74 / Chapter 5.4.2 --- Experiments on Lattice Based Integration --- p.76 / Chapter 5.5 --- Discussions --- p.78 / Chapter 5.6 --- Summary --- p.79 / Reference --- p.80 / Chapter 6 --- Conclusions and Future Work --- p.81 / Chapter 6.1 --- Conclusions --- p.81 / Chapter 6.2 --- Suggestions for Future Work --- p.84 / Reference --- p.85
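The moving-window F0 normalisation listed in this thesis's contents (Chapters 3.4.2 and 4.6) can be sketched as dividing each F0 frame by the mean F0 over a window of neighbouring frames, so that the tone features capture relative rather than absolute pitch. The window length and the toy contour below are illustrative assumptions, not the thesis's settings.

```python
# Moving-window F0 normalisation over a short voiced contour (sketch).

def moving_window_normalise(f0, half_window=2):
    """Normalise each F0 frame by the mean of a local window."""
    out = []
    for i in range(len(f0)):
        lo = max(0, i - half_window)
        hi = min(len(f0), i + half_window + 1)
        window_mean = sum(f0[lo:hi]) / (hi - lo)
        out.append(f0[i] / window_mean)
    return out

# A rising toy contour (Hz): normalisation exposes the local slope
# independently of whether the speaker's voice is high or low.
contour = [100.0, 105.0, 110.0, 115.0, 120.0]
normalised = moving_window_normalise(contour)
```

Because the divisor is local, the same code yields near-identical features for a low-pitched and a high-pitched speaker producing the same tone shape, which is the point of the normalisation.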
|
279 |
Acoustic units for Mandarin Chinese speech recognition =: 漢語語音識別中聲學單元的選擇 / Han yu yu yin shi bie zhong sheng xue dan yuan de xuan ze. / January 1999 (has links)
by Choy Chi Yan. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. / Includes bibliographical references (leaves 111-115). / Text in English; abstract also in Chinese. / by Choy Chi Yan. / ABSTRACT --- p.I / ACKNOWLEDGMENTS --- p.III / TABLE OF CONTENTS --- p.IV / LIST OF FIGURES --- p.VII / LIST OF TABLES --- p.VIII / Chapter 1. --- INTRODUCTION --- p.1 / Chapter 1.1 --- Speech Recognition --- p.1 / Chapter 1.2 --- Development of Speech Recognisers --- p.4 / Chapter 1.3 --- Speech Recognition for Chinese Language --- p.5 / Chapter 1.4 --- Objectives of the thesis --- p.6 / Chapter 1.5 --- Thesis Structure --- p.7 / Chapter 2. --- PHONOLOGICAL AND ACOUSTICAL PROPERTIES OF MANDARIN CHINESE --- p.9 / Chapter 2.1 --- Characteristics of Mandarin Chinese --- p.9 / Chapter 2.1.1 --- Syllabic Structures --- p.10 / Chapter 2.1.2 --- Lexical Tones --- p.11 / Chapter 2.2 --- Basic Phonetic Units for Mandarin Chinese --- p.14 / Chapter 2.2.1 --- Tonal Syllables and Base Syllables --- p.14 / Chapter 2.2.2 --- Initial-Finals --- p.14 / Chapter 2.2.3 --- Phones --- p.16 / Chapter 2.2.4 --- Preme-Core-Finals and Preme-Tonemes --- p.17 / Chapter 2.2.5 --- Summary-The phonological hierarchy of Mandarin Syllables --- p.19 / Chapter 3. --- HIDDEN MARKOV MODELS --- p.20 / Chapter 3.1 --- Introduction --- p.20 / Chapter 3.1.1 --- Speech Data --- p.20 / Chapter 3.1.2 --- Fundamental of HMMs --- p.21 / Chapter 3.2 --- Using Hidden Markov Models for Speech Recognition --- p.22 / Chapter 3.2.1 --- Likelihood of the state sequence of speech observations --- p.22 / Chapter 3.2.2 --- The Recognition Problem --- p.24 / Chapter 3.3 --- Output Probability Distributions --- p.25 / Chapter 3.4 --- Model Training --- p.26 / Chapter 3.4.1 --- State Sequence Estimation --- p.26 / Chapter 3.4.2 --- Gaussian Mixture Models --- p.29 / Chapter 3.4.3 --- Parameter Estimation --- p.30 / Chapter 3.5 --- Speech Recognition and Viterbi Decoding --- p.31 / Chapter 3.6 --- Summary --- p.32 / Chapter 4. 
--- LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION FOR MANDARIN CHINESE --- p.33 / Chapter 4.1 --- Introduction --- p.33 / Chapter 4.2 --- Large Vocabulary Mandarin Chinese Recognition System --- p.34 / Chapter 4.2.1 --- Overall Architecture for the Speech Recogniser --- p.34 / Chapter 4.2.2 --- Signal Representation and Features --- p.36 / Chapter 4.2.3 --- Subword Unit Models Based on HMMs --- p.39 / Chapter 4.2.4 --- Training of Subword Units --- p.42 / Chapter 4.2.5 --- Language Model (LM) --- p.43 / Chapter 4.2.6 --- "Transcriptions, Word Networks and Dictionaries for LVCSR System" --- p.44 / Chapter 4.2.7 --- Viterbi Decoding --- p.47 / Chapter 4.2.8 --- Performance Analysis --- p.48 / Chapter 4.3 --- Experiments --- p.48 / Chapter 4.3.1 --- Tasks --- p.48 / Chapter 4.3.2 --- Speech Database --- p.49 / Chapter 4.3.3 --- Baseline Experimental Results --- p.51 / Chapter 4.4 --- Context Dependency in Speech --- p.52 / Chapter 4.4.1 --- Introduction --- p.52 / Chapter 4.4.2 --- Context Dependent Phonetic Models --- p.53 / Chapter 4.4.3 --- Word Boundaries and Word network for context-dependent HMMs --- p.54 / Chapter 4.4.4 --- Recognition Results Using Cross-Syllable Context-Dependent Units --- p.56 / Chapter 4.5 --- Tree-Based Clustering --- p.58 / Chapter 4.5.1 --- Introduction --- p.58 / Chapter 4.5.2 --- Decision Tree Based Clustering --- p.59 / Chapter 4.5.3 --- The Question Sets --- p.61 / Chapter 4.5.4 --- Convergence Condition --- p.61 / Chapter 4.4.5 --- The Final Results --- p.63 / Chapter 4.6 --- Conclusions --- p.65 / Chapter 5. --- APPLICATION1 ISOLATED WORD RECOGNITION FOR MANDARIN CHINESE --- p.67 / Chapter 5.1 --- Introduction --- p.67 / Chapter 5.2 --- Isolated Word Recogniser --- p.68 / Chapter 5.2.1 --- System Description --- p.68 / Chapter 5.2.2 --- Experimental Results --- p.70 / Chapter 5.3 --- Discussions and Conclusions --- p.71 / Chapter 6. 
--- APPLICATION2 SUBWORD UNITS FOR A MANDARIN KEYWORD SPOTTING SYSTEM --- p.74 / Chapter 6.1 --- INTRODUCTION --- p.74 / Chapter 6.2 --- RECOGNITION SYSTEM DESCRIPTION --- p.75 / Chapter 6.2.1 --- Overall Architecture and Recognition Network for the keyword Spotters --- p.75 / Chapter 6.2.2 --- Signal Representation and Features --- p.76 / Chapter 6.2.3 --- Keyword Models --- p.76 / Chapter 6.2.4 --- Filler Models --- p.77 / Chapter 6.2.5 --- Language Modeling and Search --- p.78 / Chapter 6.3 --- EXPERIMENTS --- p.78 / Chapter 6.3.1 --- Tasks --- p.78 / Chapter 6.3.2 --- Speech Database --- p.79 / Chapter 6.3.3 --- Performance Measures --- p.80 / Chapter 6.3.4 --- Details of Different Word-spotters --- p.80 / Chapter 6.3.5 --- General Filler Models --- p.81 / Chapter 6.4 --- EXPERIMENTAL RESULTS --- p.83 / Chapter 6.5 --- CONCLUSIONS --- p.84 / Chapter 7. --- CONCLUSIONS --- p.87 / Chapter 7.1 --- Review of the Work --- p.87 / Chapter 7.1.1 --- Large Vocabulary Continuous Speech Recognition for Mandarin Chinese --- p.87 / Chapter 7.1.2 --- Isolated Word Recognition for a Stock Inquiry Application --- p.88 / Chapter 7.1.3 --- Keyword Spotting for Mandarin Chinese --- p.89 / Chapter 7.2 --- Suggestions for Further Work --- p.89 / Chapter 7.3 --- Conclusion --- p.91 / APPENDIX --- p.92 / BIBLIOGRAPHY --- p.111
|
280 |
Audio search of surveillance data using keyword spotting and dynamic models =: 利用關鍵詞及動態模型進行的語音情報搜尋 / Li yong guan jian ci ji dong tai mo xing jin xing de yu yin qing bao sou xun. / January 2001 (has links)
Lam Hiu Sing. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references. / Text in English; abstracts in English and Chinese. / Lam Hiu Sing. / Acknowledgement --- p.5 / Chapter I. --- Table of Content --- p.6 / Chapter II. --- Lists of Tables --- p.9 / Chapter III. --- Lists of Figures --- p.11 / Chapter Chapter 1 --- Introduction --- p.13 / Chapter 1.1 --- Intelligence gathering by surveillance --- p.13 / Chapter 1.2 --- Speech recognition and keyword spotting --- p.16 / Chapter 1.3 --- Audio indexing and searching --- p.18 / Chapter 1.3.1 --- Nature of audio sources --- p.18 / Chapter 1.3.2 --- Different searching objectives --- p.19 / Chapter 1.4 --- Objective of thesis --- p.22 / Chapter 1.5 --- Thesis outline --- p.23 / Chapter 1.6 --- References --- p.24 / Chapter Chapter 2 --- HMM-based Keyword Spotting System --- p.28 / Chapter 2.1 --- Statistical speech model --- p.28 / Chapter 2.1.1 --- Speech signal representations --- p.29 / Chapter 2.1.2 --- Acoustic modeling --- p.29 / Chapter 2.1.3 --- HMM message generation model --- p.32 / Chapter 2.2 --- Basics of keyword spotting --- p.34 / Chapter 2.2.1 --- Keyword and non-keyword modeling --- p.34 / Chapter 2.2.2 --- Language model --- p.37 / Chapter 2.2.3 --- Performance measure --- p.38 / Chapter 2.3 --- Keyword spotting applications --- p.39 / Chapter 2.3.1 --- Information query system --- p.40 / Chapter 2.3.2 --- Topic identification system --- p.41 / Chapter 2.3.3 --- Audio indexing and searching system --- p.42 / Chapter 2.3.4 --- Lexicon learning system --- p.42 / Chapter 2.4 --- Summary --- p.43 / Chapter 2.5 --- References --- p.44 / Chapter Chapter 3 --- Cantonese Characteristics --- p.49 / Chapter 3.1 --- Cantonese Dialect --- p.49 / Chapter 3.2 --- Phonological properties of Cantonese --- p.51 / Chapter 3.2.1 --- Initials and finals of Cantonese --- p.51 / Chapter 3.2.2 --- Tones of Cantonese --- p.54 / Chapter 3.3 --- Summary --- p.55 / Chapter 3.4 --- 
References --- p.55 / Chapter Chapter 4 --- System Configuration for Audio Search of Surveillance Data --- p.57 / Chapter 4.1 --- Audio Search of Surveillance Data --- p.57 / Chapter 4.2 --- Requirements and Specifications of the Proposed Audio Search System --- p.59 / Chapter 4.3 --- Proposed Audio Search System Architecture --- p.62 / Chapter 4.4 --- Summary --- p.65 / Chapter 4.5 --- References --- p.66 / Chapter Chapter 5 --- Development of a Keyword Spotting based Audio Indexing and Searching System --- p.67 / Chapter 5.1 --- Acoustic Models for Keywords and Fillers --- p.67 / Chapter 5.2 --- Adaptation mechanism --- p.76 / Chapter 5.2.1 --- Adaptation techniques --- p.76 / Chapter 5.2.2 --- Adaptation strategy for MLLR --- p.85 / Chapter 5.3 --- Language model --- p.86 / Chapter 5.4 --- Summary --- p.88 / Chapter 5.5 --- References --- p.88 / Chapter Chapter 6 --- System Evaluations --- p.93 / Chapter 6.1 --- Data for training and evaluation of the system --- p.94 / Chapter 6.1.1 --- Training Data --- p.94 / Chapter 6.1.2 --- Evaluation data --- p.95 / Chapter 6.1.3 --- Performance measure --- p.97 / Chapter 6.2 --- Cluster settings for MLLR adaptation --- p.98 / Chapter 6.3 --- Effects of word insertion penalty --- p.102 / Chapter 6.4 --- Acoustic modeling performance comparisons --- p.103 / Chapter 6.4.1 --- System robustness test --- p.104 / Chapter 6.4.2 --- The performance limit --- p.105 / Chapter 6.5 --- Overall System Performance --- p.107 / Chapter 6.6 --- Summary --- p.108 / Chapter 6.7 --- References --- p.108 / Chapter Chapter 7 --- Conclusions and Future Works --- p.109 / Chapter 7.1 --- Conclusions --- p.109 / Chapter 7.2 --- Future works --- p.110 / Chapter 7.2.1 --- Discriminative adaptation --- p.110 / Chapter 7.2.2 --- Pronunciation dictionary --- p.111 / Chapter 7.2.3 --- Channel effect --- p.111
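The MLLR adaptation covered in this thesis's Chapter 5.2 transforms Gaussian mean vectors with a shared affine transform, mu' = A·mu + b, estimated from a small amount of target-speaker data. The sketch below only applies a hand-picked 2-D transform to one mean vector; estimating A and b from adaptation data is the substantive part of MLLR and is omitted here.

```python
# Applying an MLLR-style affine mean transform (toy 2-D example;
# A and b are invented for illustration, not estimated).

def apply_mllr(mean, A, b):
    """Apply the affine transform mu' = A mu + b to one mean vector."""
    return [sum(A[i][j] * mean[j] for j in range(len(mean))) + b[i]
            for i in range(len(A))]

A = [[1.1, 0.0],
     [0.0, 0.9]]
b = [0.5, -0.2]

mean = [2.0, 4.0]
adapted = apply_mllr(mean, A, b)  # approximately [2.7, 3.4]
```

Sharing one transform (or a few, via regression classes) across many Gaussians is what lets MLLR adapt a large model from only a handful of utterances, which suits the sparse-data surveillance setting described above.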
|