  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
421

Exploitation du contexte sémantique pour améliorer la reconnaissance des noms propres dans les documents audio diachroniques / Exploiting Semantic and Topic Context to Improve Recognition of Proper Names in Diachronic Audio Documents

Sheikh, Imran. 24 November 2016
The diachronic nature of broadcast news causes frequent variations in the linguistic content and vocabulary, leading to the problem of Out-Of-Vocabulary (OOV) words in automatic speech recognition.
Most of the OOV words are found to be proper names, and proper names are important both for automatic indexing of audio-video content and for obtaining reliable automatic transcriptions. The goal of this thesis is to model the semantic and topical context of new proper names in order to retrieve those which are relevant to the spoken content in the audio document. Training context models is a challenging problem in this task because several new names come with a low amount of data and the context model should be robust to errors in the automatic transcription. Probabilistic topic models and word embeddings from neural network models are explored for the task of retrieval of relevant proper names. A thorough evaluation of these contextual representations is performed. It is argued that these representations, which are learned in an unsupervised manner, are not the best for the given retrieval task. Neural network context models trained with an objective to maximise the retrieval performance are proposed. The proposed Neural Bag-of-Weighted-Words (NBOW2) model learns to assign a degree of importance to input words and has the ability to capture task-specific key words. Experiments on automatic speech recognition of French broadcast news videos demonstrate the effectiveness of the proposed models. Evaluation of the NBOW2 model on standard text classification tasks shows that it learns interesting information and gives the best classification accuracies among the BOW models.
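The NBOW2 idea described in the abstract above — a bag-of-words representation in which each input word receives a learned scalar importance weight — can be sketched as follows. This is a minimal illustration with assumed dimensions and random, untrained parameters, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM = 1000, 50
E = rng.normal(scale=0.1, size=(VOCAB, DIM))  # word embedding matrix
a = rng.normal(scale=0.1, size=DIM)           # vector producing scalar word weights

def nbow2_repr(word_ids):
    """Weighted average of word embeddings: each word's weight is a
    sigmoid of a learned scalar score, so important words dominate."""
    vecs = E[word_ids]                         # (n_words, DIM)
    scores = 1.0 / (1.0 + np.exp(-vecs @ a))   # per-word importance in (0, 1)
    return (scores[:, None] * vecs).sum(0) / max(len(word_ids), 1)

doc = rng.integers(0, VOCAB, size=30)          # a toy document of 30 word ids
print(nbow2_repr(doc).shape)                   # (50,)
```

In training, `E` and `a` would be learned jointly with a retrieval or classification objective, which is what lets the model pick out document-specific key words.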
422

Development of a Cantonese-English code-mixing speech recognition system. / CUHK electronic theses & dissertations collection

January 2011
Code-mixing is a common phenomenon in bilingual societies. It refers to the intra-sentential switching of two languages in a spoken utterance. This thesis addresses the problem of the automatic recognition of Cantonese-English code-mixing speech, which is widely used in Hong Kong.

While automatic speech recognition (ASR) of either Cantonese or English alone has achieved a great degree of success, recognition of Cantonese-English code-mixing speech is far less straightforward. Unknown language boundaries, accents in code-switched English words, phonetic and phonological differences between Cantonese and English, the lack of a regulated grammatical structure, and the scarcity of speech and text data make the ASR of code-mixing utterances much more than a simple integration of two monolingual speech recognition systems. On the other hand, we have little understanding of this highly dynamic language phenomenon. Unlike in monolingual speech recognition research, there are very few linguistic studies that can be referred to.

This study starts with an investigation of the linguistic properties of Cantonese-English code-mixing, based on a large number of real code-mixing text corpora collected from the internet and other sources. The effects of language mixing on the automatic recognition of Cantonese-English code-mixing utterances are analyzed in a systematic way. The problems of the pronunciation dictionary, acoustic modeling and language modeling are investigated. Subsequently, a large-vocabulary code-mixing speech recognition system is developed and implemented.

A data-driven computational approach is adopted to reveal significant pronunciation variations in Cantonese-English code-mixing speech. The findings are successfully applied to constructing a more relevant bilingual pronunciation dictionary and to selecting effective training materials for code-mixing ASR. For acoustic modeling, it is shown that cross-lingual acoustic models are more appropriate than language-dependent models. Various cross-lingual inventories are derived based on different combination schemes and similarity measurements. We have shown that the proposed data-driven approach based on K-L divergence and a phonetic confusion matrix outperforms the IPA-based approach using merely phonetic knowledge. It is also found that initials and finals are more appropriate than phonemes as the basic Cantonese units in code-mixing speech recognition applications. A text database with more than 9 million characters is compiled for language modeling of code-mixing ASR. Class-based language models with automatically clustered classes have been proven inefficient for code-mixing speech recognition. A semantics-based n-gram mapping approach is proposed to increase the counts of code-mixing n-grams at language boundaries. The language model perplexity and recognition performance have been significantly improved with the proposed semantics-based language models. The proposed code-mixing speech recognition system achieves 75.0% overall accuracy for Cantonese-English code-mixing speech, with 76.1% accuracy for Cantonese characters and 65.5% accuracy for English words. It also attains a reasonable character accuracy of 75.3% for monolingual Cantonese speech.

Cross-lingual speaker adaptation has also been investigated in the thesis. Speaker-independent (SI) model mapping between Cantonese and English is established at different levels of acoustic units, viz. phones, states, and Gaussian mixture components. A novel approach for cross-lingual speaker adaptation via Gaussian component mapping is proposed and has proved effective in most speech recognition tasks.

Cao, Houwei. / Adviser: P.C. Ching. / Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 129-140). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
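The K-L divergence similarity measure used above for merging cross-lingual acoustic units has a closed form for Gaussians. A sketch assuming single diagonal-covariance Gaussians per unit (the thesis additionally uses a phonetic confusion matrix, which is not shown here):

```python
import numpy as np

def kl_diag_gauss(mu0, var0, mu1, var1):
    """KL(N0 || N1) for diagonal-covariance Gaussians, summed over dims."""
    return 0.5 * np.sum(np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def symmetric_kl(m0, v0, m1, v1):
    """Symmetrized KL, usable as a distance for clustering phone models."""
    return kl_diag_gauss(m0, v0, m1, v1) + kl_diag_gauss(m1, v1, m0, v0)

# Toy example: distance between a Cantonese and an English phone model
mu_c, var_c = np.array([0.0, 1.0]), np.array([1.0, 1.0])
mu_e, var_e = np.array([0.1, 1.1]), np.array([1.2, 0.9])
print(symmetric_kl(mu_c, var_c, mu_e, var_e))
```

Units whose models fall below a distance threshold would be merged into a shared cross-lingual inventory; real HMM states with Gaussian mixtures need a mixture-level approximation of this quantity.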
423

Large vocabulary Cantonese speech recognition using neural networks.

January 1994
Tsik Chung Wai Benjamin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. / Includes bibliographical references (leaves 67-70). / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Automatic Speech Recognition --- p.1 / Chapter 1.2 --- Cantonese Speech Recognition --- p.3 / Chapter 1.3 --- Neural Networks --- p.4 / Chapter 1.4 --- About this Thesis --- p.5 / Chapter 2 --- The Phonology of Cantonese --- p.6 / Chapter 2.1 --- The Syllabic Structure of Cantonese Syllable --- p.7 / Chapter 2.2 --- The Tone System of Cantonese --- p.9 / Chapter 3 --- Review of Automatic Speech Recognition Systems --- p.12 / Chapter 3.1 --- Hidden Markov Model Approach --- p.12 / Chapter 3.2 --- Neural Networks Approach --- p.13 / Chapter 3.2.1 --- Multi-Layer Perceptrons (MLP) --- p.13 / Chapter 3.2.2 --- Time-Delay Neural Networks (TDNN) --- p.15 / Chapter 3.2.3 --- Recurrent Neural Networks --- p.17 / Chapter 3.3 --- Integrated Approach --- p.18 / Chapter 3.4 --- Mandarin and Cantonese Speech Recognition Systems --- p.19 / Chapter 4 --- The Speech Corpus and Database --- p.21 / Chapter 4.1 --- Design of the Speech Corpus --- p.21 / Chapter 4.2 --- Speech Database Acquisition --- p.23 / Chapter 5 --- Feature Parameters Extraction --- p.24 / Chapter 5.1 --- Endpoint Detection --- p.25 / Chapter 5.2 --- Speech Processing --- p.26 / Chapter 5.3 --- Speech Segmentation --- p.27 / Chapter 5.4 --- Phoneme Feature Extraction --- p.29 / Chapter 5.5 --- Tone Feature Extraction --- p.30 / Chapter 6 --- The Design of the System --- p.33 / Chapter 6.1 --- Towards Large Vocabulary System --- p.34 / Chapter 6.2 --- Overview of the Isolated Cantonese Syllable Recognition System --- p.36 / Chapter 6.3 --- The Primary Level: Phoneme Classifiers and Tone Classifier --- p.38 / Chapter 6.4 --- The Intermediate Level: Ending Corrector --- p.42 / Chapter 6.5 --- The Secondary Level: Syllable Classifier --- p.43 / Chapter 6.5.1 --- Concatenation with Correction Approach --- p.44 / Chapter 6.5.2 
--- Fuzzy ART Approach --- p.45 / Chapter 7 --- Computer Simulation --- p.49 / Chapter 7.1 --- Experimental Conditions --- p.49 / Chapter 7.2 --- Experimental Results of the Primary Level Classifiers --- p.50 / Chapter 7.3 --- Overall Performance of the System --- p.57 / Chapter 7.4 --- Discussions --- p.61 / Chapter 8 --- Further Works --- p.62 / Chapter 8.1 --- Enhancement on Speech Segmentation --- p.62 / Chapter 8.2 --- Towards Speaker-Independent System --- p.63 / Chapter 8.3 --- Towards Speech-to-Text System --- p.64 / Chapter 9 --- Conclusions --- p.65 / Bibliography --- p.67 / Appendix A. Cantonese Syllable Full Set List --- p.71
424

An Efficient tone classifier for speech recognition of Cantonese.

January 1991
by Cheng Yat Ho. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1991. / Bibliography: leaves 106-108. / Chapter Chapter 1 --- Introduction --- p.1 / Chapter Chapter 2 --- Preliminary Considerations --- p.8 / Chapter 2.1 --- Tone System of Cantonese --- p.8 / Chapter 2.2 --- Tone Classification Systems --- p.14 / Chapter 2.3 --- Design of a Speech Corpus --- p.17 / Chapter Chapter 3 --- Feature Parameters for Tone Classification --- p.22 / Chapter 3.1 --- Methodology --- p.22 / Chapter 3.2 --- Endpoint Detection and Time Alignment --- p.23 / Chapter 3.3 --- Pitch --- p.27 / Chapter 3.3.1 --- Pitch Profile Extraction --- p.28 / Chapter 3.3.2 --- Evaluation of Pitch Profile --- p.33 / Chapter 3.3.3 --- Feature Parameters Derived from Pitch Profile --- p.40 / Chapter 3.4 --- Duration --- p.46 / Chapter 3.5 --- Energy --- p.49 / Chapter 3.5.1 --- Energy Profile Extraction --- p.49 / Chapter 3.5.2 --- Evaluation of Energy Profile --- p.50 / Chapter 3.6 --- Summary --- p.54 / Chapter Chapter 4 --- Implementation of the Tone Classification System --- p.56 / Chapter 4.1 --- Intrinsic Pitch Estimation --- p.59 / Chapter 4.2 --- The Classifier --- p.63 / Chapter 4.2.1 --- Neural Network --- p.64 / Chapter 4.2.2 --- Post-Processing Unit --- p.74 / Chapter Chapter 5 --- Performance Evaluation on the Tone Classification System --- p.76 / Chapter 5.1 --- Single Speaker Tone Classification --- p.77 / Chapter 5.2 --- Multi-Speaker and Speaker Independent Tone Classification --- p.82 / Chapter 5.2.1 --- Classification with no Phonetic Information --- p.83 / Chapter 5.2.2 --- Classification with Known Final Consonants --- p.88 / Chapter 5.3 --- Confidence Improvement of the Recognition Results --- p.95 / Chapter 5.4 --- Summary --- p.101 / Chapter Chapter 6 --- Conclusions and Discussions --- p.102 / References --- p.106 / Chapter Appendix A --- Vocabulary of the Speech Corpus --- p.A1-A4 / Chapter Appendix B --- Statistics of the Pitch Profiles --- p.B1-B15 / Chapter Appendix
C --- Statistics of the Energy Profiles --- p.C1-C11
425

Automatic recognition of isolated Cantonese syllables using neural networks = 利用神經網絡識別粤語單音節

January 1996
by Tan Lee. / Thesis (Ph.D.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references. / by Tan Lee. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Conventional Pattern Recognition Approaches for Speech Recognition --- p.3 / Chapter 1.2 --- A Review on Neural Network Applications in Speech Recognition --- p.6 / Chapter 1.2.1 --- Static Pattern Classification --- p.7 / Chapter 1.2.2 --- Hybrid Approaches --- p.9 / Chapter 1.2.3 --- Dynamic Neural Networks --- p.12 / Chapter 1.3 --- Automatic Recognition of Cantonese Speech --- p.16 / Chapter 1.4 --- Organization of the Thesis --- p.18 / References --- p.20 / Chapter 2 --- Phonological and Acoustical Properties of Cantonese Syllables --- p.29 / Chapter 2.1 --- Phonology of Cantonese --- p.29 / Chapter 2.1.1 --- Basic Phonetic Units --- p.30 / Chapter 2.1.2 --- Syllabic Structure --- p.32 / Chapter 2.1.3 --- Lexical Tones --- p.33 / Chapter 2.2 --- Acoustical Properties of Cantonese Syllables --- p.35 / Chapter 2.2.1 --- Spectral Features --- p.35 / Chapter 2.2.2 --- Energy and Zero-Crossing Rate --- p.39 / Chapter 2.2.3 --- Pitch --- p.40 / Chapter 2.2.4 --- Duration --- p.41 / Chapter 2.3 --- Acoustic Feature Extraction for Speech Recognition of Cantonese --- p.42 / References --- p.46 / Chapter 3 --- Tone Recognition of Isolated Cantonese Syllables --- p.48 / Chapter 3.1 --- Acoustic Pre-processing --- p.48 / Chapter 3.1.1 --- Voiced Portion Detection --- p.48 / Chapter 3.1.2 --- Pitch Extraction --- p.51 / Chapter 3.2 --- Supra-Segmental Feature Parameters for Tone Recognition --- p.53 / Chapter 3.2.1 --- Pitch-Related Feature Parameters --- p.53 / Chapter 3.2.2 --- Duration and Energy Drop Rate --- p.55 / Chapter 3.2.3 --- Normalization of Feature Parameters --- p.57 / Chapter 3.3 --- An MLP Based Tone Classifier --- p.58 / Chapter 3.4 --- Simulation Experiments --- p.59 / Chapter 3.4.1 --- Speech Data --- p.59 / Chapter 3.4.2 --- Feature Extraction and Normalization --- p.61 / 
Chapter 3.4.3 --- Experimental Results --- p.61 / Chapter 3.5 --- Discussion and Conclusion --- p.64 / References --- p.65 / Chapter 4 --- Recurrent Neural Network Based Dynamic Speech Models --- p.67 / Chapter 4.1 --- Motivations and Rationales --- p.68 / Chapter 4.2 --- RNN Speech Model (RSM) --- p.71 / Chapter 4.2.1 --- Network Architecture and Dynamic Operation --- p.71 / Chapter 4.2.2 --- RNN for Speech Modeling --- p.72 / Chapter 4.2.3 --- Illustrative Examples --- p.75 / Chapter 4.3 --- Training of RNN Speech Models --- p.78 / Chapter 4.3.1 --- Real-Time-Recurrent-Learning (RTRL) Algorithm --- p.78 / Chapter 4.3.2 --- Iterative Re-segmentation Training of RSM --- p.80 / Chapter 4.4 --- Several Practical Issues in RSM Training --- p.85 / Chapter 4.4.1 --- Combining Adjacent Segments --- p.85 / Chapter 4.4.2 --- Hypothesizing Initial Segmentation --- p.86 / Chapter 4.4.3 --- Improving Temporal State Dependency --- p.89 / Chapter 4.5 --- Simulation Experiments --- p.90 / Chapter 4.5.1 --- Experiment 4.1 - Training with a Single Utterance --- p.91 / Chapter 4.5.2 --- Experiment 4.2 - Effect of Augmenting Recurrent Learning Rate --- p.93 / Chapter 4.5.3 --- Experiment 4.3 - Training with Multiple Utterances --- p.96 / Chapter 4.5.4 --- Experiment 4.4 一 Modeling Performance of RSMs --- p.99 / Chapter 4.6 --- Conclusion --- p.104 / References --- p.106 / Chapter 5 --- Isolated Word Recognition Using RNN Speech Models --- p.107 / Chapter 5.1 --- A Baseline System --- p.107 / Chapter 5.1.1 --- System Description --- p.107 / Chapter 5.1.2 --- Simulation Experiments --- p.110 / Chapter 5.1.3 --- Discussion --- p.117 / Chapter 5.2 --- Incorporating Duration Information --- p.118 / Chapter 5.2.1 --- Duration Screening --- p.118 / Chapter 5.2.2 --- Determination of Duration Bounds --- p.120 / Chapter 5.2.3 --- Simulation Experiments --- p.120 / Chapter 5.2.4 --- Discussion --- p.124 / Chapter 5.3 --- Discriminative Training --- p.125 / Chapter 5.3.1 --- The Minimum 
Classification Error Formulation --- p.126 / Chapter 5.3.2 --- Generalized Probabilistic Descent Algorithm --- p.127 / Chapter 5.3.3 --- Determination of Training Parameters --- p.128 / Chapter 5.3.4 --- Simulation Experiments --- p.129 / Chapter 5.3.5 --- Discussion --- p.133 / Chapter 5.4 --- Conclusion --- p.134 / References --- p.135 / Chapter 6 --- An Integrated Speech Recognition System for Cantonese Syllables --- p.137 / Chapter 6.1 --- System Architecture and Recognition Scheme --- p.137 / Chapter 6.2 --- Speech Corpus and Data Pre-processing --- p.140 / Chapter 6.3 --- Recognition Experiments and Results --- p.140 / Chapter 6.4 --- Discussion and Conclusion --- p.144 / References --- p.146 / Chapter 7 --- Conclusions and Suggestions for Future Work --- p.147 / Chapter 7.1 --- Conclusions --- p.147 / Chapter 7.2 --- Suggestions for Future Work --- p.151
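The RNN speech models (RSMs) of this record summarize an utterance frame by frame through a fully recurrent hidden state. A minimal forward pass of such a dynamic model, with random untrained weights and assumed dimensions (13 MFCC-like features, 8 state units):

```python
import numpy as np

rng = np.random.default_rng(2)
IN, H = 13, 8  # e.g. 13 cepstral features per frame, 8 hidden (state) units

Wx = rng.normal(scale=0.3, size=(H, IN))  # input-to-state weights
Wh = rng.normal(scale=0.3, size=(H, H))   # recurrent state-to-state weights
b = np.zeros(H)

def rnn_states(frames):
    """Each frame updates a hidden state that summarizes the acoustic
    history, the dynamic operation underlying an RNN speech model."""
    h = np.zeros(H)
    states = []
    for x in frames:
        h = np.tanh(Wx @ x + Wh @ h + b)
        states.append(h)
    return np.array(states)

frames = rng.normal(size=(40, IN))  # a toy 40-frame utterance
print(rnn_states(frames).shape)     # (40, 8)
```

Training such a model online, as with the RTRL algorithm cited in the table of contents, requires propagating the derivative of `h` with respect to the weights forward in time alongside the state itself.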
426

Attributes and extraction of tone information for continuous Cantonese speech recognition = 連續粤語語音辨識裏的音調提取和音調特性

January 2000
Lau Wai = 連續粤語語音辨識裏的音調提取和音調特性 / 劉偉. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references. / Text in English; abstracts in English and Chinese. / Lau Wai = Lian xu yue yu yu yin bian shi li de yin diao ti qu he yin diao te xing / Liu Wei. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Speech Recognition of Chinese --- p.3 / Chapter 1.2 --- Tone Recognition --- p.3 / Chapter 1.3 --- Use of Tone Information in Speech Recognition --- p.4 / Chapter 1.4 --- Thesis Objectives --- p.6 / Chapter 1.5 --- Organization of the Thesis --- p.6 / Reference --- p.8 / Chapter 2 --- Properties of Tones in Cantonese --- p.12 / Chapter 2.1 --- The Cantonese Dialect --- p.12 / Chapter 2.1.1 --- "INITIAL, FINAL & TONE" --- p.13 / Chapter 2.1.2 --- Phonological Constraints --- p.16 / Chapter 2.2 --- Tones in Cantonese --- p.17 / Chapter 2.2.1 --- Linguistic Significance --- p.17 / Chapter 2.2.2 --- Acoustic properties --- p.18 / Chapter 2.2.3 --- Discriminative Features of the Cantonese Tones --- p.20 / Chapter 2.3 --- Summary --- p.21 / Reference --- p.22 / Chapter 3 --- Extraction of Tone Features --- p.23 / Chapter 3.1 --- Feature Parameters for Tone Recognition --- p.23 / Chapter 3.1.1 --- F0 Features --- p.23 / Chapter 3.1.2 --- Energy Features --- p.24 / Chapter 3.1.3 --- Log Scale vs. 
Linear Scale --- p.25 / Chapter 3.2 --- Detection of Voiced Speech --- p.26 / Chapter 3.3 --- Robust Algorithm for Pitch Tracking --- p.27 / Chapter 3.3.1 --- Generation of Period Candidates --- p.27 / Chapter 3.3.2 --- Post-processing --- p.28 / Chapter 3.4 --- Normalization of Fundamental Frequency --- p.29 / Chapter 3.4.1 --- Derivation of the normalization factor --- p.31 / Chapter 3.4.2 --- Moving-Window Normalization --- p.32 / Chapter 3.4.3 --- Energy Normalization --- p.35 / Chapter 3.5 --- F0 Smoothing --- p.36 / Chapter 3.6 --- Generation of Tone Feature Vectors --- p.37 / Chapter 3.7 --- Summary --- p.39 / Reference --- p.40 / Chapter 4 --- Tone Recognition using Hidden Markov Models --- p.43 / Chapter 4.1 --- Two Methods of Tone Modeling --- p.43 / Chapter 4.2 --- Hidden Markov Models for Speech Recognition --- p.44 / Chapter 4.3 --- Tone Modeling by HMM --- p.47 / Chapter 4.4 --- Context-Dependent Tone Models --- p.48 / Chapter 4.5 --- Baseline Experiments --- p.49 / Chapter 4.5.1 --- The Speech Database - CUSENT --- p.49 / Chapter 4.5.2 --- Data Pre-Processing --- p.50 / Chapter 4.5.3 --- Performance of Context-Independent Models --- p.51 / Chapter 4.5.4 --- Context-Dependent Tone Modeling --- p.52 / Chapter 4.6 --- Experiments on Moving-window F0 Normalization --- p.54 / Chapter 4.6.1 --- Symmetric window --- p.54 / Chapter 4.6.2 --- Asymmetric window --- p.55 / Chapter 4.6.3 --- Energy normalization --- p.58 / Chapter 4.7 --- Incorporation of Statistical Tone Information --- p.58 / Chapter 4.8 --- Discussions --- p.59 / Chapter 4.9 --- Summary --- p.60 / Reference --- p.61 / Chapter 5 --- Integration of Tone Information into LVCSR for Cantonese --- p.63 / Chapter 5.1 --- The Goal --- p.63 / Chapter 5.2 --- N-best Based Integration --- p.64 / Chapter 5.2.1 --- Base Syllable Recognition --- p.65 / Chapter 5.2.2 --- Tone Recognition --- p.66 / Chapter 5.2.3 --- Language Models --- p.66 / Chapter 5.2.4 --- Integration and N-best Re-scoring --- p.66 / 
Chapter 5.2.5 --- Experimental Results --- p.67 / Chapter 5.2.6 --- Integration with Perfect Tone Information --- p.68 / Chapter 5.3 --- Broad Tone Classes --- p.68 / Chapter 5.3.1 --- Experimental Results --- p.70 / Chapter 5.3.2 --- Error analyses and Discussions --- p.71 / Chapter 5.4 --- Lattice Based Integration --- p.73 / Chapter 5.4.1 --- Lattice Expansion --- p.74 / Chapter 5.4.2 --- Experiments on Lattice Based Integration --- p.76 / Chapter 5.5 --- Discussions --- p.78 / Chapter 5.6 --- Summary --- p.79 / Reference --- p.80 / Chapter 6 --- Conclusions and Future Work --- p.81 / Chapter 6.1 --- Conclusions --- p.81 / Chapter 6.2 --- Suggestions for Future Work --- p.84 / Reference --- p.85
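The moving-window F0 normalization studied in this thesis (Chapters 3.4 and 4.6) compensates for speaker pitch range and intonation drift by normalizing each frame's F0 against a local average. A sketch assuming log-F0 mean subtraction over a symmetric window; the thesis's exact normalization factor may differ:

```python
import numpy as np

def moving_window_f0_norm(f0, win=50):
    """Normalize each voiced frame's log-F0 by subtracting the mean
    log-F0 of voiced frames in a symmetric moving window; unvoiced
    frames (f0 == 0) stay at 0 in the output."""
    logf0 = np.where(f0 > 0, np.log(np.maximum(f0, 1e-6)), 0.0)
    out = np.zeros_like(logf0)
    voiced = f0 > 0
    for t in np.flatnonzero(voiced):
        lo, hi = max(0, t - win), min(len(f0), t + win + 1)
        ref = logf0[lo:hi][voiced[lo:hi]].mean()  # local voiced mean
        out[t] = logf0[t] - ref
    return out

f0 = np.array([0, 120, 125, 130, 0, 200, 210])  # toy F0 track in Hz
print(np.round(moving_window_f0_norm(f0, win=2), 3))
```

Symmetric vs. asymmetric windows, as compared in Chapter 4.6, correspond to different `lo`/`hi` choices around the current frame.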
427

Acoustic units for Mandarin Chinese speech recognition = 漢語語音識別中聲學單元的選擇

January 1999
by Choy Chi Yan. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. / Includes bibliographical references (leaves 111-115). / Text in English; abstract also in Chinese. / by Choy Chi Yan. / ABSTRACT --- p.I / ACKNOWLEDGMENTS --- p.III / TABLE OF CONTENTS --- p.IV / LIST OF FIGURES --- p.VII / LIST OF TABLES --- p.VIII / Chapter 1. --- INTRODUCTION --- p.1 / Chapter 1.1 --- Speech Recognition --- p.1 / Chapter 1.2 --- Development of Speech Recognisers --- p.4 / Chapter 1.3 --- Speech Recognition for Chinese Language --- p.5 / Chapter 1.4 --- Objectives of the thesis --- p.6 / Chapter 1.5 --- Thesis Structure --- p.7 / Chapter 2. --- PHONOLOGICAL AND ACOUSTICAL PROPERTIES OF MANDARIN CHINESE --- p.9 / Chapter 2.1 --- Characteristics of Mandarin Chinese --- p.9 / Chapter 2.1.1 --- Syllabic Structures --- p.10 / Chapter 2.1.2 --- Lexical Tones --- p.11 / Chapter 2.2 --- Basic Phonetic Units for Mandarin Chinese --- p.14 / Chapter 2.2.1 --- Tonal Syllables and Base Syllables --- p.14 / Chapter 2.2.2 --- Initial-Finals --- p.14 / Chapter 2.2.3 --- Phones --- p.16 / Chapter 2.2.4 --- Preme-Core-Finals and Preme-Tonemes --- p.17 / Chapter 2.2.5 --- Summary-The phonological hierarchy of Mandarin Syllables --- p.19 / Chapter 3. --- HIDDEN MARKOV MODELS --- p.20 / Chapter 3.1 --- Introduction --- p.20 / Chapter 3.1.1 --- Speech Data --- p.20 / Chapter 3.1.2 --- Fundamental of HMMs --- p.21 / Chapter 3.2 --- Using Hidden Markov Models for Speech Recognition --- p.22 / Chapter 3.2.1 --- Likelihood of the state sequence of speech observations --- p.22 / Chapter 3.2.2 --- The Recognition Problem --- p.24 / Chapter 3.3 --- Output Probability Distributions --- p.25 / Chapter 3.4 --- Model Training --- p.26 / Chapter 3.4.1 --- State Sequence Estimation --- p.26 / Chapter 3.4.2 --- Gaussian Mixture Models --- p.29 / Chapter 3.4.3 --- Parameter Estimation --- p.30 / Chapter 3.5 --- Speech Recognition and Viterbi Decoding --- p.31 / Chapter 3.6 --- Summary --- p.32 / Chapter 4. 
--- LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION FOR MANDARIN CHINESE --- p.33 / Chapter 4.1 --- Introduction --- p.33 / Chapter 4.2 --- Large Vocabulary Mandarin Chinese Recognition System --- p.34 / Chapter 4.2.1 --- Overall Architecture for the Speech Recogniser --- p.34 / Chapter 4.2.2 --- Signal Representation and Features --- p.36 / Chapter 4.2.3 --- Subword Unit Models Based on HMMs --- p.39 / Chapter 4.2.4 --- Training of Subword Units --- p.42 / Chapter 4.2.5 --- Language Model (LM) --- p.43 / Chapter 4.2.6 --- "Transcriptions, Word Networks and Dictionaries for LVCSR System" --- p.44 / Chapter 4.2.7 --- Viterbi Decoding --- p.47 / Chapter 4.2.8 --- Performance Analysis --- p.48 / Chapter 4.3 --- Experiments --- p.48 / Chapter 4.3.1 --- Tasks --- p.48 / Chapter 4.3.2 --- Speech Database --- p.49 / Chapter 4.3.3 --- Baseline Experimental Results --- p.51 / Chapter 4.4 --- Context Dependency in Speech --- p.52 / Chapter 4.4.1 --- Introduction --- p.52 / Chapter 4.4.2 --- Context Dependent Phonetic Models --- p.53 / Chapter 4.4.3 --- Word Boundaries and Word network for context-dependent HMMs --- p.54 / Chapter 4.4.4 --- Recognition Results Using Cross-Syllable Context-Dependent Units --- p.56 / Chapter 4.5 --- Tree-Based Clustering --- p.58 / Chapter 4.5.1 --- Introduction --- p.58 / Chapter 4.5.2 --- Decision Tree Based Clustering --- p.59 / Chapter 4.5.3 --- The Question Sets --- p.61 / Chapter 4.5.4 --- Convergence Condition --- p.61 / Chapter 4.4.5 --- The Final Results --- p.63 / Chapter 4.6 --- Conclusions --- p.65 / Chapter 5. --- APPLICATION1 ISOLATED WORD RECOGNITION FOR MANDARIN CHINESE --- p.67 / Chapter 5.1 --- Introduction --- p.67 / Chapter 5.2 --- Isolated Word Recogniser --- p.68 / Chapter 5.2.1 --- System Description --- p.68 / Chapter 5.2.2 --- Experimental Results --- p.70 / Chapter 5.3 --- Discussions and Conclusions --- p.71 / Chapter 6. 
--- APPLICATION2 SUBWORD UNITS FOR A MANDARIN KEYWORD SPOTTING SYSTEM --- p.74 / Chapter 6.1 --- INTRODUCTION --- p.74 / Chapter 6.2 --- RECOGNITION SYSTEM DESCRIPTION --- p.75 / Chapter 6.2.1 --- Overall Architecture and Recognition Network for the keyword Spotters --- p.75 / Chapter 6.2.2 --- Signal Representation and Features --- p.76 / Chapter 6.2.3 --- Keyword Models --- p.76 / Chapter 6.2.4 --- Filler Models --- p.77 / Chapter 6.2.5 --- Language Modeling and Search --- p.78 / Chapter 6.3 --- EXPERIMENTS --- p.78 / Chapter 6.3.1 --- Tasks --- p.78 / Chapter 6.3.2 --- Speech Database --- p.79 / Chapter 6.3.3 --- Performance Measures --- p.80 / Chapter 6.3.4 --- Details of Different Word-spotters --- p.80 / Chapter 6.3.5 --- General Filler Models --- p.81 / Chapter 6.4 --- EXPERIMENTAL RESULTS --- p.83 / Chapter 6.5 --- CONCLUSIONS --- p.84 / Chapter 7. --- CONCLUSIONS --- p.87 / Chapter 7.1 --- Review of the Work --- p.87 / Chapter 7.1.1 --- Large Vocabulary Continuous Speech Recognition for Mandarin Chinese --- p.87 / Chapter 7.1.2 --- Isolated Word Recognition for a Stock Inquiry Application --- p.88 / Chapter 7.1.3 --- Keyword Spotting for Mandarin Chinese --- p.89 / Chapter 7.2 --- Suggestions for Further Work --- p.89 / Chapter 7.3 --- Conclusion --- p.91 / APPENDIX --- p.92 / BIBLIOGRAPHY --- p.111
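The tree-based clustering of Chapter 4.5 above grows a decision tree by greedily choosing, at each node, the phonetic question that maximizes the log-likelihood gain of splitting the pooled state statistics. A toy sketch with illustrative state names, questions, and statistics (all invented for the example):

```python
import math

# Sufficient statistics per tied state: (frame count, mean, variance),
# here one-dimensional for clarity.
states = {
    "a+n": (100, 1.2, 0.5), "a+m": (80, 1.1, 0.6),
    "a+k": (90, 2.0, 0.4), "a+t": (70, 2.1, 0.5),
}
# Phonetic questions: the set of states answering "yes".
questions = {"R_Nasal": {"a+n", "a+m"}, "R_Velar": {"a+k"}}

def pool_loglik(items):
    """Log-likelihood of pooled stats under one shared Gaussian."""
    items = list(items)
    n = sum(c for c, _, _ in items)
    mean = sum(c * m for c, m, _ in items) / n
    var = sum(c * (v + (m - mean) ** 2) for c, m, v in items) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def split_loglik(q):
    yes = [states[s] for s in states if s in questions[q]]
    no = [states[s] for s in states if s not in questions[q]]
    return pool_loglik(yes) + pool_loglik(no)

best = max(questions, key=split_loglik)
gain = split_loglik(best) - pool_loglik(states.values())
print(best, gain > 0)
```

Splitting is repeated until the gain falls below a threshold (the convergence condition of Chapter 4.5.4), which is how unseen cross-syllable contexts get tied to seen ones.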
428

Audio search of surveillance data using keyword spotting and dynamic models = 利用關鍵詞及動態模型進行的語音情報搜尋

January 2001
Lam Hiu Sing. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references. / Text in English; abstracts in English and Chinese. / Lam Hiu Sing. / Acknowledgement --- p.5 / Chapter I. --- Table of Content --- p.6 / Chapter II. --- Lists of Tables --- p.9 / Chapter III. --- Lists of Figures --- p.11 / Chapter Chapter 1 --- Introduction --- p.13 / Chapter 1.1 --- Intelligence gathering by surveillance --- p.13 / Chapter 1.2 --- Speech recognition and keyword spotting --- p.16 / Chapter 1.3 --- Audio indexing and searching --- p.18 / Chapter 1.3.1 --- Nature of audio sources --- p.18 / Chapter 1.3.2 --- Different searching objectives --- p.19 / Chapter 1.4 --- Objective of thesis --- p.22 / Chapter 1.5 --- Thesis outline --- p.23 / Chapter 1.6 --- References --- p.24 / Chapter Chapter 2 --- HMM-based Keyword Spotting System --- p.28 / Chapter 2.1 --- Statistical speech model --- p.28 / Chapter 2.1.1 --- Speech signal representations --- p.29 / Chapter 2.1.2 --- Acoustic modeling --- p.29 / Chapter 2.1.3 --- HMM message generation model --- p.32 / Chapter 2.2 --- Basics of keyword spotting --- p.34 / Chapter 2.2.1 --- Keyword and non-keyword modeling --- p.34 / Chapter 2.2.2 --- Language model --- p.37 / Chapter 2.2.3 --- Performance measure --- p.38 / Chapter 2.3 --- Keyword spotting applications --- p.39 / Chapter 2.3.1 --- Information query system --- p.40 / Chapter 2.3.2 --- Topic identification system --- p.41 / Chapter 2.3.3 --- Audio indexing and searching system --- p.42 / Chapter 2.3.4 --- Lexicon learning system --- p.42 / Chapter 2.4 --- Summary --- p.43 / Chapter 2.5 --- References --- p.44 / Chapter Chapter 3 --- Cantonese Characteristics --- p.49 / Chapter 3.1 --- Cantonese Dialect --- p.49 / Chapter 3.2 --- Phonological properties of Cantonese --- p.51 / Chapter 3.2.1 --- Initials and finals of Cantonese --- p.51 / Chapter 3.2.2 --- Tones of Cantonese --- p.54 / Chapter 3.3 --- Summary --- p.55 / Chapter 3.4 --- 
References --- p.55 / Chapter Chapter 4 --- System Configuration for Audio Search of Surveillance Data --- p.57 / Chapter 4.1 --- Audio Search of Surveillance Data --- p.57 / Chapter 4.2 --- Requirements and Specifications of the Proposed Audio Search System --- p.59 / Chapter 4.3 --- Proposed Audio Search System Architecture --- p.62 / Chapter 4.4 --- Summary --- p.65 / Chapter 4.5 --- References --- p.66 / Chapter Chapter 5 --- Development of a Keyword Spotting based Audio Indexing and Searching System --- p.67 / Chapter 5.1 --- Acoustic Models for Keywords and Fillers --- p.67 / Chapter 5.2 --- Adaptation mechanism --- p.76 / Chapter 5.2.1 --- Adaptation techniques --- p.76 / Chapter 5.2.2 --- Adaptation strategy for MLLR --- p.85 / Chapter 5.3 --- Language model --- p.86 / Chapter 5.4 --- Summary --- p.88 / Chapter 5.5 --- References --- p.88 / Chapter Chapter 6 --- System Evaluations --- p.93 / Chapter 6.1 --- Data for training and evaluation of the system --- p.94 / Chapter 6.1.1 --- Training Data --- p.94 / Chapter 6.1.2 --- Evaluation data --- p.95 / Chapter 6.1.3 --- Performance measure --- p.97 / Chapter 6.2 --- Cluster settings for MLLR adaptation --- p.98 / Chapter 6.3 --- Effects of word insertion penalty --- p.102 / Chapter 6.4 --- Acoustic modeling performance comparisons --- p.103 / Chapter 6.4.1 --- System robustness test --- p.104 / Chapter 6.4.2 --- The performance limit --- p.105 / Chapter 6.5 --- Overall System Performance --- p.107 / Chapter 6.6 --- Summary --- p.108 / Chapter 6.7 --- References --- p.108 / Chapter Chapter 7 --- Conclusions and Future Works --- p.109 / Chapter 7.1 --- Conclusions --- p.109 / Chapter 7.2 --- Future works --- p.110 / Chapter 7.2.1 --- Discriminative adaptation --- p.110 / Chapter 7.2.2 --- Pronunciation dictionary --- p.111 / Chapter 7.2.3 --- Channel effect --- p.111
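MLLR adaptation, central to Chapter 5.2 of the record above, re-estimates Gaussian means through an affine transform shared by all Gaussians in a regression class. In the noiseless single-class case with known alignments, the maximum-likelihood estimate reduces to least squares on extended mean vectors, as this toy sketch with synthetic data shows:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 3

# Speaker-independent Gaussian means and "observed" speaker-specific means
mus_si = rng.normal(size=(20, DIM))
A_true = np.eye(DIM) + 0.1 * rng.normal(size=(DIM, DIM))
b_true = rng.normal(scale=0.2, size=DIM)
mus_spk = mus_si @ A_true.T + b_true  # mu_spk = A mu_si + b

# MLLR-style estimation: fit one affine transform [A | b], shared by all
# Gaussians, by least squares on extended means [mu; 1].
X = np.hstack([mus_si, np.ones((len(mus_si), 1))])  # (20, DIM+1)
W, *_ = np.linalg.lstsq(X, mus_spk, rcond=None)     # (DIM+1, DIM)
A_hat, b_hat = W[:DIM].T, W[DIM]
print(np.allclose(A_hat, A_true), np.allclose(b_hat, b_true))
```

In a real system the regression is weighted by Gaussian occupation counts from the adaptation data, and multiple regression classes (the cluster settings evaluated in Chapter 6.2) each get their own transform.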
429

The use of subword-based audio indexing in Chinese spoken document retrieval.

January 2001 (has links)
Li Yuk Chi.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
Includes bibliographical references (leaves [112]-119).
Abstracts in English and Chinese.

Table of contents:
Abstract --- p.2
List of Figures --- p.8
List of Tables --- p.12
Chapter 1 Introduction --- p.17
  1.1 Information Retrieval --- p.18
    1.1.1 Information Retrieval Models --- p.19
    1.1.2 Information Retrieval in English --- p.20
    1.1.3 Information Retrieval in Chinese --- p.22
  1.2 Spoken Document Retrieval --- p.24
    1.2.1 Spoken Document Retrieval in English --- p.25
    1.2.2 Spoken Document Retrieval in Chinese --- p.25
  1.3 Previous Work --- p.28
  1.4 Motivation --- p.32
  1.5 Goals --- p.33
  1.6 Thesis Organization --- p.34
Chapter 2 Investigation Framework --- p.35
  2.1 Indexing the Spoken Document Collection --- p.36
  2.2 Query Processing --- p.37
  2.3 Subword Indexing --- p.37
  2.4 Robustness in Chinese Spoken Document Retrieval --- p.40
  2.5 Retrieval --- p.40
  2.6 Evaluation --- p.43
    2.6.1 Average Inverse Rank --- p.43
    2.6.2 Mean Average Precision --- p.44
Chapter 3 Subword-based Chinese Spoken Document Retrieval --- p.46
  3.1 The Cantonese Corpus --- p.48
  3.2 Known-Item Retrieval --- p.49
  3.3 Subword Formulation for Cantonese Spoken Document Retrieval --- p.50
  3.4 Audio Indexing by Cantonese Speech Recognition --- p.52
    3.4.1 Seed Models from Adapted Data --- p.52
    3.4.2 Retraining Acoustic Models --- p.53
  3.5 The Retrieval Model --- p.55
  3.6 Experiments --- p.56
    3.6.1 Setup and Observations --- p.57
    3.6.2 Results Analysis --- p.58
  3.7 Chapter Summary --- p.63
Chapter 4 Robust Indexing and Retrieval Methods --- p.64
  4.1 Query Expansion using Phonetic Confusion --- p.65
    4.1.1 Syllable-Syllable Confusions from Recognition --- p.66
    4.1.2 Experimental Setup and Observation --- p.67
  4.2 Document Expansion --- p.71
    4.2.1 The Side Collection for Expansion --- p.72
    4.2.2 Detailed Procedures in Document Expansion --- p.72
    4.2.3 Improvements due to Document Expansion --- p.73
  4.3 Using both Query and Document Expansion --- p.75
  4.4 Chapter Summary --- p.76
Chapter 5 Cross-Language Spoken Document Retrieval --- p.78
  5.1 The Topic Detection and Tracking Collection --- p.80
    5.1.1 The Spoken Document Collection --- p.81
    5.1.2 The Translingual Query --- p.82
    5.1.3 The Side Collection --- p.82
    5.1.4 Subword-based Indexing --- p.83
  5.2 The Translingual Retrieval Task --- p.83
  5.3 Machine Translated Query --- p.85
    5.3.1 The Unbalanced Query --- p.85
    5.3.2 The Balanced Query --- p.87
    5.3.3 Results on the Weight Balancing Scheme --- p.88
  5.4 Document Expansion from a Side Collection --- p.89
  5.5 Performance Evaluation and Analysis --- p.91
  5.6 Chapter Summary --- p.93
Chapter 6 Summary and Future Work --- p.95
  6.1 Future Directions --- p.97
Appendix A Input format for the IR engine --- p.101
Appendix B Preliminary Results on the Two Normalization Schemes --- p.102
Appendix C Significance Tests --- p.103
  C.1 Query Expansions for Cantonese Spoken Document Retrieval --- p.103
  C.2 Document Expansion for Cantonese Spoken Document Retrieval --- p.105
  C.3 Balanced Query for Cross-Language Spoken Document Retrieval --- p.107
  C.4 Document Expansion for Cross-Language Spoken Document Retrieval --- p.107
Appendix D The Use of an Unrelated Source for Expanding Spoken Documents in Cantonese --- p.110
Bibliography --- p.110
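This thesis evaluates its known-item retrieval task with Average Inverse Rank (Section 2.6.1). In a known-item task each query has exactly one correct document, and AIR averages the reciprocal of the rank at which that document is returned, counting a complete miss as zero, so AIR = 1.0 means the known item was always ranked first. A minimal sketch of the measure; the thesis's exact definition may differ in details such as rank cut-offs:

```python
def average_inverse_rank(ranks):
    """Average Inverse Rank for known-item retrieval.

    ranks: one entry per query -- the 1-based rank of the single
    relevant document, or None when it was not retrieved at all.
    """
    if not ranks:
        raise ValueError("no queries to score")
    return sum(1.0 / r if r is not None else 0.0 for r in ranks) / len(ranks)


# Four queries: known item ranked 1st, 2nd, missed entirely, and 4th.
air = average_inverse_rank([1, 2, None, 4])  # (1 + 0.5 + 0 + 0.25) / 4
```

Because only one document is relevant per query, this coincides with the mean reciprocal rank used elsewhere in retrieval evaluation.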
430

A Novel Non-Acoustic Voiced Speech Sensor: Experimental Results and Characterization

Keenaghan, Kevin Michael 14 January 2004 (has links)
Recovering clean speech from an audio signal with additive noise is a problem that has plagued the signal processing community for decades. One promising technique currently being utilized in speech-coding applications is a multi-sensor approach, in which a microphone is used in conjunction with optical, mechanical, and electrical non-acoustic speech sensors to provide greater versatility in signal processing algorithms. One such non-acoustic glottal waveform sensor is the Tuned Electromagnetic Resonator Collar (TERC) sensor, first developed in [BLP+02]. The sensor is based on Magnetic Resonance Imaging (MRI) concepts, and is designed to detect small changes in capacitance caused by changes to the state of the vocal cords - the glottal waveform. Although preliminary simulations in [BLP+02] have validated the basic theory governing the TERC sensor's operation, results from human subject testing are necessary to accurately characterize the sensor's performance in practice. To this end, a system was designed and developed to provide real-time audio recordings from the sensor while attached to a human test subject. From these recordings, executed in a variety of acoustic noise environments, the practical functionality of the TERC sensor was demonstrated. The sensor in its current evolution is able to detect a periodic waveform during voiced speech, with two clear harmonics and a fundamental frequency equal to that of the speech it is detecting. This waveform is representative of the glottal waveform, with little or no articulation as initially hypothesized. Though statistically significant conclusions about the sensor's immunity to environmental noise are difficult to draw, the results suggest that the TERC sensor is considerably more resistant to the effects of noise than typical acoustic sensors, making it a valuable addition to the multi-sensor speech processing approach.
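The waveform described above, periodic with a fundamental equal to that of the accompanying speech, is exactly the kind of signal from which a pitch estimate can be extracted downstream. The thesis excerpt gives no algorithm for this; the following is a generic autocorrelation pitch sketch, run on a synthetic glottal-like test signal (fundamental plus two harmonics, mirroring the two clear harmonics the TERC sensor reportedly produces):

```python
import math


def estimate_f0(signal, fs, fmin=50.0, fmax=400.0):
    """Pick the autocorrelation peak inside the plausible voice pitch
    range and convert the winning lag back to a frequency in Hz."""
    lag_lo = int(fs / fmax)                     # shortest period searched
    lag_hi = min(int(fs / fmin), len(signal) - 1)
    best_lag, best_corr = lag_lo, float("-inf")
    for lag in range(lag_lo, lag_hi + 1):
        corr = sum(signal[i] * signal[i - lag] for i in range(lag, len(signal)))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return fs / best_lag


fs = 8000        # Hz, sampling rate (illustrative)
f0 = 120.0       # Hz, true fundamental of the synthetic signal
# Synthetic voiced-speech-like signal: fundamental plus two harmonics.
x = [math.sin(2 * math.pi * f0 * t / fs)
     + 0.5 * math.sin(2 * math.pi * 2 * f0 * t / fs)
     + 0.25 * math.sin(2 * math.pi * 3 * f0 * t / fs)
     for t in range(800)]
f0_est = estimate_f0(x, fs)  # lands close to 120 Hz
```

A real front end would first window the sensor output into short frames and gate out unvoiced regions, but the lag-search idea is the same.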
