  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Exploitation of phase and vocal excitation modulation features for robust speaker recognition. / CUHK electronic theses & dissertations collection

January 2011
Mel-frequency cepstral coefficients (MFCCs) are widely adopted in speech recognition as well as speaker recognition applications. They are extracted primarily to characterize the spectral envelope of a quasi-stationary speech segment. It has been shown that cepstral features are closely related to the linguistic content of speech. Besides the magnitude-based cepstral features, other resources in speech, e.g., the phase and the excitation source, are believed to contain useful properties for speaker discrimination. Moreover, in real situations, large variations exist between the development and application scenarios of a speaker recognition system. These include channel mismatch, recording-apparatus mismatch, environmental variation, and even changes in the emotional or health state of speakers. As a consequence, magnitude-based features alone are insufficient to provide satisfactory and robust speaker recognition accuracy. Therefore, exploiting features complementary to MFCCs may offer one way to alleviate this deficiency from a feature-based perspective. / Speaker recognition (SR) refers to the process of automatically determining or verifying the identity of a person based on his or her voice characteristics. In practical applications, a voice can be used as one of the modalities in a multimodal biometric system, or be the sole medium for identity authentication. The general area of speaker recognition encompasses two fundamental tasks: speaker identification and speaker verification. / Wang, Ning. / Adviser: Pak-Chung Ching. / Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 177-193). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. 
[Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
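The MFCC front-end that this abstract takes as its baseline follows a standard pipeline: pre-emphasis, windowing, magnitude spectrum, mel filterbank, log compression, and DCT. Below is an illustrative pure-Python sketch of that pipeline (the parameter choices — 20 mel filters, 12 coefficients, a naive DFT in place of an FFT — are assumptions for the demo, not the thesis's implementation):

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sample_rate=8000, n_filters=20, n_ceps=12):
    """Toy MFCC for a single frame (naive O(N^2) DFT instead of an FFT)."""
    # 1. Pre-emphasis boosts high frequencies: y[n] = x[n] - 0.97*x[n-1].
    emph = [frame[0]] + [frame[i] - 0.97 * frame[i - 1]
                         for i in range(1, len(frame))]
    # 2. Hamming window reduces spectral leakage at the frame edges.
    N = len(emph)
    win = [emph[i] * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (N - 1)))
           for i in range(N)]
    # 3. Magnitude spectrum of the windowed frame.
    n_bins = N // 2 + 1
    mag = []
    for k in range(n_bins):
        re = sum(win[n] * math.cos(2.0 * math.pi * k * n / N) for n in range(N))
        im = sum(win[n] * math.sin(2.0 * math.pi * k * n / N) for n in range(N))
        mag.append(math.hypot(re, im))
    # 4. Triangular filters spaced evenly on the mel scale.
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    bins = [int(round(e * N / sample_rate)) for e in edges]
    log_e = []
    for m in range(1, n_filters + 1):
        e = 0.0
        for k in range(bins[m - 1], bins[m + 1]):
            if k <= bins[m]:
                w = (k - bins[m - 1]) / max(1, bins[m] - bins[m - 1])
            else:
                w = (bins[m + 1] - k) / max(1, bins[m + 1] - bins[m])
            e += w * mag[k]
        # 5. Log compresses the dynamic range of the filter energies.
        log_e.append(math.log(max(e, 1e-10)))
    # 6. DCT-II decorrelates the log energies into cepstral coefficients.
    return [sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n_filters)
                for j in range(n_filters)) for i in range(n_ceps)]
```

Note that, as the abstract points out, this pipeline discards the phase entirely — only `mag` survives step 3 — which is precisely the information the thesis proposes to exploit.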
72

An automatic speaker recognition system.

January 1989
by Yu Chun Kei. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1989. / Bibliography: leaves 86-88.
73

Large vocabulary continuous speech recognition for Cantonese. / 粤語的大詞彙、連續語音識別系統 / Yue yu de da ci hui, lian xu yu yin shi bie xi tong

January 2000
Wong Yiu Wing = 粤語的大詞彙、連續語音識別系統 / 黃耀榮. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references. / Text in English; abstracts in English and Chinese. / Wong Yiu Wing = Yue yu de da ci hui, lian xu yu yin shi bie xi tong / Huang Yaorong. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Progress of Large Vocabulary Continuous Speech Recognition for Chinese --- p.2 / Chapter 1.2 --- Objectives of the Thesis --- p.5 / Chapter 1.3 --- Thesis Outline --- p.6 / Reference --- p.7 / Chapter 2 --- Fundamentals of Large Vocabulary Continuous Speech Recognition for Cantonese --- p.9 / Chapter 2.1 --- Characteristics of Cantonese --- p.9 / Chapter 2.1.1 --- Cantonese Phonology --- p.9 / Chapter 2.1.2 --- Written Cantonese versus Spoken Cantonese --- p.12 / Chapter 2.2 --- Techniques for Large Vocabulary Continuous Speech Recognition --- p.13 / Chapter 2.2.1 --- Feature Representation of the Speech Signal --- p.14 / Chapter 2.2.2 --- Hidden Markov Model for Acoustic Modeling --- p.15 / Chapter 2.2.3 --- Search Algorithm --- p.17 / Chapter 2.2.4 --- Statistical Language Modeling --- p.18 / Chapter 2.3 --- Discussions --- p.19 / Reference --- p.20 / Chapter 3 --- Acoustic Modeling for Cantonese --- p.21 / Chapter 3.1 --- The Speech Database --- p.21 / Chapter 3.2 --- Context-Dependent Acoustic Modeling --- p.22 / Chapter 3.2.1 --- Context-Independent Initial / Final Models --- p.23 / Chapter 3.2.2 --- Construction of Context-Dependent TriIF Models from Context-Independent IF Models --- p.26 / Chapter 3.2.3 --- Data Sharing in Acoustic Modeling --- p.27 / Chapter 1. --- Sparse Data Problem --- p.27 / Chapter 2. --- Decision-Tree Based State Clustering --- p.28 / Chapter 3.3 --- Experimental Results --- p.31 / Chapter 3.4 --- Error Analysis and Discussions --- p.33 / Chapter 3.4.1 --- Recognition Accuracy vs. 
Model Complexity --- p.33 / Chapter 3.4.2 --- Initial / Final Confusion Matrices --- p.34 / Chapter 3.4.3 --- Analysis of Phonetic Trees --- p.39 / Chapter 3.4.4 --- The NULL Initial HMM --- p.42 / Chapter 3.4.5 --- Comments on the CUSENT Speech Corpus --- p.42 / References --- p.44 / Chapter 4 --- Language Modeling for Cantonese --- p.46 / Chapter 4.1 --- N-gram Language Model --- p.46 / Chapter 4.1.1 --- Problems in Building an N-gram Language Model --- p.47 / Chapter 1. --- The Zero-Probability Problem and Backoff N-gram --- p.48 / Chapter 4.1.2 --- Perplexity of a Language Model --- p.49 / Chapter 4.2 --- N-gram Modeling in Cantonese --- p.50 / Chapter 4.2.1 --- The Vocabulary and Word Segmentation --- p.50 / Chapter 4.2.2 --- Evaluation of Chinese Language Models --- p.53 / Chapter 4.3 --- Character-Level versus Word-Level Language Models --- p.54 / Chapter 4.4 --- Language Modeling in a Specific Domain --- p.57 / Chapter 4.4.1 --- Language Model Adaptation to the Financial Domain --- p.57 / Chapter 1. --- Vocabulary Refinement --- p.57 / Chapter 2. --- The Seed Financial Bigram --- p.58 / Chapter 3. --- Linear Interpolation of Two Bigram models --- p.59 / Chapter 4. 
--- Performance of the Interpolated Language Model --- p.60 / Chapter 4.5 --- Error Analysis and Discussions --- p.61 / References --- p.63 / Chapter 5 --- Integration of Acoustic Model and Language Model --- p.65 / Chapter 5.1 --- One-Pass Search versus Multi-Pass Search --- p.66 / Chapter 5.2 --- A Two-Pass Decoder for Chinese LVCSR --- p.68 / Chapter 5.2.1 --- The First Pass Search --- p.69 / Chapter 5.2.2 --- The Second Pass Search --- p.72 / Chapter 5.3 --- Experimental Results --- p.73 / Chapter 5.4 --- Error Analysis and Discussions --- p.75 / Chapter 5.4.1 --- Vocabulary and Search --- p.75 / Chapter 5.4.2 --- Expansion of the Syllable Lattice --- p.76 / Chapter 5.4.3 --- Perplexity and Recognition Accuracy --- p.78 / Reference --- p.80 / Chapter 6 --- Conclusions and Suggestions for Future Work --- p.82 / Chapter 6.1 --- Conclusions --- p.82 / Chapter 6.2 --- Suggestions for future work --- p.84 / Chapter 1. --- Speaker Adaptation --- p.84 / Chapter 2. --- Tone Recognition --- p.84 / Reference --- p.85 / Appendix I Base Syllable Table --- p.86 / Appendix II Phonetic Question Set --- p.87
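Chapter 4 of the outline above covers the zero-probability problem, backoff n-grams, and perplexity. Those three ideas fit in a minimal bigram model; the sketch below uses a simplified absolute-discounting backoff (the discount value and the omission of proper renormalization over unseen continuations are simplifications, not the thesis's actual smoothing scheme):

```python
import math
from collections import Counter

def train_bigram(corpus):
    """corpus: list of token lists; adds sentence-boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def backoff_prob(prev, word, unigrams, bigrams, discount=0.5):
    # Absolute discounting: subtract a fixed discount from every seen
    # bigram count, and give the reserved mass to unseen bigrams via the
    # unigram distribution.  (A full backoff model renormalizes the
    # unigram mass over unseen continuations only; omitted for brevity.)
    total = sum(unigrams.values())
    p_uni = unigrams.get(word, 0) / total
    c_prev = unigrams.get(prev, 0)
    if c_prev == 0:
        return p_uni                      # unseen history: back off fully
    c_bi = bigrams.get((prev, word), 0)
    if c_bi > 0:
        return (c_bi - discount) / c_prev # discounted seen bigram
    n_types = sum(1 for (a, _) in bigrams if a == prev)
    return discount * n_types / c_prev * p_uni

def perplexity(sentences, unigrams, bigrams):
    """exp of the average negative log-probability per transition."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        toks = ["<s>"] + sent + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            log_sum += math.log(backoff_prob(a, b, unigrams, bigrams))
            n += 1
    return math.exp(-log_sum / n)
```

Lower perplexity on held-out text indicates a better-matched language model, which is how the thesis compares its character-level, word-level, and domain-adapted models.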
74

Text-independent speaker recognition using discriminative subspace analysis. / CUHK electronic theses & dissertations collection

January 2012
說話人識別(Speaker Recognition) 主要利用聲音來檢測說話人的身份,是一項重要且極具挑戰性的生物認證研究課題。通常來說,針對語音信號的文本內容差別,說話人識別可以分成文本相關和文本無關兩類。另外,說話人識別有兩類重要應用,第一類是說話人確認,主要是通過給定話者聲音信息對說話人聲稱之身份進行二元判定。另一類是說話人辨識,其主要是從待選說話人集中判斷未知身份信息的話者身份。 / 在先進的說話人識別系統中,每個說話人模型是通過給定的說話人數據進行特徵統計分佈估計由生成模型訓練得到。這類方法由於需要逐帧進行概率或似然度計算而得出最終判決,會耗費大量系統資源並降低實時性性能。採用子空間降維技術,我們不僅避免選取冗餘高維度數據,同時能夠有效删除於識別中無用之數據。為克服上述生成性模型的不足並獲得不同說話人間的區分邊界,本文提出了利用區分性子空間方法訓練模型並採用有效的距離測度作為最終的建模識別新算法。 / 在本篇論文中,我們將先介紹並分析各類產生性說話人識別方法,例如高斯混合模型及聯合因子分析。另外,為了降低特徵空間維度和運算時間,我們也對子空間分析技術做了調研。除此之外,我們提出了一種取名為Fishervoice 基於非參數分佈假定的新穎說話人識別框架。所提出的Fishervoice 框架的主要目的是為了降低噪聲干擾同時加重分類信息,而能夠加強在可區分性的子空間內對聲音特徵建模。採用上述Fishervoice 框架,說話人識別可以簡單地通過測試樣本映射到Fishervoice 子空間並計算其簡單歐氏距離而實現。為了更好得降低維度及提高識別率,我們還對Fishervoice 框架進行多樣化探索。另外,我們也在低維度的全變化空間(Total Variability) 對各類多種子空間分析模型進行調比較。基於XM2VTS 和NIST 公開數據庫的實驗驗證了本文提出的算法的有效性。 / Speaker Recognition (SR), which uses the voice to determine the speaker's identity, is an important and challenging research topic for biometric authentication. Generally speaking, speaker recognition can be divided into text-dependent and text-independent methods according to the verbal content of the speech signal. There are two major applications of speaker recognition: the first is speaker verification, also referred to as speaker authentication, which validates a speaker's claimed identity from the voice and involves a binary decision. The second is speaker identification, which determines an unknown speaker's identity. / In a state-of-the-art speaker recognition system, each speaker model is usually trained by generative methods, which estimate the feature distribution of each speaker from the given data. These generative methods need a frame-based metric (e.g. probabilities or likelihoods) for making the final decision, which consumes substantial computing resources and slows real-time response. Meanwhile, many redundant data frames are blindly selected for training without efficient subspace dimension reduction. 
In order to overcome disadvantages of generative methods and obtain boundary information between individual speakers, we propose to apply the discriminative subspace technique for model training and employ simple but efficient distance metrics for decision score calculation. / In this thesis, we shall present an overview of both conventional and state-of-the-art generative speaker recognition methods (e.g. Gaussian Mixture Model and Joint Factor Analysis) and analyze their advantages and disadvantages. In addition, we have also made an investigation of the application of subspace analysis techniques to reduce feature dimensions and computation time. After that, a novel speaker recognition framework based on the nonparametric Fisher’s discriminant analysis which we name Fishervoice is proposed. The objective of the proposed Fishervoice algorithm is to model the intrinsic vocal characteristics in a discriminant subspace for de-emphasizing unwanted noise variations and emphasizing classification boundaries information. Using the proposed Fishervoice framework, speaker recognition can be easily realized by mapping a test utterance to the Fishervoice subspace and then calculating the score between the test utterance and its reference. Besides, we explore the proposed Fishervoice framework with several extensions for further dimensionality reduction and performance improvement. Furthermore, we investigate various subspace analysis techniques in a total variability-based low-dimensional space for fast computation. Extensive experiments on two large speaker recognition corpora (XM2VTS and NIST) demonstrate significant improvements of Fishervoice over standard, state-of-the-art approaches for both speaker identification and verification systems. / Detailed summary in vernacular field only. / Detailed summary in vernacular field only. / Detailed summary in vernacular field only. / Jiang, Weiwu. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2012. 
/ Includes bibliographical references (leaves 127-135). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. / Abstract --- p.i / Acknowledgements --- p.vi / Contents --- p.xiv / List of Figures --- p.xvii / List of Tables --- p.xxiii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Overview of Speaker Recognition Systems --- p.1 / Chapter 1.2 --- Motivation --- p.4 / Chapter 1.3 --- Outline of Thesis --- p.6 / Chapter 2 --- Background Study --- p.7 / Chapter 2.1 --- Generative Gaussian Mixture Model (GMM) --- p.7 / Chapter 2.1.1 --- Basic GMM --- p.7 / Chapter 2.1.2 --- The Gaussian Mixture Model-Universal Background Model (GMM-UBM) System --- p.9 / Chapter 2.2 --- Discriminative Subspace Analysis --- p.12 / Chapter 2.2.1 --- Principal Component Analysis --- p.12 / Chapter 2.2.2 --- Linear Discriminant Analysis --- p.16 / Chapter 2.2.3 --- Heteroscedastic Linear Discriminant Analysis --- p.17 / Chapter 2.2.4 --- Locality Preserving Projections --- p.18 / Chapter 2.3 --- Noise Compensation --- p.20 / Chapter 2.3.1 --- Eigenvoice --- p.20 / Chapter 2.3.2 --- Joint Factor Analysis --- p.24 / Chapter 2.3.3 --- Probabilistic Linear Discriminant Analysis --- p.26 / Chapter 2.3.4 --- Nuisance Attribute Projection --- p.30 / Chapter 2.3.5 --- Within-class Covariance Normalization --- p.32 / Chapter 2.4 --- Support Vector Machine --- p.33 / Chapter 2.5 --- Score Normalization --- p.35 / Chapter 2.6 --- Summary --- p.39 / Chapter 3 --- Corpora for Speaker Recognition Experiments --- p.41 / Chapter 3.1 --- Corpora for Speaker Identification Experiments --- p.41 / Chapter 3.1.1 --- XM2VTS Corpus --- p.41 / Chapter 3.1.2 --- NIST Corpora --- p.42 / Chapter 3.2 --- Corpora for Speaker Verification Experiments --- p.45 / Chapter 3.3 --- Summary --- p.47 / Chapter 4 --- Performance Measures for Speaker Recognition --- p.48 / Chapter 4.1 --- 
Performance Measures for Identification --- p.48 / Chapter 4.2 --- Performance Measures for Verification --- p.49 / Chapter 4.2.1 --- Equal Error Rate --- p.49 / Chapter 4.2.2 --- Detection Error Tradeoff Curves --- p.49 / Chapter 4.2.3 --- Detection Cost Function --- p.50 / Chapter 4.3 --- Summary --- p.51 / Chapter 5 --- The Discriminant Fishervoice Framework --- p.52 / Chapter 5.1 --- The Proposed Fishervoice Framework --- p.53 / Chapter 5.1.1 --- Feature Representation --- p.53 / Chapter 5.1.2 --- Nonparametric Fisher’s Discriminant Analysis --- p.55 / Chapter 5.2 --- Speaker Identification Experiments --- p.60 / Chapter 5.2.1 --- Experiments on the XM2VTS Corpus --- p.60 / Chapter 5.2.2 --- Experiments on the NIST Corpus --- p.62 / Chapter 5.3 --- Summary --- p.64 / Chapter 6 --- Extension of the Fishervoice Framework --- p.66 / Chapter 6.1 --- Two-level Fishervoice Framework --- p.66 / Chapter 6.1.1 --- Proposed Algorithm --- p.66 / Chapter 6.2 --- Performance Evaluation on the Two-level Fishervoice Framework --- p.70 / Chapter 6.2.1 --- Experimental Setup --- p.70 / Chapter 6.2.2 --- Performance Comparison of Different Types of Input Supervectors --- p.72 / Chapter 6.2.3 --- Performance Comparison of Different Numbers of Slices --- p.73 / Chapter 6.2.4 --- Performance Comparison of Different Dimensions of Fishervoice Projection Matrices --- p.75 / Chapter 6.2.5 --- Performance Comparison with Other Systems --- p.77 / Chapter 6.2.6 --- Fusion with Other Systems --- p.78 / Chapter 6.2.7 --- Extension of the Two-level Subspace Analysis Framework --- p.80 / Chapter 6.3 --- Random Subspace Sampling Framework --- p.81 / Chapter 6.3.1 --- Supervector Extraction --- p.82 / Chapter 6.3.2 --- Training Stage --- p.83 / Chapter 6.3.3 --- Testing Procedures --- p.84 / Chapter 6.3.4 --- Discussion --- p.84 / Chapter 6.4 --- Performance Evaluation of the Random Subspace Sampling Framework --- p.85 / Chapter 6.4.1 --- Experimental Setup --- p.85 / Chapter 6.4.2 --- Random 
Subspace Sampling Analysis --- p.87 / Chapter 6.4.3 --- Comparison with Other Systems --- p.90 / Chapter 6.4.4 --- Fusion with the Other Systems --- p.90 / Chapter 6.5 --- Summary --- p.92 / Chapter 7 --- Discriminative Modeling in Low-dimensional Space --- p.94 / Chapter 7.1 --- Discriminative Subspace Analysis in Low-dimensional Space --- p.95 / Chapter 7.1.1 --- Experimental Setup --- p.96 / Chapter 7.1.2 --- Performance Evaluation on Individual Subspace Analysis Techniques --- p.98 / Chapter 7.1.3 --- Performance Evaluation on Multi-type of Subspace Analysis Techniques --- p.105 / Chapter 7.2 --- Discriminative Subspace Analysis with Support Vector Machine --- p.115 / Chapter 7.2.1 --- Experimental Setup --- p.116 / Chapter 7.2.2 --- Performance Evaluation on LDA+WCCN+SVM --- p.117 / Chapter 7.2.3 --- Performance Evaluation on Fishervoice+SVM --- p.118 / Chapter 7.3 --- Summary --- p.118 / Chapter 8 --- Conclusions and Future Work --- p.120 / Chapter 8.1 --- Contributions --- p.120 / Chapter 8.2 --- Future Directions --- p.121 / Chapter A --- EM Training GMM --- p.123 / Bibliography --- p.127
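The core move of the abstract above — project an utterance into a discriminant subspace and score by simple Euclidean distance — can be illustrated with the classical two-class, two-dimensional Fisher discriminant. This is a toy stand-in, not the thesis's nonparametric Fishervoice formulation; all data and speaker labels below are invented:

```python
import math

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def fisher_direction(class_a, class_b):
    """Two-class Fisher discriminant in 2-D: w = Sw^-1 (m_a - m_b), normalized."""
    ma, mb = mean(class_a), mean(class_b)
    # Within-class scatter matrix Sw, pooled over both classes.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for cls, m in ((class_a, ma), (class_b, mb)):
        for v in cls:
            d = [v[0] - m[0], v[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    dm = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * dm[0] + inv[0][1] * dm[1],
         inv[1][0] * dm[0] + inv[1][1] * dm[1]]
    norm = math.hypot(w[0], w[1])
    return [w[0] / norm, w[1] / norm]

def project(v, w):
    return v[0] * w[0] + v[1] * w[1]

def identify(test_vec, references, w):
    """Nearest reference speaker in the 1-D discriminant subspace."""
    z = project(test_vec, w)
    return min(references, key=lambda kv: abs(project(kv[1], w) - z))[0]
```

Once `w` is trained, identification is exactly the cheap operation the abstract promises: one projection and one distance comparison per reference, with no frame-level likelihood computation.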
75

Speech recognition on DSP: algorithm optimization and performance analysis.

January 2004
Yuan Meng. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. / Includes bibliographical references (leaves 85-91). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- History of ASR development --- p.2 / Chapter 1.2 --- Fundamentals of automatic speech recognition --- p.3 / Chapter 1.2.1 --- Classification of ASR systems --- p.3 / Chapter 1.2.2 --- Automatic speech recognition process --- p.4 / Chapter 1.3 --- Performance measurements of ASR --- p.7 / Chapter 1.3.1 --- Recognition accuracy --- p.7 / Chapter 1.3.2 --- Complexity --- p.7 / Chapter 1.3.3 --- Robustness --- p.8 / Chapter 1.4 --- Motivation and goal of this work --- p.8 / Chapter 1.5 --- Thesis outline --- p.10 / Chapter 2 --- Signal processing techniques for front-end --- p.12 / Chapter 2.1 --- Basic feature extraction principles --- p.13 / Chapter 2.1.1 --- Pre-emphasis --- p.13 / Chapter 2.1.2 --- Frame blocking and windowing --- p.13 / Chapter 2.1.3 --- Discrete Fourier Transform (DFT) computation --- p.15 / Chapter 2.1.4 --- Spectral magnitudes --- p.15 / Chapter 2.1.5 --- Mel-frequency filterbank --- p.16 / Chapter 2.1.6 --- Logarithm of filter energies --- p.18 / Chapter 2.1.7 --- Discrete Cosine Transformation (DCT) --- p.18 / Chapter 2.1.8 --- Cepstral Weighting --- p.19 / Chapter 2.1.9 --- Dynamic featuring --- p.19 / Chapter 2.2 --- Practical issues --- p.20 / Chapter 2.2.1 --- Review of practical problems and solutions in ASR applications --- p.20 / Chapter 2.2.2 --- Model of environment --- p.23 / Chapter 2.2.3 --- End-point detection (EPD) --- p.23 / Chapter 2.2.4 --- Spectral subtraction (SS) --- p.25 / Chapter 3 --- HMM-based Acoustic Modeling --- p.26 / Chapter 3.1 --- HMMs for ASR --- p.26 / Chapter 3.2 --- Output probabilities --- p.27 / Chapter 3.3 --- Viterbi search engine --- p.29 / Chapter 3.4 --- Isolated word recognition (IWR) & Connected word recognition (CWR) --- p.30 / Chapter 3.4.1 --- Isolated word recognition --- p.30 / 
Chapter 3.4.2 --- Connected word recognition (CWR) --- p.31 / Chapter 4 --- DSP for embedded applications --- p.32 / Chapter 4.1 --- "Classification of embedded systems (DSP, ASIC, FPGA, etc.)" --- p.32 / Chapter 4.2 --- Description of hardware platform --- p.34 / Chapter 4.3 --- I/O operation for real-time processing --- p.36 / Chapter 4.4 --- Fixed point algorithm on DSP --- p.40 / Chapter 5 --- ASR algorithm optimization --- p.42 / Chapter 5.1 --- Methodology --- p.42 / Chapter 5.2 --- Floating-point to fixed-point conversion --- p.43 / Chapter 5.3 --- Computational complexity consideration --- p.45 / Chapter 5.3.1 --- Feature extraction techniques --- p.45 / Chapter 5.3.2 --- Viterbi search module --- p.50 / Chapter 5.4 --- Memory requirements consideration --- p.51 / Chapter 6 --- Experimental results and performance analysis --- p.53 / Chapter 6.1 --- Cantonese isolated word recognition (IWR) --- p.54 / Chapter 6.1.1 --- Execution time --- p.54 / Chapter 6.1.2 --- Memory requirements --- p.57 / Chapter 6.1.3 --- Recognition performance --- p.57 / Chapter 6.2 --- Connected word recognition (CWR) --- p.61 / Chapter 6.2.1 --- Execution time consideration --- p.62 / Chapter 6.2.2 --- Recognition performance --- p.62 / Chapter 6.3 --- Summary & discussion --- p.66 / Chapter 7 --- Implementation of practical techniques --- p.67 / Chapter 7.1 --- End-point detection (EPD) --- p.67 / Chapter 7.2 --- Spectral subtraction (SS) --- p.71 / Chapter 7.3 --- Experimental results --- p.72 / Chapter 7.3.1 --- Isolated word recognition (IWR) --- p.72 / Chapter 7.3.2 --- Connected word recognition (CWR) --- p.75 / Chapter 7.4 --- Results --- p.77 / Chapter 8 --- Conclusions and future work --- p.78 / Chapter 8.1 --- Summary and Conclusions --- p.78 / Chapter 8.2 --- Suggestions for future research --- p.80 / Appendices --- p.82 / Chapter A --- "Interpolation of data entries without floating point, divides or conditional branches" --- p.82 / Chapter B --- Vocabulary for 
Cantonese isolated word recognition task --- p.84 / Bibliography --- p.85
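The floating-point to fixed-point conversion discussed in Chapter 5 of this outline commonly targets a fractional format such as Q15 on 16-bit DSPs: values in [-1, 1) are scaled by 2^15 and multiplications need a corrective right shift. A minimal sketch of that round trip (illustrative of the technique, not the thesis's code):

```python
def to_q15(x):
    """Quantize a float in [-1, 1) to a 16-bit Q15 integer, saturating at the rails."""
    v = int(round(x * 32768.0))
    return max(-32768, min(32767, v))

def from_q15(q):
    """Recover the approximate float value of a Q15 integer."""
    return q / 32768.0

def q15_mul(a, b):
    # 16x16 -> 32-bit product, arithmetic shift right by 15, then
    # saturate to 16 bits: the fractional multiply a typical
    # fixed-point DSP performs in hardware.
    return max(-32768, min(32767, (a * b) >> 15))
```

The saturation step is what distinguishes DSP arithmetic from plain integer overflow, and the quantization error introduced by `to_q15` is the price the thesis's recognition-accuracy experiments measure against the floating-point baseline.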
76

Robust speech recognition under noisy environments.

January 2004
Lee Siu Wa. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. / Includes bibliographical references (leaves 116-121). / Abstracts in English and Chinese. / Abstract --- p.v / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- An Overview on Automatic Speech Recognition --- p.2 / Chapter 1.2 --- Thesis Outline --- p.6 / Chapter 2 --- Baseline Speech Recognition System --- p.8 / Chapter 2.1 --- Baseline Speech Recognition Framework --- p.8 / Chapter 2.2 --- Acoustic Feature Extraction --- p.11 / Chapter 2.2.1 --- Speech Production and Source-Filter Model --- p.12 / Chapter 2.2.2 --- Review of Feature Representations --- p.14 / Chapter 2.2.3 --- Mel-frequency Cepstral Coefficients --- p.20 / Chapter 2.2.4 --- Energy and Dynamic Features --- p.24 / Chapter 2.3 --- Back-end Decoder --- p.26 / Chapter 2.4 --- English Digit String Corpus - AURORA2 --- p.28 / Chapter 2.5 --- Baseline Recognition Experiment --- p.31 / Chapter 3 --- A Simple Recognition Framework with Model Selection --- p.34 / Chapter 3.1 --- Mismatch between Training and Testing Conditions --- p.34 / Chapter 3.2 --- Matched Training and Testing Conditions --- p.38 / Chapter 3.2.1 --- Noise type-Matching --- p.38 / Chapter 3.2.2 --- SNR-Matching --- p.43 / Chapter 3.2.3 --- Noise Type and SNR-Matching --- p.44 / Chapter 3.3 --- Recognition Framework with Model Selection --- p.48 / Chapter 4 --- Noise Spectral Estimation --- p.53 / Chapter 4.1 --- Introduction to Statistical Estimation Methods --- p.53 / Chapter 4.1.1 --- Conventional Estimation Methods --- p.54 / Chapter 4.1.2 --- Histogram Technique --- p.55 / Chapter 4.2 --- Quantile-based Noise Estimation (QBNE) --- p.57 / Chapter 4.2.1 --- Overview of Quantile-based Noise Estimation (QBNE) --- p.58 / Chapter 4.2.2 --- Time-Frequency Quantile-based Noise Estimation (T-F QBNE) --- p.62 / Chapter 4.2.3 --- Mainlobe-Resilient Time-Frequency Quantile-based Noise Estimation (M-R T-F QBNE) --- p.65 / Chapter 4.3 --- Estimation Performance Analysis 
--- p.72 / Chapter 4.4 --- Recognition Experiment with Model Selection --- p.74 / Chapter 5 --- Feature Compensation: Algorithm and Experiment --- p.81 / Chapter 5.1 --- Feature Deviation from Clean Speech --- p.81 / Chapter 5.1.1 --- Deviation in MFCC Features --- p.82 / Chapter 5.1.2 --- Implications for Feature Compensation --- p.84 / Chapter 5.2 --- Overview of Conventional Compensation Methods --- p.86 / Chapter 5.3 --- Feature Compensation by In-phase Feature Induction --- p.94 / Chapter 5.3.1 --- Motivation --- p.94 / Chapter 5.3.2 --- Methodology --- p.97 / Chapter 5.4 --- Compensation Framework for Magnitude Spectrum and Segmental Energy --- p.102 / Chapter 5.5 --- Recognition Experiments --- p.103 / Chapter 6 --- Conclusions --- p.112 / Chapter 6.1 --- Summary and Discussions --- p.112 / Chapter 6.2 --- Future Directions --- p.114 / Bibliography --- p.116
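Quantile-based noise estimation, the focus of Chapter 4 above, rests on a simple observation: in any frequency bin, noise is present in every frame while speech is present only in some, so a low-to-middle quantile of the magnitude over time tracks the noise floor without needing a speech/pause detector. A minimal sketch of the plain (non-time-frequency) variant, paired with spectral subtraction (the quantile value and spectral floor are illustrative choices):

```python
def quantile(values, q):
    """Empirical q-quantile (0 <= q <= 1) by sorting."""
    s = sorted(values)
    return s[min(len(s) - 1, int(q * len(s)))]

def qbne(spectrogram, q=0.5):
    # spectrogram: list of frames, each a list of |X(f)| magnitudes.
    # Per frequency bin, take the q-quantile of magnitudes over time
    # as the noise estimate: noise dominates the low quantiles even
    # when speech is present in other frames.
    n_bins = len(spectrogram[0])
    return [quantile([frame[f] for frame in spectrogram], q)
            for f in range(n_bins)]

def spectral_subtraction(frame, noise, floor=0.05):
    # Subtract the noise estimate per bin, keeping a small spectral
    # floor so magnitudes never go negative (reduces "musical noise").
    return [max(m - n, floor * m) for m, n in zip(frame, noise)]
```

The thesis's T-F QBNE and mainlobe-resilient variants refine this by choosing the quantile jointly over time-frequency neighbourhoods rather than one bin at a time.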
77

Verbal information verification for high-performance speaker authentication.

January 2005
Qin Chao. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 77-82). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Overview of Speaker Authentication --- p.1 / Chapter 1.2 --- Goals of this Research --- p.6 / Chapter 1.3 --- Thesis Outline --- p.7 / Chapter 2 --- Speaker Verification --- p.8 / Chapter 2.1 --- Introduction --- p.8 / Chapter 2.2 --- Front-End Processing --- p.9 / Chapter 2.2.1 --- Acoustic Feature Extraction --- p.10 / Chapter 2.2.2 --- Endpoint Detection --- p.12 / Chapter 2.3 --- Speaker Modeling --- p.12 / Chapter 2.3.1 --- Likelihood Ratio Test for Speaker Verification --- p.13 / Chapter 2.3.2 --- Gaussian Mixture Models --- p.15 / Chapter 2.3.3 --- UBM Adaptation --- p.16 / Chapter 2.4 --- Experiments on Cantonese Speaker Verification --- p.18 / Chapter 2.4.1 --- Speech Databases --- p.19 / Chapter 2.4.2 --- Effect of Endpoint Detection --- p.21 / Chapter 2.4.3 --- Comparison of the UBM Adaptation and the Cohort Method --- p.22 / Chapter 2.4.4 --- Discussions --- p.25 / Chapter 2.5 --- Summary --- p.26 / Chapter 3 --- Verbal Information Verification --- p.28 / Chapter 3.1 --- Introduction --- p.28 / Chapter 3.2 --- Utterance Verification for VIV --- p.29 / Chapter 3.2.1 --- Forced Alignment --- p.30 / Chapter 3.2.2 --- Subword Hypothesis Test --- p.30 / Chapter 3.2.3 --- Confidence Measure --- p.31 / Chapter 3.3 --- Sequential Utterance Verification for VIV --- p.34 / Chapter 3.3.1 --- Practical Security Consideration --- p.34 / Chapter 3.3.2 --- Robust Interval --- p.34 / Chapter 3.4 --- Application and Further Improvement --- p.36 / Chapter 3.5 --- Summary --- p.36 / Chapter 4 --- Model Design for Cantonese Verbal Information Verification --- p.37 / Chapter 4.1 --- General Considerations --- p.37 / Chapter 4.2 --- The Cantonese Dialect --- p.37 / Chapter 4.3 --- Target Model Design --- p.38 / Chapter 4.4 --- Anti-Model Design --- p.38 / Chapter 
4.4.1 --- Role of Normalization Techniques --- p.38 / Chapter 4.4.2 --- Context-dependent versus Context-independent Antimodels --- p.40 / Chapter 4.4.3 --- General Approach to CI Anti-modeling --- p.40 / Chapter 4.4.4 --- Sub-syllable Clustering --- p.41 / Chapter 4.4.5 --- Cohort and World Anti-models --- p.42 / Chapter 4.4.6 --- GMM-based Anti-models --- p.44 / Chapter 4.5 --- Simulation Results and Discussions --- p.45 / Chapter 4.5.1 --- Speech Databases --- p.45 / Chapter 4.5.2 --- Effect of Model Complexity --- p.46 / Chapter 4.5.3 --- Comparisons among different Anti-models --- p.47 / Chapter 4.5.4 --- Discussions --- p.48 / Chapter 4.6 --- Summary --- p.49 / Chapter 5 --- Integration of SV and VIV --- p.50 / Chapter 5.1 --- Introduction --- p.50 / Chapter 5.2 --- Voting Method --- p.53 / Chapter 5.2.1 --- Permissive Test vs. Restrictive Test --- p.54 / Chapter 5.2.2 --- Shared vs. Speaker-specific Thresholds --- p.55 / Chapter 5.3 --- Support Vector Machines --- p.56 / Chapter 5.4 --- Gaussian-based Classifier --- p.59 / Chapter 5.5 --- Simulation Results and Discussions --- p.60 / Chapter 5.5.1 --- Voting Method --- p.60 / Chapter 5.5.2 --- Support Vector Machines --- p.63 / Chapter 5.5.3 --- Gaussian-based Classifier --- p.64 / Chapter 5.5.4 --- Discussions --- p.66 / Chapter 5.6 --- Summary --- p.67 / Chapter 6 --- Conclusions and Suggested Future Works --- p.68 / Chapter 6.1 --- Conclusions --- p.68 / Chapter 6.2 --- Summary of Findings and Contributions of This Thesis --- p.70 / Chapter 6.3 --- Future Perspective --- p.71 / Chapter 6.3.1 --- Integration of Keyword Spotting into VIV --- p.71 / Chapter 6.3.2 --- Integration of Prosodic Information --- p.71 / Appendices --- p.73 / Chapter A --- A Cantonese VIV Demonstration System --- p.73 / Bibliography --- p.77
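The likelihood-ratio test at the heart of the speaker verification chapter above (Section 2.3.1) scores a claimed speaker's model against a background model and accepts when the average log-likelihood ratio clears a threshold. A minimal diagonal-covariance GMM version follows; the toy models in the test are hand-set, not trained via the UBM adaptation the thesis studies:

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(frames, gmm):
    # gmm: list of (weight, mean, var) components.  Per frame, combine
    # component log-densities with a log-sum-exp for numerical safety;
    # return the average log-likelihood over frames.
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]
        mx = max(comp)
        total += mx + math.log(sum(math.exp(c - mx) for c in comp))
    return total / len(frames)

def verify(frames, speaker_gmm, ubm, threshold=0.0):
    # Likelihood-ratio test: accept the claimed identity when the
    # average log-likelihood ratio exceeds the decision threshold.
    llr = gmm_loglik(frames, speaker_gmm) - gmm_loglik(frames, ubm)
    return llr > threshold
```

Normalizing by the UBM score is what makes the threshold roughly speaker-independent, which is the property the thesis's score-normalization and anti-model comparisons build on.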
78

Automatic recognition of continuous Cantonese speech. / CUHK electronic theses & dissertations collection

January 1997
Alfred Ying-Pang Ng. / Thesis (Ph.D.)--Chinese University of Hong Kong, 1997. / Includes bibliographical references (p. 159-169). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Mode of access: World Wide Web.
79

Speaker adaptation in joint factor analysis based text independent speaker verification

Shou-Chun, Yin, 1980- January 2006
No description available.
80

Speech recognition using hybrid system of neural networks and knowledge sources.

Darjazini, Hisham, University of Western Sydney, College of Health and Science, School of Engineering January 2006
In this thesis, a novel hybrid Speech Recognition (SR) system called RUST (Recognition Using Syntactical Tree) is developed. RUST combines Artificial Neural Networks (ANN) with a Statistical Knowledge Source (SKS) for a small topic-focused database. The hypothesis of this research was that including syntactic knowledge, represented as the probabilities of occurrence of phones in words and sentences, improves the performance of an ANN-based SR system. The lexicon of the first version of RUST (RUST-I) was developed with 1357 words, of which 549 were unique. These words were extracted from three topics (finance, physics and general reading material), and could be expanded or reduced (specialised). Experiments carried out on RUST showed that including basic statistical phonemic/syntactic knowledge with an ANN phone recognisor increased the phone recognition rate to 87% and the word recognition rate to 78%. The first implementation of RUST was not optimal, so a second version (RUST-II) was implemented with an incremental learning algorithm; it has been shown to improve the phone recognition rate to 94%. The introduction of incremental learning to ANN-based speech recognition can be considered the most innovative feature of this research. In conclusion, this work has proved the hypothesis that the inclusion of probabilistic phonemic-syntactic knowledge and topic-related statistical data, using an adaptive phone recognisor based on neural networks, has the potential to improve the performance of a speech recognition system. / Doctor of Philosophy (PhD)
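The combination RUST performs — ANN phone posteriors weighted by statistical phone-occurrence knowledge — can be caricatured as a log-linear rescoring step. The interpolation weight, the back-off floor, and the phone labels below are all hypothetical; the abstract does not give the thesis's exact combination scheme:

```python
import math

def rescore(ann_posteriors, phone_priors, weight=0.3, floor=1e-6):
    # Pick the phone that maximizes a log-linear mix of the ANN's
    # posterior and a topic-specific occurrence statistic.  Phones
    # absent from the statistics fall back to a small floor prior.
    score = {p: (1.0 - weight) * math.log(ann_posteriors[p])
                + weight * math.log(phone_priors.get(p, floor))
             for p in ann_posteriors}
    return max(score, key=score.get)
```

With `weight=0.0` the decision reduces to the raw ANN output; raising the weight lets the topic statistics overturn a narrow acoustic preference, which is the effect the 87% vs. baseline phone-recognition comparison in the abstract is measuring.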
