31 |
Subband spectral features for speaker recognition. January 2004.
Tam Yuk Yin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references. Abstracts in English and Chinese.
Contents:
Chapter 1 Introduction
  1.1 Biometrics for User Authentication
  1.2 Voice-based User Authentication
  1.3 Motivation and Focus of This Work
  1.4 Thesis Outline
  References
Chapter 2 Fundamentals of Automatic Speaker Recognition
  2.1 Speech Production
  2.2 Features of Speaker's Voice in Speech Signal
  2.3 Basics of Speaker Recognition
  2.4 Existing Approaches of Speaker Recognition
    2.4.1 Feature Extraction: Overview; Mel-Frequency Cepstral Coefficient (MFCC)
    2.4.2 Speaker Modeling: Overview; Gaussian Mixture Model (GMM)
    2.4.3 Speaker Identification (SID)
  References
Chapter 3 Data Collection and Baseline System
  3.1 Data Collection
  3.2 Baseline System: Experimental Set-up; Results and Analysis
  References
Chapter 4 Subband Spectral Envelope Features
  4.1 Spectral Envelope Features
  4.2 Subband Spectral Envelope Features
  4.3 Feature Extraction Procedures
  4.4 SID Experiments: Experimental Set-up; Results and Analysis
  References
Chapter 5 Fusion of Subband Features
  5.1 Model Level Fusion: Experimental Set-up; Results and Analysis
  5.2 Feature Level Fusion: Experimental Set-up; Results and Analysis
  5.3 Discussion
  References
Chapter 6 Utterance-Level SID with Text-Dependent Weights
  6.1 Motivation
  6.2 Utterance-Level SID
  6.3 Baseline System: Implementation Details; Results and Analysis
  6.4 Text-Dependent Weights: Implementation Details; Results and Analysis
  6.5 Text-Dependent Feature Weights: Implementation Details; Results and Analysis
  6.6 Text-Dependent Weights Applied in Score Combination and Subband Features: Implementation Details; Results and Analysis
  6.7 Discussion
Chapter 7 Conclusions and Suggested Future Work
  7.1 Conclusions
  7.2 Suggested Future Work
Appendix 1 Speech Content for Data Collection
|
32 |
On the robustness of static and dynamic spectral information for speech recognition in noise. CUHK electronic theses & dissertations collection. January 2004.
Yang Chen. "November 2004." Thesis (Ph.D.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (p. 131-141). Electronic reproduction: Hong Kong, Chinese University of Hong Kong, [2012]. System requirements: Adobe Acrobat Reader. Available via the World Wide Web. Abstracts in English and Chinese.
|
33 |
Robust speaker recognition using both vocal source and vocal tract features estimated from noisy input utterances. January 2007.
Wang, Ning. Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 106-115). Abstracts in English and Chinese.
Contents:
Chapter 1 Introduction
  1.1 Introduction to Speech and Speaker Recognition
  1.2 Difficulties and Challenges of Speaker Authentication
  1.3 Objectives and Thesis Outline
Chapter 2 Speaker Recognition System
  2.1 Baseline Speaker Recognition System Overview: Feature Extraction; Pattern Generation and Classification
  2.2 Performance Evaluation Metric for Different Speaker Recognition Tasks
  2.3 Robustness of Speaker Recognition System: Speech Corpus (CU2C); Noise Database (NOISEX-92); Mismatched Training and Testing Conditions
  2.4 Summary
Chapter 3 Speaker Recognition System using both Vocal Tract and Vocal Source Features
  3.1 Speech Production Mechanism: Overview; Acoustic Properties of Human Speech
  3.2 Source-filter Model and Linear Predictive Analysis: Source-filter Speech Model; Linear Predictive Analysis for Speech Signal
  3.3 Vocal Tract Features
  3.4 Vocal Source Features: Overview; Technical Viewpoints
  3.5 Effects of Noises on Speech Properties
  3.6 Summary
Chapter 4 Estimation of Robust Acoustic Features for Speaker Discrimination
  4.1 Robust Speech Techniques: Noise Resilience; Speech Enhancement
  4.2 Spectral Subtractive-Type Preprocessing: Noise Estimation; Spectral Subtraction Algorithm
  4.3 LP Analysis of Noisy Speech: LP Inverse Filtering (Whitening Process); Magnitude Response of All-pole Filter in Noisy Condition; Noise Spectral Reshaping
  4.4 Distinctive Vocal Tract and Vocal Source Feature Extraction: Vocal Tract Feature Extraction; Source Feature Generation Procedure; Subband-specific Parameterization Method
  4.5 Summary
Chapter 5 Speaker Recognition Tasks & Performance Evaluation
  5.1 Speaker Recognition Experimental Setup: Task Description; Baseline Experiments; Identification and Verification Results
  5.2 Speaker Recognition using Source-tract Features: Source Feature Selection; Source-tract Feature Fusion; Identification and Verification Results
  5.3 Performance Analysis
Chapter 6 Conclusion
  6.1 Discussion and Conclusion
  6.2 Suggestion of Future Work
|
34 |
Integrating computational auditory scene analysis and automatic speech recognition. Srinivasan, Soundararajan. January 2006.
Thesis (Ph. D.)--Ohio State University, 2006. Title from first page of PDF file. Includes bibliographical references (p. 173-186).
|
35 |
Design of Detectors for Automatic Speech Recognition. Martínez del Hoyo Canterla, Alfonso. January 2012.
This thesis presents methods and results for optimizing subword detectors in continuous speech. Speech detectors are useful in areas such as detection-based ASR, pronunciation training, phonetic analysis, and word spotting. First, we propose a structure suitable for subword detection. It is based on the standard HMM framework, but in each detector the MFCC feature extractor and the models are trained for the specific detection problem. Our experiments on the TIMIT database validate the effectiveness of this structure for detecting phones and articulatory features. Second, two discriminative training techniques are proposed for detector training. The first is a modification of Minimum Classification Error (MCE) training; the second, Minimum Detection Error (MDE) training, is the adaptation of Minimum Phone Error to the detection problem. Both methods are used to train the HMMs and filterbanks in the detectors, either in isolation or jointly. MDE has the advantage that any detection performance criterion can be optimized directly. F-score and class-accuracy optimization experiments show that MDE training is superior to the MCE-based method. The optimized filterbanks reflect acoustical properties of the detection classes, and some of the changes are consistent across classes with similar acoustical properties. Moreover, MDE training of filterbanks yields filters significantly different from those of the standard filterbank; some filters extract information from different critical bands. Finally, we propose a detection-based automatic speech recognition system: detectors are built with the proposed HMM-based detection structure and trained discriminatively, and the linguistic merger is based on an MLP/Viterbi decoder.
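The F-score mentioned above is one detection criterion that MDE can optimize directly. The following is an illustrative sketch, not the thesis implementation; in particular, the mid-point rule for matching hypothesized detections to reference intervals is an assumption made here for concreteness.

```python
# Precision, recall and F-score for a subword detector, given reference and
# hypothesized (start, end) intervals in seconds.  A hypothesis counts as a
# hit when its mid-point falls inside a not-yet-matched reference interval.

def f_score(reference, hypotheses, beta=1.0):
    hits, used = 0, set()
    for hs, he in hypotheses:
        mid = (hs + he) / 2.0
        for i, (rs, re) in enumerate(reference):
            if i not in used and rs <= mid <= re:
                hits += 1
                used.add(i)
                break
    precision = hits / len(hypotheses) if hypotheses else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

ref = [(0.10, 0.25), (0.40, 0.55)]   # reference phone occurrences
hyp = [(0.12, 0.24), (0.60, 0.70)]   # detector output: one hit, one false alarm
print(f_score(ref, hyp))             # prints 0.5
```

With one hit out of two hypotheses and two references, precision and recall are both 0.5, so the balanced F-score is 0.5; MDE adjusts the detector parameters to push such a criterion up directly.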
|
36 |
Residual-excited linear predictive (RELP) vocoder system with TMS320C6711 DSK and vowel characterization. Taguchi, Akihiro. 09 January 2004.
Speech recognition by machine is one of the most popular and complex subjects in the multimedia field. Linear predictive coding (LPC) is a useful technique for voice coding in speech analysis and synthesis. The first objective of this research was to establish a prototype residual-excited linear predictive (RELP) vocoder system in a real-time environment. Although its transmission rate is higher, the quality of speech synthesized by the RELP vocoder is superior to that of other vocoders, and it is relatively simple and robust to implement. The RELP vocoder uses the residual signal as excitation rather than periodic pulses or white noise. The vocoder was implemented in C on the Texas Instruments TMS320C6711 DSP starter kit (DSK).
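The RELP analysis-synthesis loop described above can be sketched as: inverse-filter the speech with the LPC analysis filter A(z) to obtain the residual, then excite the all-pole filter 1/A(z) with that residual. This is a minimal Python sketch under simplifying assumptions (a single frame, no windowing, quantization, or real-time I/O), not the TMS320C6711 implementation.

```python
import numpy as np

def lpc_coeffs(x, order):
    """LPC coefficients a[0..p] (a[0] = 1) via the Levinson-Durbin recursion."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k
    return a

def residual(x, a):
    """Analysis: e[n] = sum_k a[k] x[n-k], the LPC residual (FIR filter A(z))."""
    return np.convolve(x, a)[:len(x)]

def synthesize(e, a):
    """Synthesis: excite the all-pole filter 1/A(z) with the residual."""
    y = np.zeros_like(e)
    for n in range(len(e)):
        y[n] = e[n] - sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
    return y

# With an unquantized residual the reconstruction is exact.
rng = np.random.default_rng(1)
x = rng.standard_normal(400)          # stand-in for one speech frame
a = lpc_coeffs(x, order=10)
y = synthesize(residual(x, a), a)
print(np.max(np.abs(y - x)) < 1e-8)   # prints True
```

The exact round trip illustrates why RELP's synthesized quality is high: unlike pulse- or noise-excited vocoders, the excitation carries everything the all-pole model fails to capture, at the cost of a higher transmission rate.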
Identifying vowel sounds is an important element in recognizing speech content. The second objective of this research was to explore a method of characterizing vowels by means of parameters extracted by the RELP vocoder, an approach not previously known to have been used in speech recognition. Five English vowels were chosen as the experimental sample. Utterances of individual vowel sounds, and of the vowel sounds in one-syllable words, were recorded and saved as WAVE files, from which a large sample of 20-ms vowel segments was obtained. The presented method takes 20 samples of a segment's LPC frequency response, spaced equally on a logarithmic scale, as a frequency-response vector. The average vector for each vowel was calculated, and the Euclidean distances from an unknown vector to the five average vectors were compared to assign the unknown vector to a vowel group.
The results indicate that, when a vowel is uttered alone, the distance to its own average vector is smaller than the distances to the other vowels' average vectors, so a given vowel's frequency response can be assigned to the correct group by comparing it against all the average vectors. When a vowel is uttered with consonants, however, variances and covariances increase, and in some cases no distinct difference can be recognized between the distance to the vowel's own average vector and the distances to the others. Overall, the results indicate that the RELP vocoder can identify and classify single vowel sounds.
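The nearest-average-vector scheme described in this abstract can be sketched in Python. This is a minimal illustration, not the thesis code: the Levinson-Durbin LPC routine, the 8000 Hz sampling rate and 100 Hz lower edge, the toy AR(2) "vowel" signal, and the vowel labels are all assumptions made for the example.

```python
import numpy as np

def lpc_coeffs(x, order):
    """LPC coefficients a[0..p] (a[0] = 1) via the Levinson-Durbin recursion."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k
    return a

def lpc_log_response(a, fs=8000.0, n_points=20, f_lo=100.0):
    """|1/A(e^jw)| in dB, sampled at n_points log-spaced frequencies."""
    freqs = np.logspace(np.log10(f_lo), np.log10(fs / 2.0), n_points)
    w = 2.0 * np.pi * freqs / fs
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a
    return 20.0 * np.log10(1.0 / np.abs(A))

def classify_vowel(vec, averages):
    """Assign vec to the vowel whose average vector is nearest (Euclidean)."""
    return min(averages, key=lambda v: np.linalg.norm(vec - averages[v]))

# Toy demo: an AR(2) process stands in for a recorded vowel segment.
rng = np.random.default_rng(0)
e = rng.standard_normal(1600)
x = np.zeros_like(e)
for t in range(2, len(e)):
    x[t] = 1.3 * x[t - 1] - 0.6 * x[t - 2] + e[t]
vec = lpc_log_response(lpc_coeffs(x, order=8))
averages = {"ah": vec, "iy": vec + 6.0}   # hypothetical per-vowel averages
print(classify_vowel(vec, averages))      # prints ah
```

In the thesis setting the `averages` dictionary would hold the mean 20-point response vectors of the five English vowels computed from the training utterances; classification of an unknown 20-ms segment is then a single nearest-mean lookup.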
|
38 |
Speaker dynamics as a source of pronunciation variability for continuous speech recognition models. Bates, Rebecca Anne. January 2004.
Thesis (Ph. D.)--University of Washington, 2004. Vita. Includes bibliographical references (leaves 139-151).
|
39 |
Kernel eigenvoice speaker adaptation. Ho, Ka-Lung. January 2003.
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003. Includes bibliographical references (leaves 56-61). Also available in electronic version; access restricted to campus users.
|
40 |
Strategies for improving audible quality and speech recognition accuracy of reverberant speech. Gillespie, Bradford W. January 2002.
Thesis (Ph. D.)--University of Washington, 2002. Vita. Includes bibliographical references (p. 103-108).
|