A four-session text independent, TV-recorded audio-video database for speaker recognition is collected in this thesis. The speaker data is used to verify the applicability of a design methodology based on Mel-frequency cepstrum coefficients and Gaussian mixture model. Both single-session and multi-session problems are discussed in the thesis. Experimental results indicate that 90% correct rate can be achieved for a single-session 3000-speaker corpus while only 67% correct rate can be obtained for a two-session 800-speaker dataset. The performance of a multi-session speaker recognition system is greatly reduced due to the variability incurred in the recording environment, speakers¡¦ recording mood and other unknown factors. How to increase the system performance under multi-session conditions becomes a challenging task in the future. And the establishment of such a multi-session large-scale speaker database does indeed play an indispensable role in this task.
Identifer | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0907106-233318 |
Date | 07 September 2006 |
Creators | Wang, Long-Cheng |
Contributors | Chih-Chien Chen, Chii-Maw Uang, Tsung Lee |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | Cholon |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0907106-233318 |
Rights | not_available, Copyright information available at source archive |
Page generated in 0.0017 seconds