Return to search

A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition

A four-session text independent, TV-recorded audio-video database for speaker recognition is collected in this thesis. The speaker data is used to verify the applicability of a design methodology based on Mel-frequency cepstrum coefficients and Gaussian mixture model. Both single-session and multi-session problems are discussed in the thesis. Experimental results indicate that 90% correct rate can be achieved for a single-session 3000-speaker corpus while only 67% correct rate can be obtained for a two-session 800-speaker dataset. The performance of a multi-session speaker recognition system is greatly reduced due to the variability incurred in the recording environment, speakers¡¦ recording mood and other unknown factors. How to increase the system performance under multi-session conditions becomes a challenging task in the future. And the establishment of such a multi-session large-scale speaker database does indeed play an indispensable role in this task.

Identiferoai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0907106-233318
Date07 September 2006
CreatorsWang, Long-Cheng
ContributorsChih-Chien Chen, Chii-Maw Uang, Tsung Lee
PublisherNSYSU
Source SetsNSYSU Electronic Thesis and Dissertation Archive
LanguageCholon
Detected LanguageEnglish
Typetext
Formatapplication/pdf
Sourcehttp://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0907106-233318
Rightsnot_available, Copyright information available at source archive

Page generated in 0.0017 seconds