A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition

This thesis collects a four-session, text-independent, TV-recorded audio-video database for speaker recognition. The speaker data are used to verify the applicability of a design methodology based on Mel-frequency cepstrum coefficients and a Gaussian mixture model. Both single-session and multi-session problems are discussed. Experimental results indicate that a 90% correct rate can be achieved for a single-session 3000-speaker corpus, while only a 67% correct rate can be obtained for a two-session 800-speaker dataset. The performance of a multi-session speaker recognition system is greatly reduced by variability in the recording environment, the speakers' mood during recording, and other unknown factors. Increasing system performance under multi-session conditions remains a challenging task for future work, and the establishment of such a large-scale multi-session speaker database plays an indispensable role in that task.

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0907106-233318

Speaker recognition

Text independent

Vector quantization

Gaussian mixture model

Mel-frequency cepstrum coefficients

Identifier	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0907106-233318
Date	07 September 2006
Creators	Wang, Long-Cheng
Contributors	Chih-Chien Chen, Chii-Maw Uang, Tsung Lee
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	Cholon
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0907106-233318
Rights	not_available, Copyright information available at source archive

