Channel Compensation for Speaker Recognition Systems

This thesis addresses the problem of how best to remedy different types of channel distortion in speech that is to be used in automatic speaker recognition and verification systems. In automatic speaker recognition, a machine analyses a person's voice and determines the person's identity by comparing speech features against a known set of speech features. In automatic speaker verification, a person claims an identity and the machine determines whether the claimed identity is correct or the person is an impostor. Channel distortion occurs whenever information is sent electronically through any type of channel, whether a basic wired telephone channel or a wireless channel. The types of distortion that can corrupt the information include time-variant or time-invariant filtering of the information and the addition of 'thermal noise' to the information. Both types of distortion can cause varying degrees of error in the information being received and analysed. The experiments presented in this thesis investigate the effects of channel distortion on average speaker recognition rates and test the effectiveness of various channel compensation algorithms designed to mitigate those effects. The speaker recognition system was represented by a basic recognition algorithm consisting of speech analysis, extraction of feature vectors in the form of Mel-cepstral coefficients, and a classification stage based on the minimum distance rule.
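The minimum distance rule mentioned above can be illustrated with a minimal sketch. It assumes each enrolled speaker is summarised by a single reference vector (for example, the mean of that speaker's Mel-cepstral feature vectors) and that distance is Euclidean. The abstract does not specify the distance measure or the form of the reference model, so both are assumptions, as are the names `euclidean_distance` and `classify_min_distance` and the toy values.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_min_distance(test_vector, references):
    """Return the speaker label whose reference vector is closest.

    references: dict mapping a speaker label to a reference feature vector
    (a hypothetical layout chosen for this illustration).
    """
    if not references:
        raise ValueError("no enrolled speakers")
    return min(
        references,
        key=lambda label: euclidean_distance(test_vector, references[label]),
    )


# Toy 3-dimensional "cepstral" vectors, invented for illustration only.
references = {
    "speaker_a": [1.0, 0.5, -0.2],
    "speaker_b": [-0.8, 1.2, 0.4],
}
test_vector = [0.9, 0.6, -0.1]
closest = classify_min_distance(test_vector, references)
```

Here `closest` is the label of the enrolled speaker nearest to the test vector. A verification system would instead compare the distance to the claimed speaker's reference against a threshold.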
Two types of channel distortion were investigated: convolutional (lowpass filtering) effects and the addition of white Gaussian noise. Three methods of channel compensation were tested: Cepstral Mean Subtraction (CMS), RelAtive SpecTrAl (RASTA) processing, and the Constant Modulus Algorithm (CMA). The results showed that, for speech filtered at low cutoff frequencies (3 or 4 kHz), both CMS and RASTA processing improved the average speaker recognition rates compared to speech with no compensation. The improvement due to RASTA processing was larger than that achieved by CMS. Neither CMS nor RASTA was able to improve the accuracy of the speaker recognition system for cutoff frequencies of 5 kHz, 6 kHz or 7 kHz. In the case of noisy speech, all methods analysed were able to compensate at high SNRs of 40 dB and 30 dB, but only RASTA processing was able to compensate and improve the average recognition rate for speech corrupted with a high level of noise (SNRs of 20 dB and 10 dB).
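As a rough illustration of why Cepstral Mean Subtraction targets convolutional distortion: a time-invariant channel filter multiplies the speech spectrum, which becomes an approximately constant additive offset in the cepstral domain, so subtracting the per-utterance mean of each coefficient removes it. The sketch below assumes the cepstral frames are already computed and stored as lists of floats. The function name `cepstral_mean_subtraction` and the toy values are hypothetical, and this is not the implementation used in the thesis.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean taken over all frames of one utterance."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across the whole utterance.
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(n_coeffs)]
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]


# Toy utterance: 3 frames of 2 cepstral coefficients (invented values).
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 1.0]]

# An idealised time-invariant channel modelled as a constant cepstral offset.
channel_offset = [0.5, -0.25]
distorted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

clean_cms = cepstral_mean_subtraction(clean)
distorted_cms = cepstral_mean_subtraction(distorted)

# Largest absolute difference between the two compensated utterances.
max_difference = max(
    abs(a - b)
    for frame_a, frame_b in zip(clean_cms, distorted_cms)
    for a, b in zip(frame_a, frame_b)
)
```

Under this idealised model the compensated clean and distorted utterances coincide. Additive noise does not reduce to a constant cepstral offset, so this simple argument does not carry over to the noisy-speech case.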

http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080514.093453

RelAtive SpecTral processing

RASTA

Cepstral Mean Subtraction

Constant Modulus Algorithm

speech filtering

Mel-Frequency Cepstral coefficients

Identifier	oai:union.ndltd.org:ADTP/210321
Date	January 2007
Creators	Neville, Katrina Lee, katrina.neville@rmit.edu.au
Publisher	RMIT University. Electrical and Computer Engineering
Source Sets	Australasian Digital Theses Program
Language	English
Detected Language	English
Rights	http://www.rmit.edu.au/help/disclaimer, Copyright Katrina Lee Neville
