Identity verification, or biometric recognition, systems play an important role in our daily lives. Applications include Automatic Teller Machines (ATMs), banking and share information retrieval, and personal verification for credit cards. Among biometric techniques, authenticating speakers by their voice is of great importance, since it is non-invasive and is the only available modality in many applications. However, the performance of Automatic Speaker Verification (ASV) systems degrades significantly under adverse conditions that cause recordings from the same speaker to differ.

The objective of this research is to investigate and develop robust techniques for automatic speaker recognition over various channel conditions, such as telephone and microphone-recorded speech. The research improves the robustness of ASV systems in three main areas: feature extraction, speaker modelling and score normalization.

At the feature level, a new set of dynamic features, termed Delta Cepstral Energy (DCE), is proposed as an alternative to traditional delta cepstra. DCE not only greatly reduces the dimensionality of the feature vector compared with delta and delta-delta cepstra, but is also shown to provide the same performance under matched training and testing conditions on TIMIT and a subset of the NIST 2002 dataset. The concept of speaker entropy, which measures the information contained in a speaker's speech as represented by the extracted features, facilitates comparative evaluation of the proposed methods. In addition, Frequency Modulation (FM) features are combined with Mel Frequency Cepstral Coefficients (MFCCs) in a complementary manner to improve ASV performance under various types of channel variability. The proposed fused system achieves a relative reduction of up to 23% in Equal Error Rate (EER) over the MFCC-based system on the NIST 2008 dataset.
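Results in this work are reported as Equal Error Rate (EER), the operating point at which the false-acceptance rate (FAR) and the false-rejection rate (FRR) are equal. The following is a minimal sketch of that standard metric, not the evaluation code used in the thesis. It assumes higher scores favour the claimed speaker and that a trial is accepted when its score is at least the threshold; the function names `equal_error_rate` and `relative_reduction` are hypothetical. Because it only sweeps thresholds at the observed scores, it returns an approximation on small score sets.

```python
from typing import Sequence


def equal_error_rate(target_scores: Sequence[float],
                     impostor_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping the decision threshold.

    Assumption: a trial is accepted when score >= threshold.
    FAR = fraction of impostor trials accepted.
    FRR = fraction of target (true-speaker) trials rejected.
    """
    if not target_scores or not impostor_scores:
        raise ValueError("need at least one target and one impostor score")
    # Candidate thresholds: every observed score, plus one above the maximum
    # (a threshold that accepts nothing, giving FAR = 0 and FRR = 1).
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    thresholds.append(thresholds[-1] + 1.0)
    best_gap = float("inf")
    eer = 1.0
    for t in thresholds:
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        # Keep the threshold where FAR and FRR are closest; report their mean.
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2.0
    return eer


def relative_reduction(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction, e.g. 0.23 means 23% lower than the baseline."""
    return (baseline_eer - new_eer) / baseline_eer
```

Under this definition, a "23% relative reduction" means (EER_MFCC - EER_fused) / EER_MFCC = 0.23; for a purely hypothetical baseline of 10% EER, the fused system would be at about 7.7%.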
Currently, the main challenge in speaker modelling is channel variability across sessions. Nuisance Attribute Projection (NAP) is a recent channel compensation approach for systems based on Support Vector Machines (SVMs). The proposed multi-component approach to NAP attempts to compensate for the main sources of inter-session variation through an additional optimization criterion, allowing more accurate estimates of the most dominant channel artefacts and improving system performance under mismatched training and test conditions.

Another major issue in speaker recognition is score variability: incompletely modelled regions of the feature space can leave segments of the test speech poorly matched to the claimed speaker model. A segment selection technique for score normalization is proposed that relies only on the discriminative and reliable segments of the test utterance to verify the speaker. This approach is particularly useful in noisy conditions, where speech activity detection at the feature level is unreliable. A further source of score variability is that not all phonemes are equally discriminative. To address this, a new score re-weighting technique is applied to the likelihood values, based on the discriminative level of each Gaussian component, i.e. each particular region of the feature space. It is found that a limited number of Gaussian mixture components, herein termed discriminative components, are responsible for the overall performance, and that including the remaining non-discriminative components may only degrade system performance.
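Standard NAP removes a low-rank "nuisance" subspace from the SVM input vectors (for example, GMM mean supervectors) before training and scoring, using the projection P = I - U U^T, where the columns of U are the dominant directions of within-speaker, inter-session variability. The sketch below shows only that standard single-criterion projection under these assumptions; it does not implement the multi-component criterion proposed here. The function names, the power-iteration eigen-solver and the toy data are illustrative choices, not taken from the thesis.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _normalize(v: Vector) -> Vector:
    n = math.sqrt(_dot(v, v))
    return [x / n for x in v] if n > 0 else v


def nuisance_directions(sessions: Dict[str, List[Vector]], k: int,
                        iters: int = 200, seed: int = 0) -> List[Vector]:
    """Estimate k orthonormal nuisance directions: the top eigenvectors of the
    within-speaker (inter-session) scatter matrix, found by power iteration
    with deflation against directions already found."""
    # Within-speaker deviations: each session vector minus its speaker's mean,
    # so speaker identity is removed and only session/channel variation remains.
    deviations: List[Vector] = []
    for vecs in sessions.values():
        dim = len(vecs[0])
        mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        deviations.extend([[v[i] - mean[i] for i in range(dim)] for v in vecs])
    dim = len(deviations[0])
    rng = random.Random(seed)
    basis: List[Vector] = []
    for _ in range(k):
        u = _normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(iters):
            # Multiply by the scatter matrix S = sum_d d d^T without forming S.
            w = [0.0] * dim
            for d in deviations:
                c = _dot(d, u)
                for i in range(dim):
                    w[i] += c * d[i]
            # Deflation: remove components along previously found directions.
            for b in basis:
                c = _dot(w, b)
                w = [wi - c * bi for wi, bi in zip(w, b)]
            u = _normalize(w)
        basis.append(u)
    return basis


def nap_project(x: Vector, basis: List[Vector]) -> Vector:
    """Apply P = I - U U^T: subtract the part of x lying in the nuisance subspace
    (valid because the basis vectors are orthonormal)."""
    out = list(x)
    for u in basis:
        c = _dot(out, u)
        out = [oi - c * ui for oi, ui in zip(out, u)]
    return out
```

In a toy example where two sessions of the same speaker differ only along one axis (standing in for a channel effect), one nuisance direction recovers that axis, and projecting both sessions makes them identical while the speaker-dependent coordinates are left unchanged.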
Identifier | oai:union.ndltd.org:ADTP/258040 |
Date | January 2008 |
Creators | Nosratighods, Mohaddeseh, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW |
Publisher | University of New South Wales. Electrical Engineering & Telecommunications |
Source Sets | Australasian Digital Theses Program |
Language | English |
Rights | http://unsworks.unsw.edu.au/copyright |