Robustness in ASR : an experimental study of the interrelationship between discriminant feature-space transformation, speaker normalization and environment compensation

This thesis addresses the general problem of maintaining robust automatic speech recognition (ASR) performance under diverse speaker populations, channel conditions, and acoustic environments. To this end, the thesis analyzes the interactions between environment compensation techniques, frequency warping based speaker normalization, and discriminant feature-space transformation (DFT). These interactions were quantified by performing experiments on the connected digit utterances comprising the Aurora 2 database, using continuous density hidden Markov models (HMM) representing individual digits.

Firstly, given that the performance of speaker normalization techniques degrades in the presence of noise, it is shown that reducing the effects of noise through environment compensation, prior to speaker normalization, leads to substantial improvements in ASR performance. The speaker normalization techniques considered here were vocal tract length normalization (VTLN) and the augmented state-space acoustic decoder (MATE). Secondly, given that discriminant feature-space transformations (DFT) are known to increase class separation, it is shown that performing speaker normalization using VTLN in a discriminant feature-space leads to improvements in the performance of this technique. Classes, in our experiments, corresponded to HMM states. Thirdly, an effort was made to achieve higher class discrimination by normalizing the speech data used to estimate the discriminant feature-space transform. Normalization, in our experiments, corresponded to reducing the variability within each class through the use of environment compensation and speaker normalization. Significant ASR performance improvements were obtained when normalization was performed using environment compensation, while our results were inconclusive for the case where normalization consisted of speaker normalization. Finally, a simple modification of MATE, aimed at increasing its noise robustness, is presented. This modification consisted of using, during recognition, knowledge of the distribution of warping factors selected by MATE during training.

Identifier: oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:QMM.99772
Date: January 2007
Creators: Keyvani, Alireza.
Publisher: McGill University
Source Sets: Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language: English
Detected Language: English
Type: Electronic Thesis or Dissertation
Format: application/pdf
Coverage: Master of Engineering (Department of Electrical and Computer Engineering)
Rights: © Alireza Keyvani, 2007
Relation: alephsysno: 002614143, proquestno: AAIMR32600, Theses scanned by UMI/ProQuest.
