Global ETD Search

1	Auditory Front-Ends for Noise-Robust Automatic Speech Recognition
Yeh, Ja-Zang
25 August 2010

The human auditory perception system is much more noise-robust than any state-of-the-art automatic speech recognition (ASR) system. It is expected that the noise robustness of speech features can be improved by employing a feature extraction procedure based on human auditory perception. In this thesis, we investigate modifying the commonly used feature extraction process for automatic speech recognition systems. A novel frequency masking curve, which is based on modeling the basilar membrane as a cascade system of damped simple harmonic oscillators, is used to replace the critical-band masking curve to compute the masking threshold. We mathematically analyze the coupled motion of the oscillator system (basilar membrane) when it is driven by short-time stationary (speech) signals. Based on the analysis, we derive the relation between the amplitudes of neighboring oscillators, and accordingly insert a masking module in the front-end signal processing stage to modify the speech spectrum. We evaluate the proposed method on the Aurora 2.0 noisy-digit speech database. When combined with the commonly used cepstral mean subtraction (CMS) post-processing, the proposed auditory front-end module achieves a significant improvement. Combining the correlational masking effect curve with CMS achieves a relative improvement of 25.9% over the baseline. Applying the method iteratively raises the relative improvement from 25.9% to 30.3%.

Keywords: frequency masking, front end processing, feature extraction, noise-robust speech recognition
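
The abstract relies on cepstral mean subtraction as a post-processing step. As a rough illustration only, the following minimal sketch shows the standard per-utterance form of that technique; it is not taken from the thesis, and the function name `cepstral_mean_subtraction` and the toy feature values are assumptions made for the example.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean of each cepstral coefficient.

    `frames` is a list of frames, each a list of cepstral coefficients.
    This is the textbook form of CMS, not the thesis's own code.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of every coefficient taken over all frames of the utterance.
    means = [sum(frame[k] for frame in frames) / n_frames
             for k in range(n_coeffs)]
    # Remove that mean from every frame.
    return [[frame[k] - means[k] for k in range(n_coeffs)]
            for frame in frames]


# Toy example (hypothetical values): a constant offset in the cepstral
# domain, such as a stationary channel effect, is cancelled by CMS.
clean = [[1.0, 2.0], [3.0, 6.0]]
shifted = [[c + off for c, off in zip(frame, [10.0, 20.0])] for frame in clean]
normalized_clean = cepstral_mean_subtraction(clean)
normalized_shifted = cepstral_mean_subtraction(shifted)
```

In this toy case both inputs normalize to the same features, which is why CMS is commonly paired with other front-end processing: it removes stationary offsets, leaving the remaining noise effects to modules such as the masking stage described above.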
