Auditory Front-Ends for Noise-Robust Automatic Speech Recognition
Yeh, Ja-Zang 25 August 2010
The human auditory perception system is much more noise-robust than any state-of-the-art automatic speech recognition (ASR) system. The noise robustness of speech features is therefore expected to improve when the feature extraction procedure is based on the human auditory system.
In this thesis, we investigate modifying the commonly used feature extraction process for automatic speech recognition systems. A novel frequency masking curve, which is based on modeling the basilar membrane as a cascade system of damped simple harmonic oscillators, is used to replace the critical-band masking curve when computing the masking threshold. We mathematically analyze the coupled motion of the oscillator system (the basilar membrane) when it is driven by short-time stationary (speech) signals. Based on the analysis, we derive the relation between the amplitudes of neighboring oscillators, and accordingly insert a masking module in the front-end signal processing stage to modify the speech spectrum.
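As background for the oscillator model, the following minimal sketch computes the textbook steady-state amplitude of a single, uncoupled damped harmonic oscillator under sinusoidal driving. It is not the coupled cascade or the masking curve derived in the thesis; the function name, the parameter names, and the equation of motion in the docstring are assumptions chosen for illustration.

```python
import math


def steady_state_amplitude(drive_freq, natural_freq, damping, force_per_mass=1.0):
    """Steady-state amplitude of one driven damped harmonic oscillator.

    Assumed equation of motion (angular frequencies in rad/s):
        x'' + 2*damping*x' + natural_freq**2 * x = force_per_mass * cos(drive_freq * t)

    This is a single uncoupled oscillator, used only as an illustration;
    the thesis analyzes a coupled cascade of such oscillators.
    Requires damping > 0 or drive_freq != natural_freq.
    """
    stiffness_term = natural_freq ** 2 - drive_freq ** 2
    damping_term = 2.0 * damping * drive_freq
    return force_per_mass / math.hypot(stiffness_term, damping_term)


# The response is larger near the natural frequency than far from it.
near = steady_state_amplitude(drive_freq=2.0, natural_freq=2.0, damping=0.5)
far = steady_state_amplitude(drive_freq=4.0, natural_freq=2.0, damping=0.5)
print(near > far)
```

Each oscillator in such a model responds most strongly to components near its own natural frequency, which is the intuition behind treating neighboring amplitudes as related.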
We evaluate the proposed method on the Aurora 2.0 noisy-digit speech database. When combined with the commonly used cepstral mean subtraction (CMS) post-processing, the proposed auditory front-end module achieves a significant improvement: the correlational masking effect curve combined with CMS achieves a relative improvement of 25.9% over the baseline. Applying the methods iteratively raises the relative improvement from 25.9% to 30.3%.
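Cepstral mean subtraction, in its standard per-utterance form, can be sketched as follows. This is a generic illustration and not the thesis implementation; the function name and the list-of-frames input layout are assumptions.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean of each cepstral coefficient.

    frames: list of frames, each a list of cepstral coefficients of the
    same length (hypothetical layout chosen for this sketch).
    Returns a new list of mean-normalized frames.
    """
    if not frames:
        return []
    num_frames = len(frames)
    num_coeffs = len(frames[0])
    # Mean of each coefficient over all frames of the utterance.
    means = [sum(f[k] for f in frames) / num_frames for k in range(num_coeffs)]
    return [[f[k] - means[k] for k in range(num_coeffs)] for f in frames]


utterance = [[1.0, 2.0], [3.0, 6.0]]
print(cepstral_mean_subtraction(utterance))
```

Because a fixed channel adds a constant offset to every frame in the cepstral domain, subtracting the utterance mean removes that offset, which is why CMS is commonly paired with other noise-robust front-ends.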