
Auditory Front-Ends for Noise-Robust Automatic Speech Recognition

The human auditory perception system is much more noise-robust than any state-of-the-art automatic speech recognition (ASR) system. The noise robustness of speech features is therefore expected to improve when the feature extraction procedure is based on human auditory processing.
In this thesis, we investigate modifications to the commonly used feature extraction process for automatic speech recognition systems. A novel frequency masking curve, based on modeling the basilar membrane as a cascade system of damped simple harmonic oscillators, replaces the critical-band masking curve in computing the masking threshold. We mathematically analyze the coupled motion of the oscillator system (the basilar membrane) when it is driven by short-time stationary (speech) signals. From this analysis, we derive the relation between the amplitudes of neighboring oscillators and accordingly insert a masking module into the front-end signal processing stage to modify the speech spectrum.
We evaluate the proposed method on the Aurora 2.0 noisy-digit speech database. When combined with the commonly used cepstral mean subtraction (CMS) post-processing, the proposed auditory front-end module achieves a significant improvement: the correlational masking effect curve combined with CMS achieves a relative improvement of 25.9% over the baseline. When the methods are applied iteratively, the relative improvement rises from 25.9% to 30.3%.
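
The cepstral mean subtraction step mentioned in the abstract can be illustrated with a minimal sketch. It shows the standard per-utterance form of the technique, in which the mean of each cepstral coefficient over all frames is subtracted from every frame. It is not the implementation used in the thesis; the function name `cepstral_mean_subtraction` and the toy feature values are assumptions made for illustration only.

```python
from typing import List


def cepstral_mean_subtraction(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean of each cepstral coefficient.

    `frames` holds one list of cepstral coefficients per frame
    (hypothetical layout: frames[t][k] is coefficient k at frame t).
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each coefficient taken over all frames of the utterance.
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(n_coeffs)]
    # Remove that mean from every frame.
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]


# Toy utterance with two frames and two coefficients (made-up values).
utterance = [[1.0, 4.0], [3.0, 8.0]]
normalized = cepstral_mean_subtraction(utterance)
print(normalized)
```

After normalization, each coefficient has zero mean across the utterance, which removes a constant offset in the cepstral domain such as the one introduced by a fixed channel.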

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0825110-171640
Date: 25 August 2010
Creators: Yeh, Ja-Zang
Contributors: Chung-Hsien Wu, Chia-Ping Chen, Hsin-Min Wang, Jui-Feng Yeh
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0825110-171640
Rights: not_available, Copyright information available at source archive
