Global ETD Search

Return to search

Auditory Front-Ends for Noise-Robust Automatic Speech Recognition

The human auditory perception system is much more noise-robust than any state-of the art automatic speech recognition (ASR) system. It is expected that the noise-robustness of speech feature can be improved by employing the human auditory based
feature extraction procedure.
In this thesis, we investigate modifying the commonly-used feature extraction process for automatic speech recognition systems. A novel frequency masking curve, which is based on modeling the basilar
membrane as a cascade system of damped simple harmonic oscillators, is used to replace the critical-band masking curve to compute the masking
threshold. We mathematically analyze the coupled motion of the oscillator system (basilar membrane) when they are driven by short-time stationary (speech) signals. Based on the analysis, we derive the relation between the amplitudes of neighboring oscillators,
and accordingly insert a masking module in the front-end signal processing stage to modify the speech spectrum.
We evaluate the proposed method on the Aurora 2.0
noisy-digit speech database. When combined with the commonly-used cepstral mean subtraction post-processing, the proposed auditory front-end module achieves a significant improvement. The method
of correlational masking effect curve combine with CMS can achieves relative improvements of 25.9%
over the baseline respectively. After applying the methods iteratively, the relative improvement
improves from 25.9% to 30.3%.

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0825110-171640

Identifer	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0825110-171640
Date	25 August 2010
Creators	Yeh, Ja-Zang
Contributors	Chung-Hsien Wu, Chia-Ping Chen, Hsin-Min Wang, Jui-Feng Yeh
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	English
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0825110-171640
Rights	not_available, Copyright information available at source archive

Page generated in 0.0025 seconds

Auditory Front-Ends for Noise-Robust Automatic Speech Recognition

Description

Links & Downloads

Tags

Additional Fields