The human auditory perception system is much more noise-robust than any state-of the art automatic speech recognition (ASR) system. It is expected that the noise-robustness of speech feature can be improved by employing the human auditory based
feature extraction procedure.
In this thesis, we investigate modifying the commonly-used feature extraction process for automatic speech recognition systems. A novel frequency masking curve, which is based on modeling the basilar
membrane as a cascade system of damped simple harmonic oscillators, is used to replace the critical-band masking curve to compute the masking
threshold. We mathematically analyze the coupled motion of the oscillator system (basilar membrane) when they are driven by short-time stationary (speech) signals. Based on the analysis, we derive the relation between the amplitudes of neighboring oscillators,
and accordingly insert a masking module in the front-end signal processing stage to modify the speech spectrum.
We evaluate the proposed method on the Aurora 2.0
noisy-digit speech database. When combined with the commonly-used cepstral mean subtraction post-processing, the proposed auditory front-end module achieves a significant improvement. The method
of correlational masking effect curve combine with CMS can achieves relative improvements of 25.9%
over the baseline respectively. After applying the methods iteratively, the relative improvement
improves from 25.9% to 30.3%.
Identifer | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0825110-171640 |
Date | 25 August 2010 |
Creators | Yeh, Ja-Zang |
Contributors | Chung-Hsien Wu, Chia-Ping Chen, Hsin-Min Wang, Jui-Feng Yeh |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0825110-171640 |
Rights | not_available, Copyright information available at source archive |
Page generated in 0.002 seconds