Speech recognition systems have improved in robustness in recent years with respect to both speaker and acoustical variability. Nevertheless, it is still a challenge to deploy speech recognition systems in real-world applications that are exposed to diverse and significant level of noise. Robustness and recognition accuracy are the essential criteria in determining the extent of a speech recognition system deployed in real-world applications. This work involves development of techniques and extensions to extract robust features from speech and achieve substantial performance in speech recognition. Robustness and recognition accuracy are the top concern in this research. In this work, the robustness issue is approached using the front-end processing, in particular robust feature extraction. The author proposes an unified framework for robust feature and presents a comprehensive evaluation on robustness in speech features. The framework addresses three distinct approaches: robust feature extraction, temporal information inclusion and normalization strategies. The author discusses the issue of robust feature selection primarily in the spectral and cepstral context. Several enhancement and extensions are explored for the purpose of robustness. This includes a computationally efficient approach proposed for moment normalization. In addition, a simple back-end approach is incorporated to improve recognition performance in reverberant environments. Speech features in this work are evaluated in three distinct environments that occur in real-world scenarios. The thesis also discusses the effect of noise on speech features and their parameters. The author has established that statistical properties play an important role in mismatches. The significance of the research is strengthened by the evaluation of robust approaches in more than one scenario and the comparison with the performance of the state-of-the-art features. The contributions and limitations of each robust feature in all three different environments are highlighted. The novelty of the work lies in the diverse hostile environments which speech features are evaluated for robustness. The author has obtained recognition accuracy of more than 98.5% for channel distortion. Recognition accuracy greater than 90.0% has also been maintained for reverberation time 0.4s and additive babble noise at SNR 10dB. The thesis delivers a comprehensive research on robust speech features for speech recognition in hostile environments supported by significant experimental results. Several observations, recommendations and relevant issues associated with robust speech features are presented.
Identifer | oai:union.ndltd.org:ADTP/194826 |
Date | January 1900 |
Creators | Toh, Aik |
Publisher | University of Western Australia. |
Source Sets | Australiasian Digital Theses Program |
Language | English |
Detected Language | English |
Rights | Copyright Aik Toh, http://www.itpo.uwa.edu.au/UWA-Computer-And-Software-Use-Regulations.html |
Page generated in 0.0019 seconds