A Speech Enhancement System Based on Statistical and Acoustic-Phonetic Knowledge

Noise reduction aims to improve the quality of noisy speech by suppressing the background noise in the signal. However, there is always a tradeoff between noise reduction and signal distortion: more noise reduction is always accompanied by more signal distortion. An evaluation of the intelligibility of speech processed by several noise reduction algorithms in [23] showed that most noise reduction algorithms were not successful in improving the intelligibility of noisy speech.

In this thesis, we aim to utilize acoustic-phonetic knowledge to enhance the intelligibility of noise-reduced speech. Acoustic-phonetics studies the characteristics of speech and the acoustic cues that are important for speech intelligibility. We considered the following questions: which noise reduction algorithm should be used, which acoustic cues should be targeted, and how this information should be incorporated into the design of the noise reduction system.

A Bayesian noise reduction method similar to the one proposed by Ephraim and Malah in [16] is employed. We first evaluate the goodness-of-fit of several parametric PDF models to the empirical speech data. For classified speech, we find that the Rayleigh and Gamma, with a fixed shape parameter of 5, model the speech spectral amplitude equally well. The Gamma-MAP and Gamma-MMSE estimators are derived. The subjective and objective performances of these estimators are then compared.
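The goodness-of-fit comparison described above can be illustrated with a minimal sketch. It is not the thesis's procedure: the abstract does not say which fit measure or data were used, so the sketch assumes a Kolmogorov-Smirnov statistic, a standard Gamma parameterization (shape 5, free scale), maximum-likelihood parameter estimates, and synthetic amplitudes in place of real speech spectra. All function names are hypothetical.

```python
import math
import random


def rayleigh_cdf(x, sigma):
    """CDF of a Rayleigh distribution with scale sigma."""
    return 1.0 - math.exp(-x * x / (2.0 * sigma * sigma))


def gamma_cdf_integer_shape(x, shape, scale):
    """CDF of a Gamma distribution; closed form valid for integer shape only."""
    z = x / scale
    term = 1.0   # z**i / i! for i = 0
    total = 1.0
    for i in range(1, shape):
        term *= z / i
        total += term
    return 1.0 - math.exp(-z) * total


def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d


def compare_fits(amplitudes, shape=5):
    """Fit both models by maximum likelihood and return their KS statistics."""
    n = len(amplitudes)
    # Rayleigh ML estimate: sigma^2 = sum(x^2) / (2n)
    sigma = math.sqrt(sum(a * a for a in amplitudes) / (2.0 * n))
    # Gamma ML estimate of the scale when the shape is fixed: mean / shape
    scale = sum(amplitudes) / (n * shape)
    ks_r = ks_statistic(amplitudes, lambda x: rayleigh_cdf(x, sigma))
    ks_g = ks_statistic(
        amplitudes, lambda x: gamma_cdf_integer_shape(x, shape, scale)
    )
    return ks_r, ks_g


# Synthetic stand-in for spectral amplitudes (an assumption, not speech data):
# the magnitude of a complex Gaussian coefficient is Rayleigh distributed.
rng = random.Random(0)
amplitudes = [
    math.hypot(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(5000)
]
ks_rayleigh, ks_gamma = compare_fits(amplitudes)
print(f"KS statistic, Rayleigh fit:        {ks_rayleigh:.4f}")
print(f"KS statistic, Gamma (shape 5) fit: {ks_gamma:.4f}")
```

A smaller statistic indicates a closer fit. On real data the amplitudes would come from the short-time spectra of speech, grouped by sound class, rather than from the synthetic generator used here.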

We also propose to apply a class-based cue enhancement, similar to that performed in [21]. The processing directly manipulates the acoustic cues known to be important for speech intelligibility. We assume that the system has the sound class information of the input speech. The scheme aims to enhance the interclass and intraclass distinction of speech sounds. The intelligibility of speech processed by the proposed system is then compared to the intelligibility of speech processed by the Rayleigh-MMSE estimator [16].

The intelligibility evaluation shows that the proposed scheme enhances the detection of plosive and fricative sounds. However, it does not help in the intraclass discrimination of plosive sounds, and more tests need to be done to evaluate whether intraclass discrimination of fricatives is improved. The proposed scheme degrades the detection of nasal and affricate sounds.

Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2009-08-24 21:32:48.966

Identifier: oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OKQ.1974/5085
Date: 25 August 2009
Creators: Sudirga, RENITA
Contributors: Queen's University (Kingston, Ont.). Theses (Queen's University (Kingston, Ont.))
Source Sets: Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language: English
Detected Language: English
Type: Thesis
Format: 1611528 bytes, application/pdf
Rights: This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
Relation: Canadian theses