Return to search

Spectral refinement to speech enhancement

The goal of a speech enhancement algorithm is to remove noise and recover the original signal with as little distortion and residual noise as possible. Most successful real-time algorithms thereof have done in the frequency domain where the frequency amplitude of clean speech is estimated per short-time frame of the noisy signal. The state of-the-art short-time spectral amplitude estimator algorithms estimate the clean spectral amplitude in terms of the power spectral density (PSD) function of the noisy signal. The PSD has to be computed from a large ensemble of signal realizations. However, in practice, it may only be estimated from a finite-length sample of a single realization of the signal. Estimation errors introduced by these limitations deviate the solution from the optimal. Various spectral estimation techniques, many with added spectral smoothing, have been investigated for decades to reduce the estimation errors. These algorithms do not address significantly issue on quality of speech as perceived by a human. This dissertation presents analysis and techniques that offer spectral refinements toward speech enhancement. We present an analytical framework of the effect of spectral estimate variance on the performance of speech enhancement. We use the variance quality factor (VQF) as a quantitative measure of estimated spectra. We show that reducing the spectral estimator VQF reduces significantly the VQF of the enhanced speech. The Autoregressive Multitaper (ARMT) spectral estimate is proposed as a low VQF spectral estimator for use in speech enhancement algorithms. An innovative method of incorporating a speech production model using multiband excitation is also presented as a technique to emphasize the harmonic components of the glottal speech input. / The preconditioning of the noisy estimates by exploiting other avenues of information, such as pitch estimation and the speech production model, effectively increases the localized narrow-band signal-to noise ratio (SNR) of the noisy signal, which is subsequently denoised by the amplitude gain. Combined with voicing structure enhancement, the ARMT spectral estimate delivers enhanced speech with sound clarity desirable to human listeners. The resulting improvements in enhanced speech are observed to be significant with both Objective and Subjective measurement. / by Werayuth Charoenruengkit. / Vita. / Thesis (Ph.D.)--Florida Atlantic University, 2009. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2009. Mode of access: World Wide Web.

Identiferoai:union.ndltd.org:fau.edu/oai:fau.digital.flvc.org:fau_2875
ContributorsCharoenruengkit, Werayuth., College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
PublisherFlorida Atlantic University
Source SetsFlorida Atlantic University
LanguageEnglish
Detected LanguageEnglish
TypeText, Electronic Thesis or Dissertation
Formatxiv, 124 p. : ill. (some col.)., electronic
Rightshttp://rightsstatements.org/vocab/InC/1.0/

Page generated in 0.0021 seconds