Global ETD Search

Return to search

Model Based Speech Enhancement and Coding

In mobile speech communication, adverse conditions, such as noisy acoustic environments and unreliable network connections, may severely degrade the intelligibility and natural- ness of the received speech quality, and increase the listening effort. This thesis focuses on countermeasures based on statistical signal processing techniques. The main body of the thesis consists of three research articles, targeting two specific problems: speech enhancement for noise reduction and flexible source coder design for unreliable networks. Papers A and B consider speech enhancement for noise reduction. New schemes based on an extension to the auto-regressive (AR) hidden Markov model (HMM) for speech and noise are proposed. Stochastic models for speech and noise gains (excitation variance from an AR model) are integrated into the HMM framework in order to improve the modeling of energy variation. The extended model is referred to as a stochastic-gain hidden Markov model (SG-HMM). The speech gain describes the energy variations of the speech phones, typically due to differences in pronunciation and/or different vocalizations of individual speakers. The noise gain improves the tracking of the time-varying energy of non-stationary noise, e.g., due to movement of the noise source. In Paper A, it is assumed that prior knowledge on the noise environment is available, so that a pre-trained noise model is used. In Paper B, the noise model is adaptive and the model parameters are estimated on-line from the noisy observations using a recursive estimation algorithm. Based on the speech and noise models, a novel Bayesian estimator of the clean speech is developed in Paper A, and an estimator of the noise power spectral density (PSD) in Paper B. It is demonstrated that the proposed schemes achieve more accurate models of speech and noise than traditional techniques, and as part of a speech enhancement system provide improved speech quality, particularly for non-stationary noise sources. In Paper C, a flexible entropy-constrained vector quantization scheme based on Gaus- sian mixture model (GMM), lattice quantization, and arithmetic coding is proposed. The method allows for changing the average rate in real-time, and facilitates adaptation to the currently available bandwidth of the network. A practical solution to the classical issue of indexing and entropy-coding the quantized code vectors is given. The proposed scheme has a computational complexity that is independent of rate, and quadratic with respect to vector dimension. Hence, the scheme can be applied to the quantization of source vectors in a high dimensional space. The theoretical performance of the scheme is analyzed under a high-rate assumption. It is shown that, at high rate, the scheme approaches the theoretically optimal performance, if the mixture components are located far apart. The practical performance of the scheme is confirmed through simulations on both synthetic and speech-derived source vectors. / QC 20100825

statistical model

Gaussian mixture mdel (GMM)

hidden Markov model (HMM)

moise reduction

Telecommunication

Telekommunikation

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-4412
Date	January 2007
Creators	Zhao, David Yuheng
Publisher	KTH, Ljud- och bildbehandling, Stockholm : KTH
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Doctoral thesis, comprehensive summary, info:eu-repo/semantics/doctoralThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	Trita-EE, 1653-5146 ; 2007:018

Page generated in 0.0079 seconds

Model Based Speech Enhancement and Coding

Description

Links & Downloads

Tags

Additional Fields