
Blind convolutive speech separation and dereverberation

Extraction of a target speech signal from the convolutive mixture of multiple sources observed in a cocktail-party environment is a challenging task, especially when room acoustic effects and background noise are present. Such acoustic distortions can further degrade the separation performance of many existing source separation algorithms. Algorithmic solutions to this problem are likely to have a strong impact on many applications, including automatic speech recognition, hearing aids and cochlear implants, and human-machine interaction. In such applications, extracting the target speech usually requires dealing not only with the interfering sounds but also with room reverberation and background noise. To address this problem, several methods are developed in this thesis.

For the blind separation of a target speech signal from the convolutive mixture, a multistage algorithm is proposed in which a convolutive independent component analysis (ICA) algorithm is applied to the mixture, followed by the estimation of an ideal binary mask (IBM) from the separated sources obtained with the convolutive ICA algorithm. In the last step, the errors introduced by the estimation of the IBM are reduced by cepstral smoothing.

The separation performance of the above algorithm, however, deteriorates with increasing surface reflections and background noise within the room environment. Two different methods are therefore developed to reduce such effects. In the first method, which is also a multistage method, room acoustic effects and background noise are treated together using an empirical mode decomposition (EMD) based algorithm. The noisy reverberant speech is decomposed adaptively into oscillatory components called intrinsic mode functions (IMFs) via an EMD algorithm. Denoising is then applied to selected high-frequency IMFs using an EMD-based minimum mean squared error (MMSE) filter, followed by spectral subtraction of the resulting denoised high- and low-frequency IMFs. The second method is a two-stage dereverberation algorithm in which a smoothed spectral subtraction mask based on a frequency-dependent model is derived and applied to the reverberant speech to reduce the effects of late reverberation. Wiener filtering is then applied to attenuate the early reverberation.

Finally, an algorithm is developed for joint blind separation and blind dereverberation. The proposed method includes a step for the blind estimation of the reverberation time (RT) and is employed in three different ways. In the first scheme, the RT is estimated blindly from the available mixture signals and the mixtures are dereverberated; the separation algorithm is then applied to the resulting mixtures. In the second scheme, the separation algorithm is applied first to the mixtures, followed by blind dereverberation of the segregated speech signals. In the third scheme, the separation algorithm is split such that the convolutive ICA is first applied to the mixtures, followed by blind dereverberation of the signals obtained from the convolutive ICA; the time-frequency (T-F) representation of the dereverberated signals is then used to estimate the IBM, followed by cepstral smoothing.
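As a rough illustration of the mask-estimation stage described above, the sketch below estimates an IBM by comparing the time-frequency energies of two convolutive ICA outputs and then smooths the mask in the cepstral domain. The function names, STFT settings, threshold, and the particular smoothing strategy (low-pass liftering of each frame's cepstrum) are illustrative assumptions, not the exact procedure used in the thesis.

```python
# Minimal sketch of IBM estimation from two ICA outputs, followed by
# cepstral smoothing. Parameter values and the liftering-based smoothing
# are assumptions chosen for illustration only.
import numpy as np
from scipy.signal import stft
from scipy.fft import dct, idct

def estimate_ibm(target_est, interferer_est, fs, nperseg=512, threshold_db=0.0):
    """Binary mask: 1 where the estimated target dominates the interferer."""
    _, _, T = stft(target_est, fs, nperseg=nperseg)
    _, _, I = stft(interferer_est, fs, nperseg=nperseg)
    ratio_db = 20 * np.log10(np.abs(T) + 1e-12) - 20 * np.log10(np.abs(I) + 1e-12)
    return (ratio_db > threshold_db).astype(float)

def cepstral_smooth(mask, keep=30):
    """Smooth each mask frame by low-pass liftering its real cepstrum."""
    log_mask = np.log(mask + 1e-3)              # avoid log(0) in masked bins
    ceps = dct(log_mask, axis=0, norm='ortho')  # frequency -> quefrency
    ceps[keep:, :] = 0.0                        # discard fine spectral detail
    return np.exp(idct(ceps, axis=0, norm='ortho'))
```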
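For the dereverberation stage, the following sketch outlines a spectral subtraction gain against late reverberation, assuming a commonly used statistical model in which the late reverberant power is a delayed, exponentially decayed copy of the observed power, with the decay governed by the reverberation time. The RT60 value, the delay separating early and late reflections, and the gain floor are placeholders rather than the frequency-dependent model derived in the thesis; in the joint scheme, the RT60 would come from the blind RT estimation step.

```python
# Minimal sketch of late-reverberation spectral subtraction under an
# exponential-decay power model. All numeric defaults are placeholders.
import numpy as np

def late_reverb_mask(power_spec, fs, hop, rt60=0.5, delay_s=0.05, floor=0.1):
    """Gain mask attenuating estimated late-reverberation power per T-F bin."""
    delta = 3.0 * np.log(10) / (rt60 * fs)          # decay rate implied by RT60
    delay_frames = max(1, int(round(delay_s * fs / hop)))
    atten = np.exp(-2.0 * delta * delay_s * fs)     # power decay over the delay
    late_power = np.zeros_like(power_spec)
    late_power[:, delay_frames:] = atten * power_spec[:, :-delay_frames]
    gain = 1.0 - late_power / (power_spec + 1e-12)  # spectral subtraction gain
    return np.maximum(gain, floor)                  # floor limits musical noise
```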

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:551136
Date: January 2012
Creators: Jan, Tariqullah
Publisher: University of Surrey
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
