1 |
Informed algorithms for sound source separation in enclosed reverberant environments
Khan, Muhammad Salman, January 2013
While humans can separate a sound of interest from a cacophony of competing sounds in an echoic environment, machine-based methods lag behind at this task. This thesis therefore aims to improve the performance of audio separation algorithms when they are informed, i.e., when they have access to source location information. These locations are assumed to be known a priori in this work, for example from video processing. Initially, a method based on a multi-microphone array combined with binary time-frequency masking is proposed. A robust least-squares, frequency-invariant, data-independent beamformer designed with the location information is used to estimate the sources. To further enhance the estimated sources, post-processing based on binary time-frequency masking is applied, but cepstral-domain smoothing is required to mitigate musical noise. To tackle the under-determined case and further improve separation performance at higher reverberation times, a two-microphone method is described that is inspired by human auditory processing and generates soft time-frequency masks. In this approach, the interaural level difference, the interaural phase difference and the mixing vectors are modeled probabilistically in the time-frequency domain, and the model parameters are learned through the expectation-maximization (EM) algorithm. A direction vector is estimated for each source from the location information and is used as the mean parameter of the mixing vector model. Soft time-frequency masks are used to reconstruct the sources. A spatial covariance model that encodes the spatial characteristics of the enclosure is then integrated into the probabilistic framework; it further improves separation performance in challenging scenarios, i.e., when sources are in close proximity and when the level of reverberation is high.
Finally, a new dereverberation-based pre-processing method is proposed, consisting of a cascade of three dereverberation stages, each of which enhances the two-microphone reverberant mixture. The dereverberation stages are based on amplitude spectral subtraction, in which the late reverberation is estimated and suppressed. The combination of this dereverberation-based pre-processing with soft mask separation yields the best separation performance. All methods are evaluated with real and synthetic mixtures formed, for example, from speech signals from the TIMIT database and measured room impulse responses.
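The amplitude spectral subtraction step mentioned in this abstract can be sketched per time-frequency bin as follows. This is a minimal illustration, not the thesis's implementation: it assumes an estimate of the late-reverberation magnitude is already available (the abstract does not say how each stage obtains it), and the function name and the `floor` parameter are hypothetical.

```python
def spectral_subtract(mixture_mag, late_reverb_mag, floor=0.1):
    """Subtract an estimated late-reverberation magnitude from each TF bin.

    mixture_mag, late_reverb_mag: lists of frames, each a list of
    per-frequency-bin magnitudes (same shape).
    floor: hypothetical spectral floor, as a fraction of the mixture
    magnitude, that keeps the result from going negative.
    """
    enhanced = []
    for mix_frame, rev_frame in zip(mixture_mag, late_reverb_mag):
        enhanced.append(
            [max(m - r, floor * m) for m, r in zip(mix_frame, rev_frame)]
        )
    return enhanced
```

In practice the enhanced magnitudes would be recombined with the mixture phase before the inverse transform; a cascade such as the one described would feed the output of one stage into the next.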
|
2 |
Time-Frequency Masking Performance for Improved Intelligibility with Microphone Arrays
Morgan, Joshua P., 01 January 2017
Time-frequency (TF) masking is an audio processing technique for isolating an audio source from interfering sources. TF masking has been applied and studied in monaural and binaural applications, but has only recently been applied to distributed microphone arrays. This work evaluates the ability of TF masking to isolate human speech and improve speech intelligibility in an immersive "cocktail party" environment. In particular, an upper bound on TF masking performance is established and compared to the traditional delay-sum and generalized sidelobe canceler (GSC) beamformers. Additionally, the novel technique of combining the GSC with TF masking is investigated and its performance evaluated. This work presents a resource-efficient method for studying these isolation techniques and evaluates their performance using both simulated data and data recorded in a real acoustical environment. Further, methods are presented for analyzing speech intelligibility after processing, and automated objective intelligibility measurements are applied alongside informal subjective assessments to evaluate the processing techniques. Finally, the causes of disagreements between subjective and objective intelligibility measurements are discussed, and it is shown that TF masking enhances intelligibility beyond delay-sum beamforming and that adaptive beamforming can be beneficial.
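One common way to obtain an upper bound on TF masking performance is an oracle ("ideal") binary mask computed from the clean target and interferer spectra. The abstract does not state which oracle was used, so the sketch below is an assumption for illustration; the function names, `threshold_db` and `eps` are hypothetical.

```python
import math


def ideal_binary_mask(target_mag, interferer_mag, threshold_db=0.0, eps=1e-12):
    """Return 1 for each TF bin where the target dominates, else 0.

    target_mag, interferer_mag: lists of frames, each a list of
    per-bin magnitudes known from the clean signals (oracle information).
    threshold_db: local SNR, in dB, that a bin must exceed to be kept.
    """
    mask = []
    for t_frame, i_frame in zip(target_mag, interferer_mag):
        row = []
        for t, i in zip(t_frame, i_frame):
            local_snr_db = 20.0 * math.log10((t + eps) / (i + eps))
            row.append(1 if local_snr_db > threshold_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture_mag, mask):
    """Multiply each mixture bin by its mask value."""
    return [
        [m * g for m, g in zip(mix_frame, mask_frame)]
        for mix_frame, mask_frame in zip(mixture_mag, mask)
    ]
```

Because the mask relies on signals that are unavailable in a real recording, it indicates what masking could achieve at best, not what a deployed system would deliver.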
|
3 |
Deep learning methods for reverberant and noisy speech enhancement
Zhao, Yan, 15 September 2020
No description available.
|
4 |
Supervised Speech Separation Using Deep Neural Networks
Wang, Yuxuan, 21 May 2015
No description available.
|
5 |
Nedourčená slepá separace zvukových signálů / Underdetermined Blind Audio Signal Separation
Čermák, Jan, January 2008
We often have to face the fact that several signals are mixed together in an unknown environment. The signals must first be extracted from the mixture in order to interpret them correctly. In the signal processing community, this problem is called blind source separation. This dissertation deals with multi-channel separation of audio signals in real environments when the source signals outnumber the sensors. An introduction to blind source separation is presented in the first part of the thesis. The current state of separation methods is then analyzed. Based on this knowledge, separation systems implementing a fuzzy time-frequency mask are introduced. However, these methods still introduce nonlinear changes in the signal spectra, which can result in musical noise. To reduce musical noise, novel methods combining time-frequency binary masking and beamforming are introduced. The new separation system performs linear spatial filtering even when the source signals outnumber the sensors. Finally, the separation systems are evaluated by objective and subjective tests in the last part of the thesis.
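The difference between a soft (fuzzy) mask and a binary mask when sources outnumber sensors can be illustrated with a generic ratio mask. This is not the fuzzy mask defined in the thesis, whose construction the abstract does not give; the sketch assumes the per-source magnitudes are known, whereas a blind system would have to estimate the masks from the mixture. All names and the `power` parameter are hypothetical.

```python
def soft_masks(source_mags, power=2.0, eps=1e-12):
    """Ratio masks for one frame: each bin's weights sum to about 1.

    source_mags: list over sources; each entry is a list of per-bin
    magnitudes for the same frame.
    """
    n_bins = len(source_mags[0])
    totals = [
        sum(src[k] ** power for src in source_mags) + eps
        for k in range(n_bins)
    ]
    return [
        [src[k] ** power / totals[k] for k in range(n_bins)]
        for src in source_mags
    ]


def binarize(masks):
    """Assign each bin entirely to the source with the largest soft weight."""
    n_sources = len(masks)
    n_bins = len(masks[0])
    winners = [
        max(range(n_sources), key=lambda s: masks[s][k])
        for k in range(n_bins)
    ]
    return [
        [1 if winners[k] == s else 0 for k in range(n_bins)]
        for s in range(n_sources)
    ]
```

The soft weights change smoothly with the source energies, while the binarized version switches bins fully on or off; both are nonlinear, signal-dependent gains applied to the spectrum.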
|
6 |
Improving Speech Intelligibility Without Sacrificing Environmental Sound Recognition
Johnson, Eric Martin, 27 September 2022
No description available.
|