1. The Cocktail Party Problem: Solutions and Applications. Wiklund, Karl, 02 1900
The human auditory system is remarkable in its ability to function in busy acoustic environments. It can selectively focus attention on, and extract, a single source of interest amid competing acoustic sources, reverberation, and motion. Yet this problem, which is so elementary for most human listeners, has proven very difficult to solve computationally. Even more difficult has been the search for practical solutions that can be implemented with digital signal processing. Many applications that would benefit from such a solution, such as hearing aid systems, industrial noise control, or audio surveillance, require that it operate in real time and consume only minimal computational resources.

In this thesis, a novel solution to the cocktail party problem is proposed. This solution is rooted in the field of Computational Auditory Scene Analysis and makes use of insights into the processing carried out by the early human auditory system in order to suppress interference effectively. These neurobiological insights have been adapted so as to produce a solution to the cocktail party problem that is practical from an engineering point of view. The proposed solution has been found to be robust under a wide range of realistic environmental conditions, including spatially distributed interference as well as reverberation.

Thesis / Doctor of Philosophy (PhD)
2. A Deep Learning Approach to Brain Tracking of Sound. Hermansson, Oscar, January 2022
Objectives: Development of accurate auditory attention decoding (AAD) algorithms, capable of identifying the attended sound source from speech-evoked electroencephalography (EEG) responses, could lead to new solutions for hearing-impaired listeners: neuro-steered hearing aids. Many existing AAD algorithms are either inaccurate or very slow, so there is a need to develop new EEG-based AAD methods. The first objective of this project was to investigate deep neural network (DNN) models for AAD and compare them to state-of-the-art linear models. The second objective was to investigate whether generative adversarial networks (GANs) could be used for speech-evoked EEG data augmentation to improve AAD performance.

Design: The proposed methods were tested on a dataset of 34 participants who performed an auditory attention task. They were instructed to attend to one of two talkers in front of them and to ignore the talker on the other side and the background noise behind them, while high-density EEG was recorded.

Main Results: The linear models had an average attended-vs-ignored speech classification accuracy of 95.87% and 50% for ~30-second and 8-second time windows, respectively. A DNN model designed for AAD achieved an average classification accuracy of 82.32% and 58.03% for ~30-second and 8-second time windows, respectively, when trained only on real EEG data. The GANs generated relatively realistic speech-evoked EEG signals, and a DNN trained with GAN-generated data reached an average accuracy of 90.25% for 8-second time windows. On shorter trials, GAN-generated EEG data significantly improved classification performance compared with models trained only on real EEG data.

Conclusion: The results suggest that DNN models can outperform linear models in AAD tasks and that GAN-based EEG data augmentation can further improve DNN performance. These results extend prior work and bring us closer to using EEG to decode auditory attention in next-generation neuro-steered hearing aids.
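As an illustration of the linear baseline referred to above, the sketch below shows a common backward (stimulus-reconstruction) decoder for AAD: multichannel EEG is mapped to an estimate of the attended speech envelope with ridge regression, and attention is decoded by correlating that estimate with each talker's envelope. This is a minimal sketch under assumed settings (sampling rate, lag range, regularisation), not the models evaluated in the thesis.

```python
# Minimal backward-model AAD sketch (assumed settings, not the thesis code).
import numpy as np

def build_lagged_eeg(eeg, n_lags):
    """Stack time-lagged copies of the EEG (samples x channels) as regressors."""
    n_samples, n_channels = eeg.shape
    X = np.zeros((n_samples, n_channels * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * n_channels:(lag + 1) * n_channels] = eeg[:n_samples - lag]
    return X

def train_decoder(eeg, attended_env, n_lags=16, ridge=1e3):
    """Least-squares decoder with ridge regularisation (backward model)."""
    X = build_lagged_eeg(eeg, n_lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ attended_env)

def decode_attention(eeg, env_a, env_b, weights, n_lags=16):
    """Return 0 if talker A is decoded as attended, 1 otherwise."""
    recon = build_lagged_eeg(eeg, n_lags) @ weights
    corr_a = np.corrcoef(recon, env_a)[0, 1]
    corr_b = np.corrcoef(recon, env_b)[0, 1]
    return 0 if corr_a >= corr_b else 1

# Toy usage with random data standing in for real EEG and speech envelopes.
rng = np.random.default_rng(0)
fs = 64                                   # assumed EEG sampling rate (Hz)
eeg = rng.standard_normal((fs * 30, 32))  # 30 s of 32-channel EEG
env_a = eeg[:, 0] + 0.5 * rng.standard_normal(fs * 30)  # "attended" envelope
env_b = rng.standard_normal(fs * 30)                    # "ignored" envelope
w = train_decoder(eeg, env_a)
print("decoded talker:", decode_attention(eeg, env_a, env_b, w))
```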
3. Sound source segregation of multiple concurrent talkers via Short-Time Target Cancellation. Cantu, Marcos Antonio, 22 October 2018
The Short-Time Target Cancellation (STTC) algorithm, developed as part of this dissertation research, is a “Cocktail Party Problem” processor that can boost speech intelligibility for a target talker from a specified “look” direction while suppressing the intelligibility of competing talkers. The algorithm holds promise for both automatic speech recognition and assistive listening device applications. The STTC algorithm operates on a frame-by-frame basis, leverages the computational efficiency of the Fast Fourier Transform (FFT), and is designed to run in real time. Notably, its performance in objective measures of speech intelligibility and sound source segregation is comparable to that of the Ideal Binary Mask (IBM) and the Ideal Ratio Mask (IRM). Because the STTC algorithm computes a time-frequency mask that can be applied independently to both the left and right signals, binaural cues for spatial hearing, including Interaural Time Differences (ITDs), Interaural Level Differences (ILDs), and spectral cues, can be preserved in potential hearing aid applications. A minimalist design for a proposed STTC Assistive Listening Device (ALD), consisting of six microphones embedded in the frame of a pair of eyeglasses, is presented and evaluated using virtual room acoustics and both objective and behavioral measures. The results suggest that the proposed STTC ALD can provide a significant speech intelligibility benefit in complex auditory scenes comprised of multiple spatially separated talkers.
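The abstract above does not spell out the implementation, but the general target-cancellation idea behind such frame-based time-frequency masking processors can be sketched as follows: for a target arriving from the “look” direction (assumed broadside here, so it reaches two microphones identically), subtracting the two channels cancels the target and leaves an estimate of the interference, which is then used to build a per-bin mask applied to both ears. The frame length, overlap, and mask form below are illustrative assumptions, not the STTC specification.

```python
# Two-microphone target-cancellation masking sketch (not the actual STTC algorithm).
import numpy as np

def stft(x, frame=512, hop=256):
    win = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, frame=512, hop=256, length=None):
    win = np.hanning(frame)
    frames = np.fft.irfft(X, n=frame, axis=1) * win
    out = np.zeros(hop * (X.shape[0] - 1) + frame)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + frame] += f
        norm[i * hop:i * hop + frame] += win ** 2
    out /= np.maximum(norm, 1e-8)
    return out[:length] if length else out

def cancellation_mask(left, right, floor=1e-8):
    L, R = stft(left), stft(right)
    target_est = 0.5 * np.abs(L + R)   # in-phase (look-direction) component
    interf_est = 0.5 * np.abs(L - R)   # residual after cancelling the target
    mask = target_est**2 / (target_est**2 + interf_est**2 + floor)
    return mask, L, R

# Apply the same mask to both ears so binaural cues are preserved.
fs = 16000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)                        # broadside target
noise = 0.5 * np.random.default_rng(1).standard_normal(fs)  # lateral interferer
left, right = target + noise, target + np.roll(noise, 8)    # interferer delayed at one ear
mask, L, R = cancellation_mask(left, right)
left_out = istft(mask * L, length=len(left))
right_out = istft(mask * R, length=len(right))
```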
4. A biologically inspired approach to the cocktail party problem. Chou, Kenny F., 19 May 2020
At a cocktail party, one can choose to scan the room for conversations of interest, attend to a specific conversation partner, switch between conversation partners, or not attend to anything at all. The ability of the normal-functioning auditory system to flexibly listen in complex acoustic scenes plays a central role in solving the cocktail party problem (CPP). In contrast, certain demographics (e.g., individuals with hearing impairment or older adults) are unable to solve the CPP, leading to psychological ailments and reduced quality of life. Since the normal auditory system still outperforms machines in solving the CPP, an effective solution may be found by mimicking the normal-functioning auditory system.
Spatial hearing likely plays an important role in CPP-processing in the auditory system. This thesis details the development of a biologically based approach to the CPP by modeling specific neural mechanisms underlying spatial tuning in the auditory cortex. First, we modeled bottom-up, stimulus-driven mechanisms using a multi-layer network model of the auditory system. To convert spike trains from the model output into audible waveforms, we designed a novel reconstruction method based on the estimation of time-frequency masks. We showed that our reconstruction method produced sounds with significantly higher intelligibility and quality than previous reconstruction methods. We also evaluated the algorithm's performance using a psychoacoustic study, and found that it provided the same amount of benefit to normal-hearing listeners as a current state-of-the-art acoustic beamforming algorithm.
Finally, we modeled top-down, attention-driven mechanisms that allowed the network to flexibly operate in different regimes, e.g., monitoring the acoustic scene, attending to a specific target, and switching between attended targets. The model explains previous experimental observations and proposes candidate neural mechanisms underlying flexible listening in cocktail-party scenarios. The strategies proposed here could be applied to hearing-assistive devices for CPP processing (e.g., hearing aids), whose users would benefit from switching between various modes of listening in different social situations.
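The mask-based reconstruction step described above (converting model spike trains into audible waveforms) can be illustrated with a heavily simplified sketch: spike trains from a bank of model frequency channels are smoothed into a firing-rate map, thresholded into a binary time-frequency mask, and applied to the mixture spectrogram before resynthesis. The channel count, smoothing, threshold, and nearest-neighbour mapping onto the STFT grid are assumptions for illustration, not the thesis's network model or reconstruction method.

```python
# Simplified mask-based reconstruction sketch (assumed details, not the thesis model).
import numpy as np
from scipy.signal import stft, istft

def spikes_to_mask(spikes, n_frames, threshold=0.5):
    """spikes: (n_channels, n_samples) binary trains -> (n_channels, n_frames) binary mask."""
    n_channels, n_samples = spikes.shape
    hop = n_samples // n_frames
    rate = spikes[:, :hop * n_frames].reshape(n_channels, n_frames, hop).mean(axis=2)
    rate /= rate.max() + 1e-12
    return (rate >= threshold).astype(float)

def reconstruct(mixture, spikes, fs=16000, nperseg=512):
    f, t, Z = stft(mixture, fs=fs, nperseg=nperseg)
    mask = spikes_to_mask(spikes, n_frames=Z.shape[1])
    # Map channel rows onto the STFT frequency grid (nearest neighbour, a simplification).
    freq_idx = np.linspace(0, mask.shape[0] - 1, Z.shape[0]).astype(int)
    _, x_hat = istft(Z * mask[freq_idx, :], fs=fs, nperseg=nperseg)
    return x_hat

# Toy usage: random spikes stand in for the auditory-model output.
rng = np.random.default_rng(0)
fs = 16000
mixture = rng.standard_normal(fs)
spikes = (rng.random((64, fs)) < 0.05).astype(float)  # 64 channels of sparse spikes
estimate = reconstruct(mixture, spikes, fs=fs)
```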
5. Neurophysiological Mechanisms of Speech Intelligibility under Masking and Distortion. Vibha Viswanathan (11189856), 29 July 2021
Difficulty understanding speech in background noise is the most common hearing complaint. Elucidating the neurophysiological mechanisms underlying speech intelligibility in everyday environments with multiple sound sources and distortions is hence important for any technology that aims to improve real-world listening. Using a combination of behavioral, electroencephalography (EEG), and computational modeling experiments, this dissertation provides insight into how the brain analyzes such complex scenes, and what roles different acoustic cues play in facilitating this process and in conveying phonetic content. Experiment #1 showed that brain oscillations selectively track the temporal envelopes (i.e., modulations) of attended speech in a mixture of competing talkers, and that the strength and pattern of this attention effect differs between individuals. Experiment #2 showed that the fidelity of neural tracking of attended-speech envelopes is strongly shaped by the modulations in interfering sounds as well as the temporal fine structure (TFS) conveyed by the cochlea, and predicts speech intelligibility in diverse listening environments. Results from Experiments #1 and #2 support the theory that temporal coherence of sound elements across envelopes and/or TFS shapes scene analysis and speech intelligibility. Experiment #3 tested this theory further by measuring and computationally modeling consonant categorization behavior in a range of background noises and distortions. We found that a physiologically plausible model that incorporated temporal-coherence effects predicted consonant confusions better than conventional speech-intelligibility models, providing independent evidence that temporal coherence influences scene analysis. Finally, results from Experiment #3 also showed that TFS is used to extract speech content (voicing) for consonant categorization even when intact envelope cues are available. Together, the novel insights provided by our results can guide future models of speech intelligibility and scene analysis, clinical diagnostics, improved assistive listening devices, and other audio technologies.
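A rough illustration of the temporal-coherence idea referenced in this abstract: sub-band envelopes that rise and fall together tend to belong to the same source, so their pairwise correlation can serve as a simple grouping cue. The filterbank, smoothing, and test signals below are illustrative assumptions, not the dissertation's physiologically plausible model.

```python
# Temporal-coherence grouping sketch: correlate sub-band envelopes across channels.
import numpy as np

def subband_envelopes(x, n_bands=16, frame=256):
    """Crude envelopes: short-time energy in n_bands uniform FFT bands."""
    n_frames = len(x) // frame
    frames = x[:n_frames * frame].reshape(n_frames, frame)
    spec = np.abs(np.fft.rfft(frames, axis=1))           # (n_frames, frame//2 + 1)
    edges = np.linspace(0, spec.shape[1], n_bands + 1).astype(int)
    return np.stack([spec[:, edges[b]:edges[b + 1]].mean(axis=1)
                     for b in range(n_bands)])            # (n_bands, n_frames)

def coherence_matrix(env):
    """Pairwise correlation of band envelopes; high values suggest a common source."""
    return np.corrcoef(env)

# Toy usage: two amplitude-modulated tones with different modulation rates.
fs = 16000
t = np.arange(fs * 2) / fs
src1 = (1 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)   # 4 Hz modulation
src2 = (1 + np.sin(2 * np.pi * 7 * t)) * np.sin(2 * np.pi * 3000 * t)  # 7 Hz modulation
mix = src1 + src2 + 0.01 * np.random.default_rng(0).standard_normal(len(t))
C = coherence_matrix(subband_envelopes(mix))
# Bands dominated by the same source share a modulation rate and correlate strongly.
print(np.round(C[:4, :4], 2))
```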
6. Evaluation of Methods for Sound Source Separation in Audio Recordings Using Machine Learning. Gidlöf, Amanda, January 2023
Sound source separation is a popular and active research area, especially with modern machine learning techniques. In this thesis, the focus is on single-channel separation of two speakers into individual streams, specifically in the case where the two speakers are accompanied by background noise. There are different methods for separating speakers, and in this thesis three of them are evaluated: Conv-TasNet, DPTNet, and FaSNet-TAC. These methods were used to train sound source separation models, which were evaluated and validated through three experiments. First, previous results for the chosen separation methods were reproduced. Second, models applicable to NFC's datasets and applications were created, fulfilling the aim of this thesis. Last, all models were evaluated on an independent dataset similar to NFC's. The results were assessed using the metrics SI-SNRi and SDRi. This thesis provides recommended models and methods suitable for NFC applications, concluding in particular that Conv-TasNet and DPTNet are reasonable choices.
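For reference, the SI-SNRi metric named above can be computed as in the following sketch: the scale-invariant SNR (SI-SNR) of the separated estimate against the reference, minus the SI-SNR of the unprocessed mixture. SDRi is analogous but uses a non-scale-invariant signal-to-distortion ratio. The toy signals are assumptions for demonstration only.

```python
# SI-SNR and SI-SNR improvement (SI-SNRi), as commonly defined in separation papers.
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference source."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to find the scaled target component.
    target = (np.dot(estimate, reference) / (np.dot(reference, reference) + eps)) * reference
    noise = estimate - target
    return 10 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))

def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: how much the separated estimate improves on simply using the mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)

# Toy usage: a noisy mixture and a (slightly imperfect) separated estimate.
rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)
interference = rng.standard_normal(16000)
mixture = reference + interference
estimate = reference + 0.1 * interference
print("SI-SNRi (dB):", round(si_snr_improvement(estimate, mixture, reference), 2))
```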