  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A computer model of auditory stream segregation

Beauvois, Michael W. January 1991
A simple computer model is described that takes a novel approach to the problem of accounting for perceptual coherence among successive pure tones of changing frequency by using simple physiological principles that operate at a peripheral, rather than a central level. The model is able to reproduce a number of streaming phenomena found in the literature using the same parameter values. These are: (1) the build-up of streaming over time; (2) the temporal coherence and fission boundaries of human listeners; (3) the ambiguous region; and (4) the trill threshold. In addition, the principle of excitation integration used in the model can be used to account for auditory grouping on the basis of the Gestalt perceptual principles of closure, proximity, continuity, and good continuation, as well as the pulsation threshold. The examples of Gestalt auditory grouping accounted for by the excitation integration principle indicate that the predictive power of the model would be considerably enhanced by the addition of a cross-channel grouping mechanism that worked on the basis of common onsets and offsets, as more complex stimuli could then be processed by the model.
2

Informed algorithms for sound source separation in enclosed reverberant environments

Khan, Muhammad Salman January 2013
While humans can separate a sound of interest amidst a cacophony of contending sounds in an echoic environment, machine-based methods lag behind in solving this task. This thesis thus aims at improving the performance of audio separation algorithms when they are informed, i.e. have access to source location information. These locations are assumed to be known a priori in this work, for example by video processing. Initially, a multi-microphone array based method combined with binary time-frequency masking is proposed. A robust least squares frequency invariant data independent beamformer designed with the location information is utilized to estimate the sources. To further enhance the estimated sources, binary time-frequency masking based post-processing is used, but cepstral domain smoothing is required to mitigate musical noise. To tackle the under-determined case and further improve separation performance at higher reverberation times, a two-microphone based method which is inspired by human auditory processing and generates soft time-frequency masks is described. In this approach, interaural level difference, interaural phase difference and mixing vectors are probabilistically modeled in the time-frequency domain and the model parameters are learned through the expectation-maximization (EM) algorithm. A direction vector is estimated for each source, using the location information, which is used as the mean parameter of the mixing vector model. Soft time-frequency masks are used to reconstruct the sources. A spatial covariance model is then integrated into the probabilistic model framework that encodes the spatial characteristics of the enclosure and further improves the separation performance in challenging scenarios, i.e. when sources are in close proximity and when the level of reverberation is high.
Finally, new dereverberation based pre-processing is proposed based on the cascade of three dereverberation stages where each enhances the two-microphone reverberant mixture. The dereverberation stages are based on amplitude spectral subtraction, where the late reverberation is estimated and suppressed. The combination of such dereverberation based pre-processing and use of soft mask separation yields the best separation performance. All methods are evaluated with real and synthetic mixtures formed for example from speech signals from the TIMIT database and measured room impulse responses.
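The binary time-frequency masking step mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a magnitude spectrogram estimate is already available for each source (for example from a beamformer output) and assigns every time-frequency bin to the source whose estimate is largest there. The function names `binary_masks` and `apply_mask` are hypothetical, and the cepstral smoothing and soft-mask EM stages described in the thesis are not reproduced.

```python
def binary_masks(source_estimates):
    """Build one 0/1 mask per source.

    source_estimates: list of magnitude spectrograms (freq x time, nested
    lists), one per source. This input layout is an assumption made for
    the sketch, not the thesis's data format.
    """
    n_src = len(source_estimates)
    n_freq = len(source_estimates[0])
    n_time = len(source_estimates[0][0])
    masks = [[[0] * n_time for _ in range(n_freq)] for _ in range(n_src)]
    for f in range(n_freq):
        for t in range(n_time):
            # The bin goes to the source with the largest estimated magnitude.
            winner = max(range(n_src), key=lambda s: source_estimates[s][f][t])
            masks[winner][f][t] = 1
    return masks


def apply_mask(mask, mixture):
    """Multiply a mixture spectrogram by a mask, bin by bin."""
    return [[m * x for m, x in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(mask, mixture)]
```

Because each bin is given to exactly one source, the masks sum to one everywhere. Such hard decisions are a common cause of musical noise, which is consistent with the smoothing step the abstract calls for.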
3

Bayesian Microphone Array Processing / ベイズ法によるマイクロフォンアレイ処理

Otsuka, Takuma 24 March 2014
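Kyoto University / 0048 / Doctorate by coursework (new system) / Doctor of Informatics / Kō No. 18412 / Jōhaku No. 527 / 新制||情||93 (University Library) / 31270 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Hiroshi Okuno, Professor Tatsuya Kawahara, Associate Professor Marco CUTURI CAMETO, Senior Lecturer Kazuyoshi Yoshii / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM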
4

Multichannel audio processing for speaker localization, separation and enhancement

Martí Guerola, Amparo 29 October 2013
This thesis is related to the field of acoustic signal processing and its applications to emerging communication environments. Acoustic signal processing is a very wide research area covering the design of signal processing algorithms involving one or several acoustic signals to perform a given task, such as locating the sound source that originated the acquired signals, improving their signal-to-noise ratio, separating signals of interest from a set of interfering sources or recognizing the type of source and the content of the message. Among the above tasks, Sound Source Localization (SSL) and Automatic Speech Recognition (ASR) have been specifically addressed in this thesis. In fact, the localization of sound sources in a room has received a lot of attention in recent decades. Most real-world microphone array applications require the localization of one or more active sound sources in adverse environments (low signal-to-noise ratio and high reverberation). Some of these applications are teleconferencing systems, video-gaming, autonomous robots, remote surveillance, hands-free speech acquisition, etc. Indeed, performing robust sound source localization under high noise and reverberation is a very challenging task. One of the most well-known algorithms for source localization in noisy and reverberant environments is the Steered Response Power - Phase Transform (SRP-PHAT) algorithm, which constitutes the baseline framework for the contributions proposed in this thesis. Another challenge in the design of SSL algorithms is to achieve real-time performance and high localization accuracy with a reasonable number of microphones and limited computational resources. Although the SRP-PHAT algorithm has been shown to be an effective localization algorithm for real-world environments, its practical implementation is usually based on a costly fine grid-search procedure, making the computational cost of the method a real issue.
In this context, several modifications and optimizations have been proposed to improve its performance and applicability. An effective strategy that extends the conventional SRP-PHAT functional is presented in this thesis. This approach performs a full exploration of the sampled space rather than computing the SRP at discrete spatial positions, increasing its robustness and allowing for a coarser spatial grid that reduces the computational cost required in a practical implementation with a small hardware cost (reduced number of microphones). This strategy makes it possible to implement real-time applications based on location information, such as automatic camera steering or the detection of speech/non-speech fragments in advanced videoconferencing systems. As stated before, besides the contributions related to SSL, this thesis is also related to the field of ASR. This technology allows a computer or electronic device to identify the words spoken by a person so that the message can be stored or processed in a useful way. ASR is used on a day-to-day basis in a number of applications and services such as natural human-machine interfaces, dictation systems, electronic translators and automatic information desks. However, there are still some challenges to be solved. A major problem in ASR is to recognize people speaking in a room by using distant microphones. In distant-speech recognition, the microphone receives not only the direct-path signal but also delayed replicas as a result of multi-path propagation. Moreover, there are many situations in teleconferencing meetings in which several speakers talk simultaneously. In this context, when multiple speaker signals are present, Sound Source Separation (SSS) methods can be successfully employed to improve ASR performance in multi-source scenarios. This is the motivation behind the training method for multiple talk situations proposed in this thesis.
This training, which is based on a robust transformed model constructed from separated speech in diverse acoustic environments, makes use of an SSS method as a speech enhancement stage that suppresses the unwanted interferences. The combination of source separation and this specific training has been explored and evaluated under different acoustical conditions, leading to improvements of up to 35% in ASR performance. / Martí Guerola, A. (2013). Multichannel audio processing for speaker localization, separation and enhancement [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/33101
5

Estimation du niveau sonore de sources d'intérêt au sein de mixtures sonores urbaines : application au trafic routier / Estimation of the noise level of sources of interest within urban noise mixtures : application to road traffic

Gloaguen, Jean-Rémy 03 October 2018
Des réseaux de capteurs acoustiques sont actuellement mis en place dans plusieurs grandes villes afin d’obtenir une description plus fine de l’environnement sonore urbain. Un des défis à relever est celui de réussir, à partir d’enregistrements sonores, à estimer des indicateurs utiles tels que le niveau sonore du trafic routier. Cette tâche n’est en rien triviale en raison de la multitude de sources sonores qui composent cet environnement. Pour cela, la Factorisation en Matrices Non-négatives (NMF) est considérée et appliquée sur deux corpus de mixtures sonores urbaines simulés. L’intérêt de simuler de tels mélanges est la possibilité de connaitre toutes les caractéristiques de chaque classe de son dont le niveau sonore exact du trafic routier. Le premier corpus consiste en 750 scènes de 30 secondes mélangeant une composante de trafic routier dont le niveau sonore est calibré et une classe de son plus générique. Les différents résultats ont notamment permis de proposer une nouvelle approche, appelée « NMF initialisée seuillée », qui se révèle être la plus performante. Le deuxième corpus créé permet de simuler des mixtures sonores plus représentatives des enregistrements effectués en villes, dont le réalisme a été validé par un test perceptif. Avec une erreur moyenne d’estimation du niveau sonore inférieure à 1,2 dB, la NMF initialisée seuillée se révèle, là encore, la méthode la plus adaptée aux différents environnements sonores urbains. Ces résultats ouvrent alors la voie vers l’utilisation de cette méthode à d’autres sources sonores, telles que les voix et les sifflements d’oiseaux, qui pourront mener, à terme, à la réalisation de cartes de bruits multi-sources. / Acoustic sensor networks are being set up in several major cities in order to obtain a more detailed description of the urban sound environment. One challenge is to estimate useful indicators such as the road traffic noise level on the basis of sound recordings.
This task is by no means trivial because of the multitude of sound sources that compose this environment. For this, Non-negative Matrix Factorization (NMF) is considered and applied to two corpora of simulated urban sound mixtures. The interest of simulating such mixtures is the possibility of knowing all the characteristics of each sound class, including the exact road traffic noise level. The first corpus consists of 750 30-second scenes mixing a road traffic component with a calibrated sound level and a more generic sound class. The various results have notably made it possible to propose a new approach, called 'Thresholded Initialized NMF', which proves to be the most effective. The second corpus makes it possible to simulate sound mixtures more representative of recordings made in cities, the realism of which has been validated by a perceptual test. With an average noise level estimation error of less than 1.3 dB, the Thresholded Initialized NMF remains the most suitable method for the different urban noise environments. These results open the way to the use of this method for other sound sources, such as birds' whistling and voices, which can eventually lead to the creation of multi-source noise maps.
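To make the NMF idea in this abstract concrete, here is a minimal sketch of supervised NMF with a fixed dictionary. It assumes that a non-negative dictionary `W` of spectral templates is already known (with, hypothetically, column 0 representing road traffic) and estimates the activations `H` using the standard multiplicative update for the Euclidean cost. The abstract does not specify the 'Thresholded Initialized NMF' variant, so it is not reproduced here; all function names are illustrative.

```python
import math


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf_activations(V, W, n_iter=200, eps=1e-12):
    """Estimate H >= 0 such that W @ H approximates V, with W held fixed.

    V: spectrogram (freq x time), W: dictionary (freq x components).
    Uses the multiplicative update H <- H * (W^T V) / (W^T W H).
    """
    k, n = len(W[0]), len(V[0])
    H = [[1.0] * n for _ in range(k)]  # positive start keeps H non-negative
    Wt = transpose(W)
    WtV = matmul(Wt, V)
    WtW = matmul(Wt, W)
    for _ in range(n_iter):
        WtWH = matmul(WtW, H)
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps) for j in range(n)]
             for i in range(k)]
    return H


def component_level_db(W, H, index):
    """Level in dB (arbitrary reference) of one component's reconstruction."""
    energy = sum((W[f][index] * H[index][t]) ** 2
                 for f in range(len(W)) for t in range(len(H[0])))
    return 10 * math.log10(energy + 1e-12)
```

With calibrated recordings, a level computed from the traffic component alone could be compared against the known traffic level of a simulated scene, which is the kind of error the abstract reports. The dB reference used here is arbitrary.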
6

Harmonic Sound Source Separation in Monaural Music Signals

Goel, Priyank January 2013
Sound Source Separation refers to separating sound signals according to their sources from a given observed sound. It is more efficient to code, and much easier to analyze and manipulate, sounds from individual sources separately than in a mixture. This thesis deals with the problem of source separation in monaural recordings of harmonic musical instruments. A good amount of literature is surveyed and presented, since sound source separation has been tried by many researchers over many decades through various approaches. A prediction-driven approach is first presented, which is inspired by the old-plus-new heuristic used by humans for Auditory Scene Analysis. In this approach, the signals from different sources are predicted using a general model and then these predictions are reconciled with the observed sound to get the separated signal. This approach failed for real-world sound recordings in which the spectrum of the source signals changes very dynamically. Considering the dynamic nature of the spectra, an approach which uses the covariance matrix of the amplitudes of harmonics is proposed. The overlapping and non-overlapping harmonics of the notes are first identified with the knowledge of the pitch of the notes. The notes are matched on the basis of their covariance profiles. The second-order properties of the overlapping harmonics of a note are estimated with the use of the covariance matrix of a matching note. The full harmonic is then reconstructed using these second-order characteristics. The technique has performed well on sound samples taken from the RWC Musical Instrument database.
7

Evaluation of Methods for Sound Source Separation in Audio Recordings Using Machine Learning

Gidlöf, Amanda January 2023
Sound source separation is a popular and active research area, especially with modern machine learning techniques. In this thesis, the focus is on single-channel separation of two speakers into individual streams, specifically considering the case where the two speakers are also accompanied by background noise. There are different methods to separate speakers, and in this thesis three different methods are evaluated: the Conv-TasNet, the DPTNet, and the FaSNetTAC. The methods were used to train models to perform the sound source separation. These models were evaluated and validated through three experiments. Firstly, previous results for the chosen separation methods were reproduced. Secondly, appropriate models applicable for NFC's datasets and applications were created, to fulfill the aim of this thesis. Lastly, all models were evaluated on an independent dataset, similar to datasets from NFC. The results were evaluated using the metrics SI-SNRi and SDRi. This thesis provides recommended models and methods suitable for NFC applications, especially concluding that the Conv-TasNet and the DPTNet are reasonable choices.
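For readers unfamiliar with the SI-SNRi metric named in this abstract, the following is a minimal sketch of the usual definition: the scale-invariant signal-to-noise ratio of the separated estimate minus that of the unprocessed mixture, both measured against the clean target. It assumes equal-length, time-aligned signals given as plain Python lists; the function names are illustrative and not taken from the thesis.

```python
import math


def si_snr(estimate, target, eps=1e-12):
    """Scale-invariant SNR in dB between an estimate and a clean target."""
    n = len(target)
    # Remove the means so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target: the part that counts as signal.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt)
    scale = dot / (tgt_energy + eps)
    signal = [scale * t for t in tgt]
    # Whatever is left after the projection counts as noise.
    noise = [e - s for e, s in zip(est, signal)]
    signal_energy = sum(s * s for s in signal)
    noise_energy = sum(x * x for x in noise)
    return 10 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain of the estimate over the unprocessed mixture, in dB."""
    return si_snr(estimate, target) - si_snr(mixture, target)
```

Because of the projection step, rescaling the estimate leaves the score unchanged, so a model cannot improve the metric simply by changing its output gain.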
