Global ETD Search

41	Audiovisual voice activity detection and localization of simultaneous speech sources / Detecção de atividade de voz e localização de fontes sonoras simultâneas utilizando informações audiovisuais Minotto, Vicente Peruffo January 2013 (has links) Em vista da tentência de se criarem intefaces entre humanos e máquinas que cada vez mais permitam meios simples de interação, é natural que sejam realizadas pesquisas em técnicas que procuram simular o meio mais convencional de comunicação que os humanos usam: a fala. No sistema auditivo humano, a voz é automaticamente processada pelo cérebro de modo efetivo e fácil, também comumente auxiliada por informações visuais, como movimentação labial e localizacão dos locutores. Este processamento realizado pelo cérebro inclui dois componentes importantes que a comunicação baseada em fala requere: Detecção de Atividade de Voz (Voice Activity Detection - VAD) e Localização de Fontes Sonoras (Sound Source Localization - SSL). Consequentemente, VAD e SSL também servem como ferramentas mandatórias de pré-processamento em aplicações de Interfaces Humano-Computador (Human Computer Interface - HCI), como no caso de reconhecimento automático de voz e identificação de locutor. Entretanto, VAD e SSL ainda são problemas desafiadores quando se lidando com cenários acústicos realísticos, particularmente na presença de ruído, reverberação e locutores simultâneos. Neste trabalho, são propostas abordagens para tratar tais problemas, para os casos de uma e múltiplas fontes sonoras, através do uso de informações audiovisuais, explorando-se variadas maneiras de se fundir as modalidades de áudio e vídeo. Este trabalho também emprega um arranjo de microfones para o processamento de som, o qual permite que as informações espaciais dos sinais acústicos sejam exploradas através do algoritmo estado-da-arte SRP (Steered Response Power). Por consequência adicional, uma eficiente implementação em GPU do SRP foi desenvolvida, possibilitando processamento em tempo real do algoritmo. Os experimentos realizados mostram uma acurácia média de 95% ao se efetuar VAD de até três locutores simultâneos, e um erro médio de 10cm ao se localizar tais locutores. / Given the tendency of creating interfaces between human and machines that increasingly allow simple ways of interaction, it is only natural that research effort is put into techniques that seek to simulate the most conventional mean of communication humans use: the speech. In the human auditory system, voice is automatically processed by the brain in an effortless and effective way, also commonly aided by visual cues, such as mouth movement and location of the speakers. This processing done by the brain includes two important components that speech-based communication require: Voice Activity Detection (VAD) and Sound Source Localization (SSL). Consequently, VAD and SSL also serve as mandatory preprocessing tools for high-end Human Computer Interface (HCI) applications in a computing environment, as the case of automatic speech recognition and speaker identification. However, VAD and SSL are still challenging problems when dealing with realistic acoustic scenarios, particularly in the presence of noise, reverberation and multiple simultaneous speakers. In this work we propose some approaches for tackling these problems using audiovisual information, both for the single source and the competing sources scenario, exploiting distinct ways of fusing the audio and video modalities. Our work also employs a microphone array for the audio processing, which allows the spatial information of the acoustic signals to be explored through the stateof- the art method Steered Response Power (SRP). As an additional consequence, a very fast GPU version of the SRP is developed, so that real-time processing is achieved. Our experiments show an average accuracy of 95% when performing VAD of up to three simultaneous speakers and an average error of 10cm when locating such speakers. Reconhecimento : Padroes Reconhecimento : Voz Voz computacional Tempo real Voice activity detection Sound source localization Multiple speakers Competing sources Multimodal fusion Microphone array HiddenMarkov model Support vector machine GPU programming
42	Audiovisual voice activity detection and localization of simultaneous speech sources / Detecção de atividade de voz e localização de fontes sonoras simultâneas utilizando informações audiovisuais Minotto, Vicente Peruffo January 2013 (has links) Em vista da tentência de se criarem intefaces entre humanos e máquinas que cada vez mais permitam meios simples de interação, é natural que sejam realizadas pesquisas em técnicas que procuram simular o meio mais convencional de comunicação que os humanos usam: a fala. No sistema auditivo humano, a voz é automaticamente processada pelo cérebro de modo efetivo e fácil, também comumente auxiliada por informações visuais, como movimentação labial e localizacão dos locutores. Este processamento realizado pelo cérebro inclui dois componentes importantes que a comunicação baseada em fala requere: Detecção de Atividade de Voz (Voice Activity Detection - VAD) e Localização de Fontes Sonoras (Sound Source Localization - SSL). Consequentemente, VAD e SSL também servem como ferramentas mandatórias de pré-processamento em aplicações de Interfaces Humano-Computador (Human Computer Interface - HCI), como no caso de reconhecimento automático de voz e identificação de locutor. Entretanto, VAD e SSL ainda são problemas desafiadores quando se lidando com cenários acústicos realísticos, particularmente na presença de ruído, reverberação e locutores simultâneos. Neste trabalho, são propostas abordagens para tratar tais problemas, para os casos de uma e múltiplas fontes sonoras, através do uso de informações audiovisuais, explorando-se variadas maneiras de se fundir as modalidades de áudio e vídeo. Este trabalho também emprega um arranjo de microfones para o processamento de som, o qual permite que as informações espaciais dos sinais acústicos sejam exploradas através do algoritmo estado-da-arte SRP (Steered Response Power). Por consequência adicional, uma eficiente implementação em GPU do SRP foi desenvolvida, possibilitando processamento em tempo real do algoritmo. Os experimentos realizados mostram uma acurácia média de 95% ao se efetuar VAD de até três locutores simultâneos, e um erro médio de 10cm ao se localizar tais locutores. / Given the tendency of creating interfaces between human and machines that increasingly allow simple ways of interaction, it is only natural that research effort is put into techniques that seek to simulate the most conventional mean of communication humans use: the speech. In the human auditory system, voice is automatically processed by the brain in an effortless and effective way, also commonly aided by visual cues, such as mouth movement and location of the speakers. This processing done by the brain includes two important components that speech-based communication require: Voice Activity Detection (VAD) and Sound Source Localization (SSL). Consequently, VAD and SSL also serve as mandatory preprocessing tools for high-end Human Computer Interface (HCI) applications in a computing environment, as the case of automatic speech recognition and speaker identification. However, VAD and SSL are still challenging problems when dealing with realistic acoustic scenarios, particularly in the presence of noise, reverberation and multiple simultaneous speakers. In this work we propose some approaches for tackling these problems using audiovisual information, both for the single source and the competing sources scenario, exploiting distinct ways of fusing the audio and video modalities. Our work also employs a microphone array for the audio processing, which allows the spatial information of the acoustic signals to be explored through the stateof- the art method Steered Response Power (SRP). As an additional consequence, a very fast GPU version of the SRP is developed, so that real-time processing is achieved. Our experiments show an average accuracy of 95% when performing VAD of up to three simultaneous speakers and an average error of 10cm when locating such speakers. Reconhecimento : Padroes Reconhecimento : Voz Voz computacional Tempo real Voice activity detection Sound source localization Multiple speakers Competing sources Multimodal fusion Microphone array HiddenMarkov model Support vector machine GPU programming
43	Sound source localization with data and model uncertainties using the EM and Evidential EM algorithms / Estimation de sources acoustiques avec prise en compte de l'incertitude de propagation Wang, Xun 09 December 2014 (has links) Ce travail de thèse se penche sur le problème de la localisation de sources acoustiques à partir de signaux déterministes et aléatoires mesurés par un réseau de microphones. Le problème est résolu dans un cadre statistique, par estimation via la méthode du maximum de vraisemblance. La pression mesurée par un microphone est interprétée comme étant un mélange de signaux latents émis par les sources. Les positions et les amplitudes des sources acoustiques sont estimées en utilisant l’algorithme espérance-maximisation (EM). Dans cette thèse, deux types d’incertitude sont également pris en compte : les positions des microphones et le nombre d’onde sont supposés mal connus. Ces incertitudes sont transposées aux données dans le cadre théorique des fonctions de croyance. Ensuite, les positions et les amplitudes des sources acoustiques peuvent être estimées en utilisant l’algorithme E2M, qui est une variante de l’algorithme EM pour les données incertaines.La première partie des travaux considère le modèle de signal déterministe sans prise en compte de l’incertitude. L’algorithme EM est utilisé pour estimer les positions et les amplitudes des sources. En outre, les résultats expérimentaux sont présentés et comparés avec le beamforming et la holographie optimisée statistiquement en champ proche (SONAH), ce qui démontre l’avantage de l’algorithme EM. La deuxième partie considère le problème de l’incertitude du modèle et montre comment les incertitudes sur les positions des microphones et le nombre d’onde peuvent être quantifiées sur les données. Dans ce cas, la fonction de vraisemblance est étendue aux données incertaines. Ensuite, l’algorithme E2M est utilisé pour estimer les sources acoustiques. Finalement, les expériences réalisées sur les données réelles et simulées montrent que les algorithmes EM et E2M donnent des résultats similaires lorsque les données sont certaines, mais que ce dernier est plus robuste en présence d’incertitudes sur les paramètres du modèle. La troisième partie des travaux présente le cas de signaux aléatoires, dont l’amplitude est considérée comme une variable aléatoire gaussienne. Dans le modèle sans incertitude, l’algorithme EM est utilisé pour estimer les sources acoustiques. Dans le modèle incertain, les incertitudes sur les positions des microphones et le nombre d’onde sont transposées aux données comme dans la deuxième partie. Enfin, les positions et les variances des amplitudes aléatoires des sources acoustiques sont estimées en utilisant l’algorithme E2M. Les résultats montrent ici encore l’avantage d’utiliser un modèle statistique pour estimer les sources en présence, et l’intérêt de prendre en compte l’incertitude sur les paramètres du modèle. / This work addresses the problem of multiple sound source localization for both deterministic and random signals measured by an array of microphones. The problem is solved in a statistical framework via maximum likelihood. The pressure measured by a microphone is interpreted as a mixture of latent signals emitted by the sources; then, both the sound source locations and strengths can be estimated using an expectation-maximization (EM) algorithm. In this thesis, two kinds of uncertainties are also considered: on the microphone locations and on the wave number. These uncertainties are transposed to the data in the belief functions framework. Then, the source locations and strengths can be estimated using a variant of the EM algorithm, known as Evidential EM (E2M) algorithm. The first part of this work begins with the deterministic signal model without consideration of uncertainty. The EM algorithm is then used to estimate the source locations and strengths : the update equations for the model parameters are provided. Furthermore, experimental results are presented and compared with the beamforming and the statistically optimized near-field holography (SONAH), which demonstrates the advantage of the EM algorithm. The second part raises the issue of model uncertainty and shows how the uncertainties on microphone locations and wave number can be taken into account at the data level. In this case, the notion of the likelihood is extended to the uncertain data. Then, the E2M algorithm is used to solve the sound source estimation problem. In both the simulation and real experiment, the E2M algorithm proves to be more robust in the presence of model and data uncertainty. The third part of this work considers the case of random signals, in which the amplitude is modeled by a Gaussian random variable. Both the certain and uncertain cases are investigated. In the former case, the EM algorithm is employed to estimate the sound sources. In the latter case, microphone location and wave number uncertainties are quantified similarly to the second part of the thesis. Finally, the source locations and the variance of the random amplitudes are estimated using the E2M algorithm. Localisation de sources acoustiques Fonctions de croyances Algorithmes d'expectation-maximisation Algorithmes E2M Sound source localization Holography Beamforming Acoustic imaging Belief functions Evidential EM algorithm
44	Metody akustické holografie v blízkém poli v prostředí LabVIEW / Near-field acoustical holography methods in LabVIEW environment Majvald, František January 2021 (has links) Near-field acoustical holography is a standard method for sound source visualization and localization. In this thesis, commonly used and newly published near-field acoustic holography methods are introduced and analysed. In addition, regularization methods are presented together with finding options of optimal regularization parameter. Based on theory, a LabVIEW library is built containing four implemented near-field acoustical holography algorithms and two regularization methods. To verify the correctness of implementations, a testing application has been made. This application allows testing of implemented algorithms with simulated or experimentally measured data. The correctness of implementation is verified, and algorithms are compared to each other with respect to accuracy and speed of calculation.
45	Lokalizace pohyblivých akustických zdrojů / Localization of moving acoustical sources Bezdíček, Martin January 2010 (has links) This master's thesis is focused on localization static (entering semester project) and moving acoustic sources (entering master's thesis) by the help of microphonic arrays. In the first part deal with common problems of localization. Further are here described types of microphonic arrays, simplifying possibilities which delimited this problems and general information about room acoustics. In the next part of this master's thesis are step by step mentioned methods localization of acoustic sources. In practical part were used algorithms: Steered-Beamformer-Based Locators and TDOA-Based Locators. Last part of this master's work includes results of these algorithms.
46	Analýza vibrací pomocí akustické holografie / Using Acoustic Holography for Vibration Analysis Havránek, Zdeněk January 2009 (has links) Disertační práce se zabývá bezkontaktní analýzou vibrací pomocí metod akustické holografie v blízkém poli. Akustická holografie v blízkém poli je experimentální metoda, která rekonstruuje akustické pole v těsné blízkosti povrchu vibrujícího předmětu na základě měření akustického tlaku nebo akustické rychlosti v určité vzdálenosti od zkoumaného předmětu. Konkrétní realizace této metody závisí na použitém výpočetním algoritmu. Vlastní práce je zaměřena zejména na rozbor algoritmů, které využívají k rekonstrukci zvukového pole v blízkosti vibrujícího objektu transformaci do domény vlnových čísel (prostorová transformace), kde probíhá vlastní výpočet. V úvodu práce je vysvětlena základní teorie metody akustické holografie v blízkém poli s popisem základních vlastností a dále rozborem konkrétních nejčastěji používaných algoritmům pro lokalizaci a charakterizaci zdroje zvuku a pro následnou vibrační analýzu. Stěžejní část práce se věnuje pokročilým metodám zpracování, které se snaží určitým způsobem optimalizovat přesnost predice zvukového pole v blízkosti vibrujícího předmětu v reálných podmínkách. Jde zejména o problematiku použitého měřicího systému s akustickými snímači, které nejsou ideální, a dále o možnost měření v prostorách s difúzním charakterem zvukového pole. Pro tento případ byla na základě literárního průzkumu optimalizována a ověřena metoda využívající dvouvrstvé mikrofonní pole, které umožňuje oddělení zvukových polí přicházejících z různých stran a tedy úspěšné měření v uzavřených prostorách např. kabin automobilů a letadel. Součástí práce byla také optimalizace, rozšíření a následné ověření algoritmů publikovaných v posledních letech pro měření v reálných podmínkách za použití běžně dostupných akustických snímačů.
47	Evaluation of Methods for Sound Source Separation in Audio Recordings Using Machine Learning Gidlöf, Amanda January 2023 (has links) Sound source separation is a popular and active research area, especially with modern machine learning techniques. In this thesis, the focus is on single-channel separation of two speakers into individual streams, and specifically considering the case where two speakers are also accompanied by background noise. There are different methods to separate speakers and in this thesis three different methods are evaluated: the Conv-TasNet, the DPTNet, and the FaSNetTAC. The methods were used to train models to perform the sound source separation. These models were evaluated and validated through three experiments. Firstly, previous results for the chosen separation methods were reproduced. Secondly, appropriate models applicable for NFC's datasets and applications were created, to fulfill the aim of this thesis. Lastly, all models were evaluated on an independent dataset, similar to datasets from NFC. The results were evaluated using the metrics SI-SNRi and SDRi. This thesis provides recommended models and methods suitable for NFC applications, especially concluding that the Conv-TasNet and the DPTNet are reasonable choices. Sound source separation signal processing audio processing speech enhancement speaker identification speech separation speech processing speaker separation single channel multi channel cocktail party problem machine learning neural network deep learning deep neural network convolutional network dual-path network filter-and-sum network recurrent network beamforming Communication Systems Kommunikationssystem

Page generated in 0.0612 seconds