1.
Quelques contributions pour la séparation et la diarisation de sources audio dans des mélanges multicanaux convolutifs / New contributions to audio source separation and diarisation of multichannel convolutive mixtures. Kounadis-Bastian, Dionyssos, 24 February 2017.
In this thesis we address the problem of audio source separation (ASS) for multichannel, underdetermined convolutive mixtures through probabilistic modeling. We focus on three aspects of the problem and make three contributions. Firstly, starting from the empirically well-validated representation of an audio signal known as the local Gaussian model (LGM) with non-negative matrix factorization (NMF), we propose a Bayesian extension that overcomes some of the limitations of NMF. We incorporate this representation into a multichannel ASS framework and compare it with the state of the art in ASS, obtaining promising results. Secondly, we study how to separate mixtures recorded with moving sources and/or moving microphones. Movement makes the acoustic path between sources and microphones time-varying, and time-varying convolutive mixtures have received little attention in the ASS literature. We therefore start from a state-of-the-art LGM-with-NMF method designed for separating time-invariant mixtures (static sources and microphones) and propose an extension that uses a Kalman smoother to track the acoustic path over time. The proposed method is benchmarked against a block-wise adaptation of that state-of-the-art method run on time segments, and delivers competitive results on both simulated and real-world mixtures. Lastly, we investigate the link between ASS and audio diarisation, that is, the annotation of the time intervals during which each speaker/source in the mixture is active. Most state-of-the-art ASS methods assume that every source emits continuously, a hypothesis that can produce spurious signal estimates for a source in intervals where it was not emitting. Our aim is for diarisation to aid ASS by indicating which sources are emitting in each time frame. To that end, we design a joint framework for simultaneous diarisation and ASS that incorporates a hidden Markov model (HMM), tracking the temporal activity of the sources, within a state-of-the-art LGM-with-NMF ASS framework. We compare the proposed method with the state of the art on ASS and audio diarisation tasks, obtaining performance comparable to the state of the art in terms of separation and superior performance in terms of diarisation.
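For readers unfamiliar with the model family, the following is a minimal sketch of an LGM-with-NMF source model of the kind the abstract refers to, with comments indicating where the two model extensions (time-varying mixing tracked by a Kalman smoother, HMM-gated source activity) would enter. The notation is illustrative and is not taken from the thesis.

```latex
% Minimal LGM-with-NMF sketch (illustrative notation, not the thesis' own).
% The STFT coefficient of source j at frequency f and frame n is a zero-mean
% complex Gaussian whose variance has NMF structure; the multichannel mixture
% combines the sources through (possibly time-varying) mixing vectors plus noise.
\[
  s_{j,fn} \sim \mathcal{N}_c\!\Big(0,\ \sum_{k} w_{j,fk}\, h_{j,kn}\Big),
  \qquad
  \mathbf{x}_{fn} \;=\; \sum_{j} \mathbf{a}_{j,fn}\, s_{j,fn} \;+\; \mathbf{b}_{fn}.
\]
% Second contribution (moving sources/microphones): let the mixing vectors drift,
%   \mathbf{a}_{j,fn} = \mathbf{a}_{j,f,n-1} + \boldsymbol{\epsilon}_{j,fn},
% and track them over time with a Kalman smoother.
% Third contribution (joint diarisation): gate the source variance with an
% HMM-driven activity state z_{j,n} \in \{0,1\}, so that a source contributes
% (close to) zero variance in the frames where it does not emit.
```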
2.
Speech processing using digital MEMS microphones. Zwyssig, Erich Paul, January 2013.
The last few years have seen the start of a unique change in microphones for consumer devices such as smartphones or tablets. Almost all analogue capacitive microphones are being replaced by digital silicon microphones, or MEMS microphones. MEMS microphones perform differently to conventional analogue microphones. Their greatest disadvantage is significantly increased self-noise, or decreased SNR, while their most significant benefits are ease of design and manufacturing and improved sensitivity matching. This thesis presents research on speech processing, comparing conventional analogue microphones with the newly available digital MEMS microphones. Specifically, voice activity detection, speaker diarisation (who spoke when), speech separation and speech recognition are looked at in detail. In order to carry out this research, different microphone arrays were built using digital MEMS microphones, and corpora were recorded to test existing algorithms and devise new ones. Some corpora that were created for the purpose of this research will be released to the public in 2013. It was found that the most commonly used VAD algorithm in current state-of-the-art diarisation systems is not the best-performing one: MLP-based voice activity detection consistently outperforms the more frequently used GMM-HMM-based VAD schemes. In addition, an algorithm was derived that can determine the number of active speakers in a meeting recording given audio data from a microphone array of known geometry, leading to improved diarisation results. Finally, speech separation experiments were carried out using different post-filtering algorithms, matching or exceeding current state-of-the-art results. The performance of the algorithms and methods presented in this thesis was verified by comparing their output using speech recognition tools and simple MLLR adaptation, and the results are presented as word error rates, an easily comprehensible scale. To summarise, using speech recognition and speech separation experiments, this thesis demonstrates that the significantly reduced SNR of MEMS microphones can be compensated for with well-established adaptation techniques such as MLLR, and that MEMS microphones do not affect voice activity detection or speaker diarisation performance.
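Since the results above are reported as word error rates, a short sketch of how WER is computed may help; this is the standard Levenshtein-alignment definition, not code from the thesis.

```python
# Standard word error rate (WER): (substitutions + deletions + insertions)
# divided by the number of reference words, computed by edit-distance
# alignment. A generic sketch, not the thesis' evaluation code.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + sub)   # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("sat" -> "sit") and one deletion ("the") over six
# reference words gives WER = 2/6, i.e. about 0.33.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```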