About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

An instantaneous frequency approach to the pitch tracking of overlapping voices

Dorrell, Paul Roderick January 1996 (has links)
No description available.
2

Robust variational Bayesian clustering for underdetermined speech separation

Zohny, Zeinab Y. January 2016 (has links)
The main focus of this thesis is the enhancement of the statistical framework employed for underdetermined time-frequency (T-F) masking blind separation of speech. While humans are capable of extracting a speech signal of interest in the presence of other interference and noise, speech recognition systems and hearing aids cannot match this psychoacoustic ability: they perform well in noise-free and reverberation-free environments but suffer in realistic environments. Time-frequency masking algorithms based on computational auditory scene analysis attempt to separate multiple sound sources from only two reverberant stereo mixtures. They rely on the sparsity that binaural cues exhibit in the time-frequency domain to generate masks which extract individual sources from their corresponding spectrogram points, thereby solving the problem of underdetermined convolutive speech separation. Statistically, this can be interpreted as a classical clustering problem. For analytical simplicity, a finite mixture of Gaussian distributions is commonly used in T-F masking algorithms for modelling interaural cues. Such a model is, however, sensitive to outliers; therefore, a robust probabilistic model based on the Student's t-distribution is first proposed to improve the robustness of the statistical framework. This heavy-tailed distribution, compared with the Gaussian distribution, can better capture outlier values and thereby lead to more accurate probabilistic masks for source separation. This non-Gaussian approach is applied to the state-of-the-art MESSL algorithm, and comparative studies are undertaken to confirm the improved separation quality. A Bayesian clustering framework that can better model uncertainties in reverberant environments is then exploited to replace the conventional expectation-maximization (EM) algorithm used within a maximum likelihood estimation (MLE) framework. A variational Bayesian (VB) approach is applied to the MESSL algorithm to cluster interaural phase differences, thereby avoiding the drawbacks of MLE, in particular the possible presence of singularities; experimental results confirm an improvement in separation performance. Finally, the joint modelling of the interaural phase and level differences, together with the integration of their non-Gaussian modelling within a variational Bayesian framework, is proposed. This approach combines the advantages of the robust estimation provided by the Student's t-distribution and the robust clustering inherent in the Bayesian approach. In other words, this general framework avoids the difficulties associated with MLE and makes use of the heavy-tailed Student's t-distribution to improve the estimation of the soft probabilistic masks at various reverberation times, particularly for sources in close proximity. Through an extensive set of simulation studies comparing the proposed approach with other T-F masking algorithms under different scenarios, a significant improvement in terms of objective and subjective performance measures is demonstrated.
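To make the modelling choice above concrete, the sketch below fits a two-component mixture of Student's t distributions to interaural phase differences with EM and reuses the responsibilities as soft T-F masks. It is a simplified toy with a fixed degrees-of-freedom parameter and synthetic data, not the MESSL algorithm or the thesis implementation; the names (`fit_t_mixture`, `nu`, `ipd`) and all values are assumptions.

```python
# Minimal sketch (not the MESSL implementation): EM for a two-component mixture of
# Student's t distributions fitted to interaural phase differences (IPDs), with the
# posterior responsibilities reused as soft time-frequency masks. The degrees of
# freedom `nu` is held fixed for simplicity; all variable names are illustrative.
import numpy as np
from scipy.stats import t as student_t

def fit_t_mixture(ipd, n_src=2, nu=5.0, n_iter=50):
    x = ipd.ravel()                                    # flatten T-F bins into a 1-D sample
    mu = np.quantile(x, np.linspace(0.2, 0.8, n_src))  # spread initial means over the data
    sig = np.full(n_src, x.std() + 1e-6)
    pi = np.full(n_src, 1.0 / n_src)
    for _ in range(n_iter):
        # E-step: responsibilities under each heavy-tailed component
        pdf = np.stack([pi[k] * student_t.pdf(x, nu, loc=mu[k], scale=sig[k])
                        for k in range(n_src)], axis=1)
        r = pdf / (pdf.sum(axis=1, keepdims=True) + 1e-300)
        # latent precision weights that down-weight outliers (standard t-mixture EM)
        u = (nu + 1.0) / (nu + ((x[:, None] - mu) / sig) ** 2)
        # M-step with fixed nu
        rw = r * u
        mu = (rw * x[:, None]).sum(axis=0) / rw.sum(axis=0)
        sig = np.sqrt((rw * (x[:, None] - mu) ** 2).sum(axis=0) / r.sum(axis=0)) + 1e-6
        pi = r.mean(axis=0)
    return r.reshape(*ipd.shape, n_src)                # soft masks, one per source

# Toy usage: two synthetic "sources" with IPD clusters at -0.8 and +0.9 rad.
rng = np.random.default_rng(0)
ipd = np.where(rng.random((128, 64)) < 0.5,
               rng.normal(-0.8, 0.2, (128, 64)),
               rng.normal(0.9, 0.2, (128, 64)))
masks = fit_t_mixture(ipd)
print(masks.shape)  # (128, 64, 2): per-bin soft assignment to each source
```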
3

Speech Segregation in Background Noise and Competing Speech

Hu, Ke 17 July 2012 (has links)
No description available.
4

Quelques contributions au filtrage optimal avec l'estimation de paramètres et application à la séparation de la parole mono-capteur / Some contributions to joint optimal filtering and parameter estimation with application to monaural speech separation

Bensaid, Siouar 06 June 2014 (has links)
The thesis is composed of two parts. In the first part, we deal with the monaural speech separation problem and propose two algorithms. In the first algorithm, we exploit the joint autoregressive model that captures the short- and long-term (periodic) correlations of Gaussian speech signals to formulate a state-space model with unknown parameters. The EM-Kalman algorithm is then used to estimate jointly the sources (contained in the state vector) and the parameters of the model. In the second algorithm, we use the same speech model but this time in the frequency domain (quasi-periodic Gaussian sources with an AR spectral envelope). The observed data are sliced using a well-designed window that allows perfect reconstruction of the sources. The parameters are estimated separately from the sources by optimizing the Gaussian ML criterion expressed with the sample and parameterized covariance matrices. Classical frequency-domain asymptotic methods replace linear convolution by circulant convolution, leading to approximation errors; we show how the introduction of windows leads to slightly more complex frequency-domain techniques, replacing diagonal covariance matrices by banded covariance matrices, but with controlled approximation error. The sources are then estimated using Wiener filtering. The second part concerns the relative performance of joint versus marginalized parameter estimation. We consider jointly Gaussian latent data and observations, provide contributions to Cramér-Rao bounds, and investigate three iterative joint estimation approaches: alternating MAP/ML, which suffers from a parameter bias that persists even asymptotically; EM, which converges asymptotically to the ML solution; and VB, which we prove also converges asymptotically to the ML solution for the deterministic parameters.
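To illustrate the state-space formulation behind the first algorithm, the toy sketch below tracks a single AR(2) source observed in white noise with a standard Kalman filter. It assumes the AR coefficients are known, whereas the thesis estimates them jointly with EM; every parameter value and name is an illustrative assumption, not the author's code.

```python
# Minimal sketch, not the EM-Kalman algorithm of the thesis: a single AR(2) "source"
# observed in additive white noise, recovered with a standard Kalman filter. The AR
# coefficients are assumed known here; the thesis estimates them jointly with EM.
import numpy as np

rng = np.random.default_rng(1)
a1, a2 = 1.5, -0.8          # illustrative AR(2) coefficients (stable, resonant)
q, r = 1.0, 4.0             # process and observation noise variances (assumed)

# Simulate x_t = a1*x_{t-1} + a2*x_{t-2} + w_t and noisy observations y_t = x_t + v_t.
T = 500
x = np.zeros(T)
for t in range(2, T):
    x[t] = a1 * x[t-1] + a2 * x[t-2] + rng.normal(0, np.sqrt(q))
y = x + rng.normal(0, np.sqrt(r), T)

# State s_t = [x_t, x_{t-1}] with companion-form transition matrix.
F = np.array([[a1, a2],
              [1.0, 0.0]])
H = np.array([[1.0, 0.0]])
Q = np.array([[q, 0.0],
              [0.0, 0.0]])

s = np.zeros(2)             # filtered state estimate
P = np.eye(2)               # state covariance
x_hat = np.zeros(T)
for t in range(T):
    # Predict
    s = F @ s
    P = F @ P @ F.T + Q
    # Update with the scalar observation y[t]
    innov = y[t] - (H @ s)[0]
    S = (H @ P @ H.T)[0, 0] + r
    K = (P @ H.T)[:, 0] / S
    s = s + K * innov
    P = P - np.outer(K, H @ P)
    x_hat[t] = s[0]

print("noisy MSE   :", np.mean((y - x) ** 2))
print("filtered MSE:", np.mean((x_hat - x) ** 2))
```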
5

Evaluating the Performance of Using Speaker Diarization for Speech Separation of In-Person Role-Play Dialogues

Medaramitta, Raveendra January 2021 (has links)
No description available.
6

Time-domain Deep Neural Networks for Speech Separation

Sun, Tao 24 May 2022 (has links)
No description available.
7

Deep Learning Based Array Processing for Speech Separation, Localization, and Recognition

Wang, Zhong-Qiu 15 September 2020 (has links)
No description available.
8

Monaural Speech Segregation in Reverberant Environments

Jin, Zhaozhang 27 September 2010 (has links)
No description available.
9

Graphical Models for Robust Speech Recognition in Adverse Environments

Rennie, Steven J. 01 August 2008 (has links)
Robust speech recognition in acoustic environments that contain multiple speech sources and/or complex non-stationary noise is a difficult problem, but one of great practical interest. The formalism of probabilistic graphical models constitutes a relatively new and very powerful tool for better understanding and extending existing models, learning, and inference algorithms, and a bedrock for the creative, quasi-systematic development of new ones. In this thesis, a collection of new graphical models and inference algorithms for robust speech recognition is presented. The problem of speech separation using multiple microphones is treated first. A family of variational algorithms for tractably combining multiple acoustic models of speech with observed sensor likelihoods is presented. The algorithms recover high-quality estimates of the speech sources even when there are more sources than microphones, and have improved upon the state-of-the-art in terms of SNR gain by over 10 dB. Next, the problem of background compensation in non-stationary acoustic environments is treated. A new dynamic noise adaptation (DNA) algorithm for robust noise compensation is presented and shown to outperform several existing state-of-the-art front-end denoising systems on the new DNA + Aurora II and Aurora II-M extensions of the Aurora II task. Finally, the problem of recognizing speech in the presence of competing speech using a single microphone is treated. The Iroquois system for multi-talker speech separation and recognition is presented. The system won the 2006 Pascal International Speech Separation Challenge and, remarkably, achieved super-human recognition performance on a majority of test cases in the task. The result marks a significant first in automatic speech recognition, and a milestone in computing.
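As a rough illustration of how per-speaker acoustic models can be combined with a single observed mixture, the sketch below uses the well-known "max model" approximation for log-spectra, with tiny made-up GMM speaker models and exact enumeration of joint states. It is only a conceptual toy under assumed parameters, not the Iroquois system or the variational algorithms described in the abstract.

```python
# Minimal sketch of the "max model" used in factorial multi-talker separation: the
# observed log-spectrum is approximated by the element-wise max of the two speakers'
# log-spectra. Speaker models here are tiny diagonal GMMs with made-up parameters;
# this illustrates the likelihood/mask computation only, not the Iroquois system.
import numpy as np
from scipy.stats import norm

F = 8                                     # number of frequency bands (toy)
K = 3                                     # states per speaker GMM (toy)
rng = np.random.default_rng(2)

# Assumed per-speaker GMMs: K states, each a diagonal Gaussian over F log-spectral bands.
mu_a, var_a, w_a = rng.normal(0.0, 2.0, (K, F)), np.full((K, F), 1.0), np.full(K, 1.0 / K)
mu_b, var_b, w_b = rng.normal(0.0, 2.0, (K, F)), np.full((K, F), 1.0), np.full(K, 1.0 / K)

def pair_loglik_and_mask(y, mu1, v1, mu2, v2):
    """Log-likelihood of y under y = max(x1, x2) and P(speaker 1 dominates) per band."""
    s1, s2 = np.sqrt(v1), np.sqrt(v2)
    p1, c1 = norm.pdf(y, mu1, s1), norm.cdf(y, mu1, s1)
    p2, c2 = norm.pdf(y, mu2, s2), norm.cdf(y, mu2, s2)
    like = p1 * c2 + p2 * c1 + 1e-300     # per-band max-model likelihood
    return np.log(like).sum(), (p1 * c2) / like

# Observed frame: simulate one state per speaker and take the element-wise max.
ia, ib = 0, 1
y = np.maximum(rng.normal(mu_a[ia], 1.0), rng.normal(mu_b[ib], 1.0))

# Enumerate all K*K joint states, compute their posteriors and the expected soft mask.
logpost, masks = np.zeros((K, K)), np.zeros((K, K, F))
for i in range(K):
    for j in range(K):
        ll, m = pair_loglik_and_mask(y, mu_a[i], var_a[i], mu_b[j], var_b[j])
        logpost[i, j] = np.log(w_a[i]) + np.log(w_b[j]) + ll
        masks[i, j] = m
post = np.exp(logpost - logpost.max())
post /= post.sum()
soft_mask_a = (post[:, :, None] * masks).sum(axis=(0, 1))
print("P(speaker A dominates) per band:", np.round(soft_mask_a, 2))
```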
