Global ETD Search

11	Traitement de l'incertitude pour la reconnaissance de la parole robuste au bruit / Uncertainty learning for noise robust ASR Tran, Dung Tien 20 November 2015 (has links) Cette thèse se focalise sur la reconnaissance automatique de la parole (RAP) robuste au bruit. Elle comporte deux parties. Premièrement, nous nous focalisons sur une meilleure prise en compte des incertitudes pour améliorer la performance de RAP en environnement bruité. Deuxièmement, nous présentons une méthode pour accélérer l'apprentissage d'un réseau de neurones en utilisant une fonction auxiliaire. Dans la première partie, une technique de rehaussement multicanal est appliquée à la parole bruitée en entrée. La distribution a posteriori de la parole propre sous-jacente est alors estimée et représentée par sa moyenne et sa matrice de covariance, ou incertitude. Nous montrons comment propager la matrice de covariance diagonale de l'incertitude dans le domaine spectral à travers le calcul des descripteurs pour obtenir la matrice de covariance pleine de l'incertitude sur les descripteurs. Le décodage incertain exploite cette distribution a posteriori pour modifier dynamiquement les paramètres du modèle acoustique au décodage. La règle de décodage consiste simplement à ajouter la matrice de covariance de l'incertitude à la variance de chaque gaussienne. Nous proposons ensuite deux estimateurs d'incertitude basés respectivement sur la fusion et sur l'estimation non-paramétrique. Pour construire un nouvel estimateur, nous considérons la combinaison linéaire d'estimateurs existants ou de fonctions noyaux. Les poids de combinaison sont estimés de façon générative en minimisant une mesure de divergence par rapport à l'incertitude oracle. Les mesures de divergence utilisées sont des versions pondérées des divergences de Kullback-Leibler (KL), d'Itakura-Saito (IS) ou euclidienne (EU). 
En raison de la positivité inhérente de l'incertitude, ce problème d'estimation peut être vu comme une instance de factorisation matricielle positive (NMF) pondérée. De plus, nous proposons deux estimateurs d'incertitude discriminants basés sur une transformation linéaire ou non linéaire de l'incertitude estimée de façon générative. Cette transformation est entraînée de sorte à maximiser le critère de maximum d'information mutuelle boosté (bMMI). Nous calculons la dérivée de ce critère en utilisant la règle de dérivation en chaîne et nous l'optimisons par descente de gradient stochastique. Dans la seconde partie, nous introduisons une nouvelle méthode d'apprentissage pour les réseaux de neurones basée sur une fonction auxiliaire sans aucun réglage de paramètre. Au lieu de maximiser la fonction objectif, cette technique consiste à maximiser une fonction auxiliaire qui est introduite de façon récursive couche par couche et dont le minimum a une expression analytique. Grâce aux propriétés de cette fonction, la décroissance monotone de la fonction objectif est garantie / This thesis focuses on noise robust automatic speech recognition (ASR). It includes two parts. First, we focus on better handling of uncertainty to improve the performance of ASR in a noisy environment. Second, we present a method to accelerate the training process of a neural network using an auxiliary function technique. In the first part, multichannel speech enhancement is applied to input noisy speech. The posterior distribution of the underlying clean speech is then estimated, as represented by its mean and its covariance matrix or uncertainty. We show how to propagate the diagonal uncertainty covariance matrix in the spectral domain through the feature computation stage to obtain the full uncertainty covariance matrix in the feature domain. Uncertainty decoding exploits this posterior distribution to dynamically modify the acoustic model parameters in the decoding rule. 
The uncertainty decoding rule simply consists of adding the uncertainty covariance matrix of the enhanced features to the variance of each Gaussian component. We then propose two uncertainty estimators based on fusion and on nonparametric estimation, respectively. To build a new estimator, we consider a linear combination of existing uncertainty estimators or kernel functions. The combination weights are generatively estimated by minimizing a divergence measure with respect to the oracle uncertainty. The divergence measures used are weighted versions of the Kullback-Leibler (KL), Itakura-Saito (IS), and Euclidean (EU) divergences. Due to the inherent nonnegativity of uncertainty, this estimation problem can be seen as an instance of weighted nonnegative matrix factorization (NMF). In addition, we propose two discriminative uncertainty estimators based on a linear or nonlinear mapping of the generatively estimated uncertainty. This mapping is trained so as to maximize the boosted maximum mutual information (bMMI) criterion. We compute the derivative of this criterion using the chain rule and optimize it using stochastic gradient descent. In the second part, we introduce a new learning rule for neural networks that is based on an auxiliary function technique and requires no parameter tuning. Instead of minimizing the objective function directly, this technique consists of minimizing a quadratic auxiliary function which is recursively introduced layer by layer and which has a closed-form optimum. Thanks to the properties of this auxiliary function, the monotonic decrease of the objective function under the new learning rule is guaranteed. Reconnaissance automatique de la parole; Robustesse au bruit; Rehaussement de la parole; Propagation de l’incertitude; Automatic speech recognition; Noise robustness; Speech enhancement; Uncertainty propagation; 006.454; 621.399
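As an illustration of the uncertainty decoding rule described in entry 11, the following minimal sketch evaluates a Gaussian log-likelihood after adding the uncertainty variance to the model variance. It assumes diagonal covariances for brevity, whereas the abstract describes a full uncertainty covariance matrix; the function names and the toy numbers are hypothetical and not taken from the thesis.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def uncertainty_decoding_loglik(x_hat, unc_var, mean, var):
    """Score an enhanced feature x_hat whose estimated uncertainty is unc_var.

    Assumption: diagonal covariances, so the rule reduces to adding the
    per-dimension uncertainty variance to the Gaussian variance.
    """
    compensated_var = [v + u for v, u in zip(var, unc_var)]
    return gaussian_loglik(x_hat, mean, compensated_var)


# Toy example (made-up numbers): one reliable and one unreliable dimension.
x_hat = [0.2, 3.0]
mean = [0.0, 0.0]
var = [1.0, 1.0]
plain = gaussian_loglik(x_hat, mean, var)
compensated = uncertainty_decoding_loglik(x_hat, [0.0, 3.0], mean, var)
```

In this toy case the second dimension lies far from the mean but is flagged as uncertain, so the compensated score penalizes it less than the conventional score does.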
12	An ensemble speaker and speaking environment modeling approach to robust speech recognition Tsao, Yu 18 November 2008 In this study, an ensemble speaker and speaking environment modeling (ESSEM) approach is proposed to characterize environments in order to enhance the performance robustness of automatic speech recognition (ASR) systems under adverse conditions. The ESSEM process comprises two phases, offline and online. In the offline phase, we prepare an ensemble speaker and speaking environment space formed by a collection of super-vectors. Each super-vector consists of the entire set of means from all the Gaussian mixture components of a set of hidden Markov models that characterizes a particular environment. In the online phase, with the ensemble environment space prepared in the offline phase, we estimate the super-vector for a new testing environment based on a stochastic matching criterion. A series of techniques is proposed to further improve the original ESSEM approach in both the offline and online phases. For the offline phase, we focus on methods to enhance the construction and coverage of the environment space. We first demonstrate environment clustering and environment partitioning algorithms to structure the environment space well; then, we propose a discriminative training algorithm to enhance discrimination across environment super-vectors and therefore broaden the coverage of the ensemble environment space. For the online phase, we study methods to increase the efficiency and precision in estimating the target super-vector for the testing condition. To enhance the efficiency, we incorporate dimensionality reduction techniques to reduce the complexity of the original environment space. 
To improve the precision, we first study different forms of mapping function and propose a weighted N-best information technique; then, we propose cohort selection, environment space adaptation and multiple cluster matching algorithms to facilitate the environment characterization. We evaluate the proposed ESSEM framework on the Aurora-2 connected digit recognition task. Experimental results verify that the original ESSEM approach already provides a clear improvement over a baseline system without environment compensation. Moreover, the performance of ESSEM can be further enhanced by using the proposed offline and online algorithms. With the optimal offline and online configuration, ESSEM achieves a significant 16.08% word error rate reduction over our best baseline system on the Aurora-2 task. ESSEM; Stochastic matching; Noise robustness; Environment modeling; Automatic speech recognition; Speech processing systems; Hidden Markov models
13	[pt] ENGENHARIA DE RECURSOS PARA LIDAR COM DADOS RUIDOSOS NA IDENTIFICAÇÃO ESPARSA SOB AS PERSPECTIVAS DE CLASSIFICAÇÃO E REGRESSÃO / [en] FEATURE ENGINEERING TO DEAL WITH NOISY DATA IN SPARSE IDENTIFICATION THROUGH CLASSIFICATION AND REGRESSION PERSPECTIVES THAYNA DA SILVA FRANCA 15 July 2021 (has links) [pt] Os sistemas dinâmicos desempenham um papel crucial no que diz respeito à compreensão de fenômenos inerentes a diversos campos da ciência. Desde a última década, todo aporte tecnológico alcançado ao longo de anos de investigação deram origem a uma estratégia orientada a dados, permitindo a inferência de modelos capazes de representar sistemas dinâmicos. Além disso, independentemente dos tipos de sensores adotados a fim de realizar o procedimento de aquisição de dados, é natural verificar a existência de uma certa corrupção ruidosa nos referidos dados. Genericamente, a tarefa de identificação é diretamente afetada pelo cenário ruidoso previamente descrito, implicando na falsa descoberta de um modelo generalizável. Em outras palavras, a corrupção ao ruído pode ser responsável pela geração de uma representação matemática infiel de um determinado sistema. Nesta tese, no que diz respeito à tarefa de identificação, é demonstrado como a robustez ao ruído pode ser melhorada a partir da hibridização de técnicas de aprendizado de máquina, como aumento de dados, regressão esparsa, seleção de características, extração de características, critério de informação, pesquisa em grade e validação cruzada. Especificamente, sob as perspectivas de classificação e regressão, o sucesso da estratégia proposta é apresentado a partir de exemplos numéricos, como o crescimento logístico, oscilador Duffing, modelo FitzHugh-Nagumo, atrator de Lorenz e uma modelagem Suscetível-Infeccioso-Recuperado (SIR) do Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). 
/ [en] Dynamical systems play a fundamental role in understanding phenomena inherent to several fields of science. Over the last decade, the technological advances achieved through years of research have given rise to a data-oriented strategy that enables the inference of models capable of representing dynamical systems. Moreover, regardless of the sensor types adopted for data acquisition, some noise corruption is naturally present in the data. In general, the identification task is directly affected by this noisy scenario, which leads to the false discovery of a generalizable model. In other words, noise corruption may produce an unfaithful mathematical representation of a given system. In this thesis, with respect to the identification task, it is demonstrated how robustness to noise may be improved through the hybridization of machine learning techniques, such as data augmentation, sparse regression, feature selection, feature extraction, information criteria, grid search and cross-validation. Specifically, from classification and regression perspectives, the success of the proposed strategy is shown through numerical examples, such as logistic growth, the Duffing oscillator, the FitzHugh–Nagumo model, the Lorenz attractor and a Susceptible-Infectious-Recovered (SIR) model of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). [pt] ROBUSTEZ AO RUIDO [pt] PESQUISA EM GRADE [pt] ENGENHARIA DE RECURSOS [pt] CORRUPCAO RUIDOSA [pt] IDENTIFICACAO ESPARSA [en] NOISE ROBUSTNESS [en] GRID SEARCH [en] FEATURE ENGINEERING [en] NOISY CORRUPTION [en] SPARSE IDENTIFICATION
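Sparse regression, one of the techniques listed in entry 13, is often illustrated with sequentially thresholded least squares over a library of candidate terms. The sketch below applies that idea to noise-free logistic growth data. The choice of algorithm, the candidate library [1, x, x^2], the threshold and all function names are assumptions made for this example; the abstract does not state which sparse regression method the thesis uses.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(theta, dxdt, active):
    """Least-squares fit restricted to the active library columns (normal equations)."""
    ata = [[sum(row[i] * row[j] for row in theta) for j in active] for i in active]
    atb = [sum(row[i] * d for row, d in zip(theta, dxdt)) for i in active]
    return solve_linear(ata, atb)


def stlsq(theta, dxdt, threshold, iterations=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    n_terms = len(theta[0])
    active = list(range(n_terms))
    coeffs = [0.0] * n_terms
    for _ in range(iterations):
        if not active:
            break
        solution = least_squares(theta, dxdt, active)
        coeffs = [0.0] * n_terms
        for idx, value in zip(active, solution):
            coeffs[idx] = value
        kept = [i for i in active if abs(coeffs[i]) >= threshold]
        for i in active:
            if i not in kept:
                coeffs[i] = 0.0  # drop terms below the threshold
        if kept == active:
            break
        active = kept
    return coeffs


# Logistic growth dx/dt = r*x*(1 - x/K) = r*x - (r/K)*x**2, with r = 1.5 and K = 2.0.
r_true, k_true = 1.5, 2.0
xs = [0.1 * k for k in range(1, 20)]
theta = [[1.0, x, x * x] for x in xs]  # candidate terms: 1, x, x^2
dxdt = [r_true * x * (1.0 - x / k_true) for x in xs]
coeffs = stlsq(theta, dxdt, threshold=0.1)
```

On this clean data the constant term is discarded and the remaining coefficients approximate r and -r/K. With noisy derivatives the threshold becomes a sensitive hyperparameter, which is where tools such as grid search and cross-validation, also named in the abstract, would come in.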
14	Amélioration des ouvertures par chemins pour l'analyse d'images à N dimensions et implémentations optimisées / Image processing and analysis for non destructive real time quality control in 3D RX tomography Cokelaer, François 22 February 2013 (has links) La détection de structures fines et orientées dans une image peut mener à un très large champ d'applications en particulier dans le domaine de l'imagerie médicale, des sciences des matériaux ou de la télédétection. Les ouvertures et fermetures par chemins sont des opérateurs morphologiques utilisant des chemins orientés et flexibles en guise d'éléments structurants. Ils sont utilisés de la même manière que les opérateurs morphologiques utilisant des segments orientés comme éléments structurants mais sont plus efficaces lorsqu'il s'agit de détecter des structures pouvant être localement non rigides. Récemment, une nouvelle implémentation des opérateurs par chemins a été proposée leur permettant d'être appliqués à des images 2D et 3D de manière très efficace. Cependant, cette implémentation est limitée par le fait qu'elle n'est pas robuste au bruit affectant les structures fines. En effet, pour être efficaces, les opérateurs par chemins doivent être suffisamment longs pour pouvoir correspondre à la longueur des structures à détecter et deviennent de ce fait beaucoup plus sensibles au bruit de l'image. La première partie de ces travaux est dédiée à répondre à ce problème en proposant un algorithme robuste permettant de traiter des images 2D et 3D. Nous avons proposé les opérateurs par chemins robustes, utilisant une famille plus grande d'éléments structurants et qui, donnant une longueur L et un paramètre de robustesse G, vont permettre la propagation du chemin à travers des déconnexions plus petites ou égales à G, rendant le paramètre G indépendant de L. Cette simple proposition mènera à une implémentation plus efficace en terme de complexité de calculs et d'utilisation mémoire que l'état de l'art. 
Les opérateurs développés ont été comparés avec succès avec d'autres méthodes classiques de la détection des structures curvilinéaires de manière qualitative et quantitative. Ces nouveaux opérateurs ont été par la suite intégrés dans une chaîne complète de traitement d'images et de modélisation pour la caractérisation des matériaux composite renforcés avec des fibres de verres. Notre étude nous a ensuite amenés à nous intéresser à des filtres morphologiques récents basés sur la mesure de caractéristiques géodésiques. Ces filtres sont une bonne alternative aux ouvertures par chemins car ils sont très efficaces lorsqu'il s'agit de détecter des structures présentant de fortes tortuosités ce qui est précisément la limitation majeure des ouvertures par chemins. La combinaison de la robustesse locale des ouvertures par chemins robustes et la capacité des filtres par attributs géodésiques à recouvrer les structures tortueuses nous ont permis de proposer un nouvel algorithme, les ouvertures par chemins robustes et sélectives. / The detection of thin and oriented features in an image leads to a large field of applications specifically in medical imaging, material science or remote sensing. Path openings and closings are efficient morphological operators that use flexible oriented paths as structuring elements. They are employed in a similar way to operators with rotated line segments as structuring elements, but are more effective as they can detect linear structures that are not necessarily locally perfectly straight. While their theory has always allowed paths in arbitrary dimensions, de facto implementations were only proposed in 2D. Recently, a new implementation was proposed enabling the computation of efficient d-dimensional path operators. However this implementation is limited in the sense that it is not robust to noise. 
Indeed, in practical applications, for path operators to be effective, structuring elements must be long enough to match the length of the features to be detected. Yet path operators become increasingly sensitive to noise as their length parameter L increases. The first part of this work is dedicated to overcoming this limitation. We propose an efficient d-dimensional algorithm, the robust path operators, which use a larger family of flexible structuring elements. Given a length L and a robustness parameter G, path propagation is allowed across disconnections between two pixels of a path that are smaller than or equal to G, which makes G independent of L. This simple assumption leads to constant-memory bookkeeping and results in low complexity. The developed operators have been compared qualitatively and quantitatively to other efficient methods for the detection of line-like features. As an application, robust path openings have been integrated into a complete image processing chain for the modelling and characterization of glass-fiber-reinforced polymers. Our study has also led us to focus on recent morphological connected filters based on geodesic measurements. These filters are a good alternative to path operators as they are efficient at detecting so-called "tortuous" shapes in an image, which is precisely the main limitation of path operators. Combining the local robustness of the robust path operators with the ability of geodesic attribute-based filters to recover "tortuous" shapes has enabled us to propose another original algorithm, the selective and robust path operators. Traitement d'images à n dimensions; Opérateur morphologique algébrique; Éléments structurants flexibles; D-dimensional processing; Morphological algebraic operators; Flexible structuring elements; Noise robustness; Line-like features detection; 620
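The gap-tolerance idea in entry 14 can be illustrated on a small binary 2D image. The sketch below is a naive dynamic-programming version restricted to west-to-east paths and written only for illustration: it is not the constant-memory algorithm of the thesis, and the function names, the convention that path length counts foreground pixels only, and the toy image are assumptions.

```python
def longest_ending(img, gap):
    """For each foreground pixel, the largest number of foreground pixels on a
    west-to-east path ending there, allowing runs of at most `gap` background pixels."""
    rows, cols = len(img), len(img[0])
    # best[r][c][g]: best count for paths ending at (r, c) with g trailing
    # background pixels; 0 means no valid path reaches that state.
    best = [[[0] * (gap + 1) for _ in range(cols)] for _ in range(rows)]
    for c in range(cols):
        for r in range(rows):
            # Predecessors are the three neighbours in the previous column.
            preds = [best[pr][c - 1] for pr in (r - 1, r, r + 1)
                     if c > 0 and 0 <= pr < rows]
            if img[r][c]:
                best[r][c][0] = 1 + max((max(p) for p in preds), default=0)
            else:
                for g in range(1, gap + 1):
                    best[r][c][g] = max((p[g - 1] for p in preds), default=0)
    return [[best[r][c][0] for c in range(cols)] for r in range(rows)]


def robust_path_opening(img, length, gap):
    """Keep foreground pixels lying on a west-to-east path with at least `length`
    foreground pixels, where disconnections of up to `gap` pixels are bridged."""
    ending = longest_ending(img, gap)
    mirrored = [row[::-1] for row in img]
    starting = [row[::-1] for row in longest_ending(mirrored, gap)]
    return [
        [1 if img[r][c] and ending[r][c] + starting[r][c] - 1 >= length else 0
         for c in range(len(img[0]))]
        for r in range(len(img))
    ]


# Toy image: a horizontal line with a one-pixel disconnection, plus an isolated pixel.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
]
strict = robust_path_opening(image, length=5, gap=0)
robust = robust_path_opening(image, length=5, gap=1)
```

With gap=0 both three-pixel segments are shorter than the requested length and are removed. With gap=1 the one-pixel disconnection is bridged, so the line is kept while the isolated pixel is still removed. A full path opening would also combine several path orientations, which this sketch omits.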
15	Computational auditory scene analysis and robust automatic speech recognition Narayanan, Arun 14 November 2014 No description available. Engineering; Computer Science; Automatic speech recognition; noise robustness; computational auditory scene analysis; binary masking; ratio masking; mask estimation; deep neural networks; acoustic modeling; speech separation; speech enhancement; noisy ASR; CHiME-2; Aurora-4
