  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Gaussian Mixture Model-based Feature Compensation with Application to Noise-robust Speech Recognition

Yeh, Bing-Feng 28 August 2012 (has links)
In this paper, we propose a new noise-robustness method based on Gaussian Mixture Models (GMMs). The proposed method can estimate the noise feature effectively and reduce the noise effect by plain subtraction, which preserves the smoothness and continuity of the original feature. Unlike the traditional feature-transformation method MMSE (Minimum Mean Square Error), which tries to recover a clean feature, our method only needs to find the noise feature, or the margin of the noise effect, and subtract it, achieving greater robustness than traditional methods. At test time, the data are passed through a trained noise classifier that judges the noise type and SNR; the corresponding transformation model is chosen according to the classifier output, the noise feature is generated by that model as a weighted linear combination, and simple subtraction is finally applied to achieve noise reduction. In experiments on the AURORA 2.0 corpus, the traditional method achieves a 36.8% relative improvement over the baseline, while our method achieves a 52.5% relative improvement, which is a 24.9% relative improvement over the traditional method.
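The pipeline described above (classify the noise type and SNR, form the noise feature as a weighted linear combination of per-condition models, then subtract it) can be sketched in a few lines. This is a minimal illustration of the combination-and-subtraction step only; the templates and classifier weights are invented placeholders, not the thesis's trained models.

```python
import numpy as np

def estimate_noise_feature(templates, weights):
    """Combine per-condition noise-feature templates using the weights
    suggested by a noise-type/SNR classifier (both hypothetical here)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()           # normalise combination weights
    return np.tensordot(weights, templates, axes=1)

def compensate(noisy_feature, templates, weights):
    """Plain subtraction of the estimated noise feature, which keeps the
    smoothness and continuity of the original feature trajectory."""
    return noisy_feature - estimate_noise_feature(templates, weights)

# Toy example: two noise templates; the classifier favours the first one.
templates = np.array([[1.0, 2.0],
                      [3.0, 4.0]])
clean_est = compensate(np.array([5.0, 6.0]), templates, [0.75, 0.25])
```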
2

Statistical models for noise-robust speech recognition

van Dalen, Rogier Christiaan January 2011 (has links)
A standard way of improving the robustness of speech recognition systems to noise is model compensation. This replaces a speech recogniser's distributions over clean speech by ones over noise-corrupted speech. For each clean speech component, model compensation techniques usually approximate the corrupted speech distribution with a diagonal-covariance Gaussian distribution. This thesis looks into improving on this approximation in two ways: firstly, by estimating full-covariance Gaussian distributions; secondly, by approximating corrupted-speech likelihoods without any parameterised distribution. The first part of this work is about compensating for within-component feature correlations under noise. For this, the covariance matrices of the computed Gaussians should be full instead of diagonal. The estimation of off-diagonal covariance elements turns out to be sensitive to approximations. A popular approximation is the one that state-of-the-art compensation schemes, like VTS compensation, use for dynamic coefficients: the continuous-time approximation. Standard speech recognisers contain both per-time slice, static, coefficients, and dynamic coefficients, which represent signal changes over time, and are normally computed from a window of static coefficients. To remove the need for the continuous-time approximation, this thesis introduces a new technique. It first compensates a distribution over the window of statics, and then applies the same linear projection that extracts dynamic coefficients. It introduces a number of methods that address the correlation changes that occur in noise within this framework. The next problem is decoding speed with full covariances. This thesis re-analyses the previously-introduced predictive linear transformations, and shows how they can model feature correlations at low and tunable computational cost. The second part of this work removes the Gaussian assumption completely. 
It introduces a sampling method that, given speech and noise distributions and a mismatch function, in the limit calculates the corrupted speech likelihood exactly. For this, it transforms the integral in the likelihood expression, and then applies sequential importance resampling. Though it is too slow to use for recognition, it enables a more fine-grained assessment of compensation techniques, based on the KL divergence to the ideal compensation for one component. The KL divergence proves to predict the word error rate well. This technique also makes it possible to evaluate the impact of approximations that standard compensation schemes make.
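The sampling idea can be illustrated in miniature: draw clean-speech and noise samples, push them through a mismatch function, and study the resulting corrupted-speech distribution. The log-spectral mismatch y = log(exp(x) + exp(n)) below is a standard choice of this kind, and plain Monte Carlo moment-matching stands in for the thesis's sequential importance resampling, which targets the exact likelihood rather than a Gaussian fit.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, n):
    # A standard log-spectral mismatch function: y = log(exp(x) + exp(n)).
    return np.logaddexp(x, n)

# Clean-speech and noise components (scalar Gaussians for illustration).
mu_x, var_x = 2.0, 1.0
mu_n, var_n = 0.0, 0.5

# Monte Carlo: sample clean speech and noise, push them through the mismatch
# function, then moment-match a Gaussian to the corrupted samples (the
# diagonal-Gaussian approximation that standard compensation schemes make).
x = rng.normal(mu_x, np.sqrt(var_x), 100_000)
n = rng.normal(mu_n, np.sqrt(var_n), 100_000)
y = corrupt(x, n)
mu_y, var_y = y.mean(), y.var()
```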
3

Empirical Mode Decomposition for Noise-Robust Automatic Speech Recognition

Wu, Kuo-hao 25 August 2010 (has links)
In this thesis, a novel technique based on the empirical mode decomposition (EMD) methodology is proposed and examined for the noise-robustness of automatic speech recognition systems. The EMD analysis is a generalization of the Fourier analysis for processing nonlinear and non-stationary time functions, in our case, the speech feature sequences. We use the intrinsic mode functions (IMF), which include the sinusoidal functions as special cases, obtained from the EMD analysis in the post-processing of the log energy feature. We evaluate the proposed method on the Aurora 2.0 and Aurora 3.0 databases. On Aurora 2.0, we obtain a 44.9% overall relative improvement over the baseline for the mismatched (clean-training) tasks. On Aurora 3.0, we obtain an overall improvement of 49.5% over the baseline on the high-mismatch tasks. These results show that the proposed method leads to significant improvement.
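At the heart of EMD is the sifting operation, which separates a fast oscillation (an IMF candidate) from a slower trend by subtracting the mean of the upper and lower extrema envelopes. The following is a much-simplified single sifting pass, using linear rather than the customary cubic-spline envelopes and no stopping criterion; it is a sketch of the mechanism, not the thesis's implementation.

```python
import numpy as np

def sift_once(x):
    """One EMD sifting pass: subtract the mean of the upper and lower
    envelopes. Real EMD uses cubic-spline envelopes and iterates until a
    stopping criterion is met; linear interpolation keeps this sketch short."""
    t = np.arange(len(x))
    # Interior local maxima / minima.
    maxima = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    minima = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return x                       # too few extrema: x is a residue
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return x - 0.5 * (upper + lower)

# A fast oscillation riding on a slow trend: sifting pulls out the fast part.
t = np.linspace(0, 1, 400)
x = np.sin(2 * np.pi * 40 * t) + 2 * t
imf_candidate = sift_once(x)
```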
4

Feature Extraction for Automatic Speech Recognition in Noisy Acoustic Environments / Parameteruttrekning for automatisk talegjenkjenning i støyende omgivelser

Gajic, Bojana January 2002 (has links)
This thesis presents a study of alternative speech feature extraction methods aimed at increasing robustness of automatic speech recognition (ASR) against additive background noise.

Spectral peak positions of speech signals remain practically unchanged in presence of additive background noise. Thus, it was expected that emphasizing spectral peak positions in speech feature extraction would result in improved noise robustness of ASR systems. If frequency subbands are properly chosen, dominant subband frequencies can serve as reasonable estimates of spectral peak positions. Thus, different methods for incorporating dominant subband frequencies into speech feature vectors were investigated in this study.

To begin with, two earlier proposed feature extraction methods that utilize dominant subband frequency information were examined. The first one uses zero-crossing statistics of the subband signals to estimate dominant subband frequencies, while the second one uses subband spectral centroids. The methods were compared with the standard MFCC feature extraction method on two different recognition tasks in various background conditions. The first method was shown to improve ASR performance on both recognition tasks at sufficiently high noise levels. The improvement was, however, smaller on the more complex recognition task. The second method, on the other hand, led to some reduction in ASR performance in all testing conditions.

Next, a new method for incorporating subband spectral centroids into speech feature vectors was proposed, and was shown to be considerably more robust than the standard MFCC method on both ASR tasks. The main difference between the proposed method and the zero-crossing based method is in the way they utilize dominant subband frequency information. It was shown that the performance improvement due to the use of dominant subband frequency information was considerably larger for the proposed method than for the ZCPA method, especially on the more complex recognition task. Finally, the computational complexity of the proposed method is two orders of magnitude lower than that of the zero-crossing based method, and of the same order of magnitude as the standard MFCC method.
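A subband spectral centroid, as used by the proposed method, is simply the power-weighted mean frequency within a band, and it serves as an estimate of the dominant subband frequency. A minimal sketch follows; the band edges and toy spectrum are illustrative, not the thesis's filterbank.

```python
import numpy as np

def subband_spectral_centroids(power_spectrum, freqs, band_edges):
    """Dominant-frequency estimate per subband: the power-weighted mean
    frequency (spectral centroid) of each band."""
    centroids = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        p, f = power_spectrum[mask], freqs[mask]
        centroids.append(np.sum(f * p) / np.sum(p))
    return np.array(centroids)

# Toy spectrum: flat background with a single strong peak near 300 Hz,
# split into two subbands (0-500 Hz and 500-1000 Hz).
freqs = np.linspace(0, 1000, 201)                # 5 Hz resolution
power = np.ones_like(freqs)
power[np.argmin(np.abs(freqs - 300.0))] = 100.0  # dominant peak
c = subband_spectral_centroids(power, freqs, [0, 500, 1000])
```

The centroid of the first band is pulled toward the 300 Hz peak, while the flat second band yields its plain mean frequency.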
6

A framework for low bit-rate speech coding in noisy environment

Krishnan, Venkatesh 21 April 2005 (has links)
State-of-the-art model-based coders offer a perceptually acceptable reconstructed speech quality at bit-rates as low as 2000 bits per second. However, the performance of these coders deteriorates rapidly below this rate, primarily because very few bits are available to encode the model parameters with high fidelity. This thesis aims to meet the challenge of designing speech coders that operate at lower bit-rates while reconstructing the speech at the receiver at the same or even better quality than state-of-the-art low bit-rate speech coders. In one of the contributions, we develop a plethora of techniques for efficient coding of the parameters obtained by the MELP algorithm, under the assumption that the classification of the frames of the MELP coder is available. Also, a simple and elegant procedure called dynamic codebook reordering is presented for use in the encoders and decoders of a vector quantization system; it effectively exploits the correlation between vectors of parameters obtained from consecutive speech frames without introducing any delay, distortion, or suboptimality. The potential of this technique in significantly reducing the bit-rates of speech coders is illustrated. Additionally, the thesis addresses the issue of designing such very low bit-rate speech coders so that they are robust to environmental noise. To impart robustness, a speech enhancement framework employing Kalman filters is presented. Kalman filters designed for speech enhancement in the presence of noise assume an autoregressive model for the speech signal. We improve the performance of Kalman filters in speech enhancement by constraining the parameters of the autoregressive model to belong to a codebook trained on clean speech. We then extend this formulation to the design of a novel framework, called the multiple-input Kalman filter, that optimally combines the outputs from several speech enhancement systems.
Since the low bit-rate speech coders compress the parameters significantly, it is very important to protect the transmitted information from errors in the communication channel. In this thesis, a novel channel-optimized multi-stage vector quantization codec is presented, in which the stage codebooks are jointly designed.
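The dynamic codebook reordering idea mentioned above can be sketched compactly: both encoder and decoder re-sort the codebook by distance to the previously selected codevector, so that correlated consecutive frames tend to produce small indices, which downstream coding can exploit. Only the index assignment changes, so no delay or distortion is introduced. The codebook and distance measure below are illustrative, not the thesis's trained quantizer.

```python
import numpy as np

def reorder_codebook(codebook, previous_codevector):
    """Re-sort codevectors by Euclidean distance to the previously selected
    codevector. Encoder and decoder apply the same deterministic rule, so
    they stay synchronised without any side information."""
    d = np.linalg.norm(codebook - previous_codevector, axis=1)
    order = np.argsort(d, kind="stable")
    return codebook[order]

codebook = np.array([[0.0, 0.0],
                     [1.0, 1.0],
                     [5.0, 5.0]])
# After emitting the vector (1, 1), the nearest codevectors move to the front,
# so a similar next frame will quantize to a small index.
reordered = reorder_codebook(codebook, np.array([1.0, 1.0]))
```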
7

Methods for Increasing Robustness of Deep Convolutional Neural Networks

Uličný, Matej January 2015 (has links)
Recent discoveries have uncovered flaws in machine learning algorithms such as deep neural networks. Deep neural networks seem vulnerable to small amounts of non-random noise, created by exploiting the input-to-output mapping of the network. Applying this noise to an input image drastically decreases classification performance. Such an image is referred to as an adversarial example. The purpose of this thesis is to examine how known regularization/robustness methods perform on adversarial examples. The robustness methods dropout, low-pass filtering, denoising autoencoders, adversarial training, and committees have been implemented, combined, and tested. For the well-known benchmark MNIST (Mixed National Institute of Standards and Technology) dataset, the best combination of robustness methods has been found. Based on the experimental results, an ensemble of models trained on adversarial examples is considered the best approach for MNIST. The harmfulness of the adversarial noise and some robustness experiments are also demonstrated on the CIFAR-10 (Canadian Institute for Advanced Research) dataset. Apart from robustness tests, the thesis describes experiments on human classification performance on noisy images and a comparison with the performance of a deep neural network.
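Adversarial noise of the kind described above is commonly generated with the fast gradient sign method (FGSM): perturb the input in the sign of the loss gradient with respect to the input. The abstract does not name the exact construction, so this is a generic sketch on a hand-weighted logistic model rather than a trained deep network, with a deliberately large step size so the toy example flips the decision.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -3.0, 1.0])      # illustrative model weights (not trained)
x = np.array([0.5, -0.5, 0.25])     # input the model classifies as positive
y = 1.0                             # true label

p = sigmoid(w @ x)                  # clean prediction
# For cross-entropy loss, the gradient w.r.t. the input is (p - y) * w.
grad_x = (p - y) * w
eps = 1.0                           # large step purely for illustration
x_adv = x + eps * np.sign(grad_x)   # fast-gradient-sign perturbation
p_adv = sigmoid(w @ x_adv)          # prediction on the adversarial example
```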
8

Physiologically Motivated Methods For Audio Pattern Classification

Ravindran, Sourabh 20 November 2006 (has links)
Human-like performance by machines in tasks of speech and audio processing has remained an elusive goal. In an attempt to bridge the gap in performance between humans and machines, there has been an increased effort to study and model physiological processes. However, the widespread use of biologically inspired features proposed in the past has been hampered mainly by either the lack of robustness across a range of signal-to-noise ratios or the formidable computational costs. In physiological systems, sensor processing occurs in several stages. It is likely that signal features and biological processing techniques evolved together and are complementary or well matched. It is precisely for this reason that modeling the feature extraction processes should go hand in hand with modeling of the processes that use these features. This research presents a front-end feature extraction method for audio signals inspired by the human peripheral auditory system. New developments in the field of machine learning are leveraged to build classifiers that maximize the performance gains afforded by these features. The structure of the classification system is similar to what might be expected in physiological processing. Further, the feature extraction and classification algorithms can be efficiently implemented using the low-power cooperative analog-digital signal processing platform. The usefulness of the features is demonstrated for tasks of audio classification, speech versus non-speech discrimination, and speech recognition. The low-power nature of the classification system makes it ideal for use in applications such as hearing aids, hand-held devices, and surveillance through acoustic scene monitoring.
9

Noise Robustness of Convolutional Autoencoders and Neural Networks for LPI Radar Classification / Brustålighet hos faltningsbaserade neurala nätverk för klassificering av LPI radar

Norén, Gustav January 2020 (has links)
This study evaluates the noise robustness of convolutional autoencoders and neural networks for classification of Low Probability of Intercept (LPI) radar modulation type. Specifically, a number of different neural network architectures are tested in four different synthetic noise environments. Tests in Gaussian noise show that performance decreases with decreasing Signal to Noise Ratio (SNR). Training a network on all SNRs in the dataset achieved a peak performance of 70.8% at SNR = -6 dB with a denoising autoencoder and convolutional classifier setup. Tests indicate that the models have a difficult time generalizing to SNRs lower than those provided in the training data, performing roughly 10-20% worse than when those SNRs are included in the training data. If intermediate SNRs are removed from the training data, the models can generalize and perform similarly to tests where intermediate noise levels are included in the training data. When the testing data are generated with different parameters from the training data, performance is underwhelming, with a peak performance of 22.0% at SNR = -6 dB. The last tests use telecom signals as additive noise instead of Gaussian noise; these tests are performed with the LPI and telecom signals appearing at different frequencies. The models perform well in such cases, with a peak performance of 80.3% at an intermediate noise level. This study also contributes a different, and more realistic, way of generating data than what is prevalent in the literature, as well as a network that performs well without the need for signal preprocessing. Without preprocessing, a peak performance of 64.9% was achieved at SNR = -6 dB. It is customary to generate data such that each sample always includes the start of its signal's period, which increases performance by around 20% across all tests. In a real application, however, it is not certain that the start of a received signal can be determined.
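The synthetic noise environments above are parameterised by SNR. A common way to build such data is to scale white Gaussian noise so the mixture hits a target SNR; the scaling rule below is the standard definition, not code taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise_at_snr(signal, snr_db):
    """Add white Gaussian noise scaled so the mixture has the requested
    signal-to-noise ratio (in dB): SNR = 10 * log10(P_signal / P_noise)."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

# Toy "signal": a 100 Hz tone, mixed at the -6 dB level used in the study.
t = np.linspace(0, 1, 8000, endpoint=False)
clean = np.sin(2 * np.pi * 100 * t)
noisy = add_noise_at_snr(clean, snr_db=-6.0)
```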
10

Traitement de l'incertitude pour la reconnaissance de la parole robuste au bruit / Uncertainty learning for noise robust ASR

Tran, Dung Tien 20 November 2015 (has links)
This thesis focuses on noise robust automatic speech recognition (ASR). It includes two parts. First, we focus on better handling of uncertainty to improve the performance of ASR in a noisy environment. Second, we present a method to accelerate the training process of a neural network using an auxiliary function technique. In the first part, multichannel speech enhancement is applied to input noisy speech. The posterior distribution of the underlying clean speech is then estimated, as represented by its mean and its covariance matrix or uncertainty. We show how to propagate the diagonal uncertainty covariance matrix in the spectral domain through the feature computation stage to obtain the full uncertainty covariance matrix in the feature domain. Uncertainty decoding exploits this posterior distribution to dynamically modify the acoustic model parameters in the decoding rule. The uncertainty decoding rule simply consists of adding the uncertainty covariance matrix of the enhanced features to the variance of each Gaussian component.
We then propose two uncertainty estimators, based on fusion and on nonparametric estimation, respectively. To build a new estimator, we consider a linear combination of existing uncertainty estimators or kernel functions. The combination weights are generatively estimated by minimizing some divergence with respect to the oracle uncertainty. The divergence measures used are weighted versions of the Kullback-Leibler (KL), Itakura-Saito (IS), and Euclidean (EU) divergences. Due to the inherent nonnegativity of uncertainty, this estimation problem can be seen as an instance of weighted nonnegative matrix factorization (NMF). In addition, we propose two discriminative uncertainty estimators based on linear or nonlinear mapping of the generatively estimated uncertainty. This mapping is trained so as to maximize the boosted maximum mutual information (bMMI) criterion. We compute the derivative of this criterion using the chain rule and optimize it using stochastic gradient descent. In the second part, we introduce a new learning rule for neural networks that is based on an auxiliary function technique without parameter tuning. Instead of minimizing the objective function, this technique consists of minimizing a quadratic auxiliary function which is recursively introduced layer by layer and which has a closed-form optimum. Based on the properties of this auxiliary function, the monotonic decrease of the new learning rule is guaranteed.
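In the diagonal case, the uncertainty decoding rule described in this abstract (add the uncertainty covariance of the enhanced features to each Gaussian component's variance before scoring) reduces to a one-line change in the likelihood computation. A minimal sketch with invented numbers:

```python
import numpy as np

def gaussian_loglik(x, mean, var):
    """Diagonal-covariance Gaussian log-likelihood."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var).sum()

def uncertainty_decode_loglik(x_enhanced, mean, var, uncertainty_var):
    """Uncertainty decoding: broaden each component's variance by the
    (diagonal) uncertainty of the enhanced features before scoring."""
    return gaussian_loglik(x_enhanced, mean, var + uncertainty_var)

mean = np.array([0.0, 1.0])        # one acoustic-model Gaussian component
var = np.array([1.0, 1.0])
x = np.array([2.0, -1.0])          # enhanced (point-estimate) features
unc = np.array([0.5, 0.5])         # estimated enhancement uncertainty

ll_plain = gaussian_loglik(x, mean, var)
ll_uncertain = uncertainty_decode_loglik(x, mean, var, unc)
```

For this off-mean observation, the broadened variance down-weights the mismatch, so the uncertainty-decoded score is less pessimistic than the plain one.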
