1

Gaussian Mixture Model-based Feature Compensation with Application to Noise-robust Speech Recognition

Yeh, Bing-Feng 28 August 2012 (has links)
In this paper, we propose a new method for noise robustness based on the Gaussian Mixture Model (GMM). The proposed method can estimate the noise feature effectively and reduce the noise effect in a plain, subtractive fashion, which retains the smoothness and continuity of the original feature. Compared to the traditional feature transformation method MMSE (Minimum Mean Square Error), which tries to recover a clean feature, the difference is that our method only needs to find the noise feature, or the margin of the noise effect, and subtract it, achieving a stronger robustness effect than traditional methods. In our approach, the test data are passed through a trained noise classifier that judges the noise type and SNR; according to the classifier result, the corresponding transformation model is chosen and used to generate the noise feature, a weighted linear combination forms the final noise estimate, and a simple subtraction then achieves the noise reduction. In the experiments, we use the AURORA 2.0 corpus to evaluate noise robustness: the traditional method achieves a 36.8% relative improvement over the baseline, our method achieves a 52.5% relative improvement, and compared to the traditional method our method attains a 24.9% relative improvement.
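The abstract does not include code; the sketch below is a rough illustration of the described pipeline (per-condition GMMs as the noise classifier, a posterior-weighted linear combination as the noise estimate, plain subtraction at the end). The function names, the use of scikit-learn, and the exact weighting are assumptions, not the thesis' implementation.

```python
# Hedged sketch of GMM-based feature compensation: classify the noise condition,
# generate a noise feature from a per-condition model, subtract it.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_models(noisy_feats_by_cond, noise_feats_by_cond, n_components=8):
    """Fit a condition-classifier GMM on noisy features and a noise-feature GMM
    per (noise type, SNR) condition."""
    classifiers, noise_models = {}, {}
    for cond in noisy_feats_by_cond:
        classifiers[cond] = GaussianMixture(n_components, covariance_type="diag").fit(
            noisy_feats_by_cond[cond])
        noise_models[cond] = GaussianMixture(n_components, covariance_type="diag").fit(
            noise_feats_by_cond[cond])
    return classifiers, noise_models

def compensate(classifiers, noise_models, utterance, weight=1.0):
    # 1. Judge noise type and SNR with the condition classifier.
    cond = max(classifiers, key=lambda c: classifiers[c].score(utterance))
    # 2. Generate a per-frame noise feature as a weighted linear combination of
    #    the selected noise model's component means (weighting is an assumption).
    noise_gmm = noise_models[cond]
    post = noise_gmm.predict_proba(utterance)       # (frames, components)
    noise_est = post @ noise_gmm.means_             # (frames, dims)
    # 3. Simple subtraction in the feature domain.
    return utterance - weight * noise_est
```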
2

Statistical models for noise-robust speech recognition

van Dalen, Rogier Christiaan January 2011 (has links)
A standard way of improving the robustness of speech recognition systems to noise is model compensation. This replaces a speech recogniser's distributions over clean speech by ones over noise-corrupted speech. For each clean speech component, model compensation techniques usually approximate the corrupted speech distribution with a diagonal-covariance Gaussian distribution. This thesis looks into improving on this approximation in two ways: firstly, by estimating full-covariance Gaussian distributions; secondly, by approximating corrupted-speech likelihoods without any parameterised distribution. The first part of this work is about compensating for within-component feature correlations under noise. For this, the covariance matrices of the computed Gaussians should be full instead of diagonal. The estimation of off-diagonal covariance elements turns out to be sensitive to approximations. A popular approximation is the one that state-of-the-art compensation schemes, like VTS compensation, use for dynamic coefficients: the continuous-time approximation. Standard speech recognisers contain both per-time slice, static, coefficients, and dynamic coefficients, which represent signal changes over time, and are normally computed from a window of static coefficients. To remove the need for the continuous-time approximation, this thesis introduces a new technique. It first compensates a distribution over the window of statics, and then applies the same linear projection that extracts dynamic coefficients. It introduces a number of methods that address the correlation changes that occur in noise within this framework. The next problem is decoding speed with full covariances. This thesis re-analyses the previously-introduced predictive linear transformations, and shows how they can model feature correlations at low and tunable computational cost. The second part of this work removes the Gaussian assumption completely. It introduces a sampling method that, given speech and noise distributions and a mismatch function, in the limit calculates the corrupted speech likelihood exactly. For this, it transforms the integral in the likelihood expression, and then applies sequential importance resampling. Though it is too slow to use for recognition, it enables a more fine-grained assessment of compensation techniques, based on the KL divergence to the ideal compensation for one component. The KL divergence proves to predict the word error rate well. This technique also makes it possible to evaluate the impact of approximations that standard compensation schemes make.
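As an illustration of the sampling idea (not the thesis' exact sequential-importance-resampling scheme), the sketch below draws clean-speech and noise samples from diagonal Gaussians, maps them through the standard log-spectral mismatch function for additive noise, and fits a Gaussian to the resulting corrupted-speech samples, in the spirit of data-driven parallel model combination. All parameter values are placeholders.

```python
# Sampling-based model compensation sketch in the log-spectral domain.
import numpy as np

def mismatch(x, n):
    # Log-spectral mismatch for additive noise, phase term ignored:
    # y = log(exp(x) + exp(n)) = x + log1p(exp(n - x))
    return x + np.log1p(np.exp(n - x))

def compensate_component(mu_x, var_x, mu_n, var_n, n_samples=10000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_x, np.sqrt(var_x), size=(n_samples, len(mu_x)))
    n = rng.normal(mu_n, np.sqrt(var_n), size=(n_samples, len(mu_n)))
    y = mismatch(x, n)
    return y.mean(axis=0), y.var(axis=0)   # diagonal Gaussian over corrupted speech

# Example: one 3-dimensional clean-speech component and a noise model (toy values).
mu_y, var_y = compensate_component(
    mu_x=np.array([2.0, 1.5, 0.5]), var_x=np.array([0.3, 0.2, 0.1]),
    mu_n=np.array([0.0, 0.0, 0.0]), var_n=np.array([0.2, 0.2, 0.2]))
print(mu_y, var_y)
```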
3

Empirical Mode Decomposition for Noise-Robust Automatic Speech Recognition

Wu, Kuo-hao 25 August 2010 (has links)
In this thesis, a novel technique based on the empirical mode decomposition (EMD) methodology is proposed and examined for the noise-robustness of automatic speech recognition systems. The EMD analysis is a generalization of the Fourier analysis for processing nonlinear and non-stationary time functions, in our case, the speech feature sequences. We use the intrinsic mode functions (IMF), which include the sinusoidal functions as special cases, obtained from the EMD analysis in the post-processing of the log energy feature. We evaluate the proposed method on Aurora 2.0 and Aurora 3.0 databases. On Aurora 2.0, we obtain a 44.9% overall relative improvement over the baseline for the mismatched (clean-training) tasks. The results show an overall improvement of 49.5% over the baseline for Aurora 3.0 on the high-mismatch tasks. It shows that our proposed method leads to significant improvement.
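A rough sketch of the EMD sifting idea applied to a 1-D log-energy sequence is given below; the stopping criteria, number of IMFs, and post-processing used in the thesis are not reproduced, and the SciPy-based envelope construction is an assumption.

```python
# Minimal EMD sketch: spline envelopes through extrema, sifting, IMF extraction.
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def envelope(t, signal, idx):
    # Cubic-spline envelope through the extrema, pinned at the end points.
    pts = np.unique(np.concatenate(([0], idx, [len(signal) - 1])))
    return CubicSpline(pts, signal[pts])(t)

def sift(signal, max_iter=10):
    t = np.arange(len(signal))
    h = signal.copy()
    for _ in range(max_iter):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 2 or len(minima) < 2:
            break
        mean_env = 0.5 * (envelope(t, h, maxima) + envelope(t, h, minima))
        h = h - mean_env
    return h

def emd(signal, n_imfs=3):
    """Return [IMF_1, ..., IMF_n, residual] of a 1-D feature sequence."""
    imfs, residual = [], signal.astype(float).copy()
    for _ in range(n_imfs):
        imf = sift(residual)
        imfs.append(imf)
        residual = residual - imf
    return imfs + [residual]

# Example: drop the highest-frequency IMF from a noisy log-energy track (toy data).
log_energy = np.log(np.random.rand(200) + 1.0)
components = emd(log_energy)
smoothed = log_energy - components[0]   # crude illustrative post-processing
```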
4

Feature Extraction for Automatic Speech Recognition in Noisy Acoustic Environments / Parameteruttrekning for automatisk talegjenkjenning i støyende omgivelser

Gajic, Bojana January 2002 (has links)
This thesis presents a study of alternative speech feature extraction methods aimed at increasing robustness of automatic speech recognition (ASR) against additive background noise.

Spectral peak positions of speech signals remain practically unchanged in presence of additive background noise. Thus, it was expected that emphasizing spectral peak positions in speech feature extraction would result in improved noise robustness of ASR systems. If frequency subbands are properly chosen, dominant subband frequencies can serve as reasonable estimates of spectral peak positions. Thus, different methods for incorporating dominant subband frequencies into speech feature vectors were investigated in this study.

To begin with, two earlier proposed feature extraction methods that utilize dominant subband frequency information were examined. The first one uses zero-crossing statistics of the subband signals to estimate dominant subband frequencies, while the second one uses subband spectral centroids. The methods were compared with the standard MFCC feature extraction method on two different recognition tasks in various background conditions. The first method was shown to improve ASR performance on both recognition tasks at sufficiently high noise levels. The improvement was, however, smaller on the more complex recognition task. The second method, on the other hand, led to some reduction in ASR performance in all testing conditions.

Next, a new method for incorporating subband spectral centroids into speech feature vectors was proposed, and was shown to be considerably more robust than the standard MFCC method on both ASR tasks. The main difference between the proposed method and the zero-crossing based method is in the way they utilize dominant subband frequency information. It was shown that the performance improvement due to the use of dominant subband frequency information was considerably larger for the proposed method than for the ZCPA method, especially on the more complex recognition task. Finally, the computational complexity of the proposed method is two orders of magnitude lower than that of the zero-crossing based method, and of the same order of magnitude as the standard MFCC method.
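A minimal sketch of subband spectral centroid features is shown below; the filter shapes, subband count, FFT size, and normalisation are illustrative assumptions rather than the settings used in the thesis.

```python
# Subband spectral centroids: power-weighted mean frequency within each subband filter.
import numpy as np

def triangular_filterbank(n_filters, n_fft, sample_rate):
    """Linearly spaced triangular filters over 0..Nyquist (frequency warping omitted)."""
    freqs = np.linspace(0, sample_rate / 2, n_fft // 2 + 1)
    edges = np.linspace(0, sample_rate / 2, n_filters + 2)
    fbank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        fbank[i] = np.clip(np.minimum((freqs - lo) / (mid - lo),
                                      (hi - freqs) / (hi - mid)), 0, None)
    return fbank, freqs

def subband_spectral_centroids(frame, sample_rate=8000, n_filters=12, n_fft=256):
    spectrum = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    fbank, freqs = triangular_filterbank(n_filters, n_fft, sample_rate)
    weighted = fbank * spectrum                      # (filters, bins)
    centroids = (weighted @ freqs) / (weighted.sum(axis=1) + 1e-10)
    return centroids                                 # one centroid per subband

# Example frame of speech samples (stand-in data).
frame = np.random.randn(200)
print(subband_spectral_centroids(frame))
```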
5

A framework for low bit-rate speech coding in noisy environment

Krishnan, Venkatesh 21 April 2005 (has links)
State-of-the-art model-based coders offer a perceptually acceptable reconstructed speech quality at bit-rates as low as 2000 bits per second. However, the performance of these coders rapidly deteriorates below this rate, primarily because very few bits are available to encode the model parameters with high fidelity. This thesis aims to meet the challenge of designing speech coders that operate at lower bit-rates while reconstructing the speech at the receiver at the same or even better quality than state-of-the-art low bit-rate speech coders. In one of the contributions, we develop a number of techniques for efficient coding of the parameters obtained by the MELP algorithm, under the assumption that the classification of the frames of the MELP coder is available. Also, a simple and elegant procedure called dynamic codebook reordering is presented for use in the encoders and decoders of a vector quantization system; it effectively exploits the correlation between vectors of parameters obtained from consecutive speech frames without introducing any delay, distortion or suboptimality. The potential of this technique in significantly reducing the bit-rates of speech coders is illustrated. Additionally, the thesis attempts to address the issue of designing such very low bit-rate speech coders so that they are robust to environmental noise. To impart robustness, a speech enhancement framework employing Kalman filters is presented. Kalman filters designed for speech enhancement in the presence of noise assume an autoregressive model for the speech signal. We improve the performance of Kalman filters in speech enhancement by constraining the parameters of the autoregressive model to belong to a codebook trained on clean speech. We then extend this formulation to the design of a novel framework, called the multiple-input Kalman filter, that optimally combines the outputs from several speech enhancement systems. Since low bit-rate speech coders compress the parameters significantly, it is very important to protect the transmitted information from errors in the communication channel. In this thesis, a novel channel-optimized multi-stage vector quantization codec is presented, in which the stage codebooks are jointly designed.
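The following sketch illustrates the dynamic codebook reordering idea as described above: before each frame, encoder and decoder sort the codebook by distance to the previously selected codevector, so temporal correlation shows up as a bias towards small indices (which a later entropy coder can exploit) without adding delay or distortion. The code and its toy data are illustrative, not the thesis' implementation.

```python
# Dynamic codebook reordering sketch for vector quantization.
import numpy as np

def reorder(codebook, previous_codevector):
    """Return codebook indices sorted by distance to the previous codevector."""
    d = np.linalg.norm(codebook - previous_codevector, axis=1)
    return np.argsort(d, kind="stable")

def encode(codebook, vectors):
    prev = codebook[0]
    indices = []
    for v in vectors:
        order = reorder(codebook, prev)                       # same rule at the decoder
        best = min(order, key=lambda i: np.linalg.norm(codebook[i] - v))
        indices.append(int(np.where(order == best)[0][0]))    # position in the reordered list
        prev = codebook[best]
    return indices

def decode(codebook, indices):
    prev = codebook[0]
    out = []
    for idx in indices:
        order = reorder(codebook, prev)
        prev = codebook[order[idx]]
        out.append(prev)
    return np.array(out)

# Toy round trip: decoding reproduces the ordinary nearest-neighbour VQ output.
rng = np.random.default_rng(0)
cb = rng.normal(size=(16, 4))      # 16 codevectors of dimension 4
seq = rng.normal(size=(5, 4))      # 5 parameter vectors from consecutive frames
codes = encode(cb, seq)            # small indices dominate when frames are correlated
recon = decode(cb, codes)
```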
6

Methods for Increasing Robustness of Deep Convolutional Neural Networks

Uličný, Matej January 2015 (has links)
Recent discoveries have uncovered flaws in machine learning algorithms such as deep neural networks. Deep neural networks seem vulnerable to small amounts of non-random noise, created by exploiting the input-to-output mapping of the network. Applying this noise to an input image drastically decreases classification performance. Such an image is referred to as an adversarial example. The purpose of this thesis is to examine how known regularization/robustness methods perform on adversarial examples. The robustness methods dropout, low-pass filtering, denoising autoencoder, adversarial training and committees have been implemented, combined and tested. For the well-known benchmark MNIST (Mixed National Institute of Standards and Technology) dataset, the best combination of robustness methods has been found. Based on the results of the experiments, an ensemble of models trained on adversarial examples is considered to be the best approach for MNIST. The harmfulness of the adversarial noise and some robustness experiments are also demonstrated on the CIFAR-10 (Canadian Institute for Advanced Research) dataset. Apart from robustness tests, the thesis describes experiments on human classification performance on noisy images and compares it with the performance of a deep neural network.
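The abstract does not spell out the attack recipe; the fast gradient sign method (FGSM) below is the standard way such small, non-random adversarial noise is generated, and the training step mixes perturbed examples into each batch as in plain adversarial training. The tiny model and hyperparameters are placeholders.

```python
# FGSM attack and a single adversarial-training step (sketch, not the thesis' code).
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon):
    """Perturb inputs x in the direction that increases the classification loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.1):
    x_adv = fgsm(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with a tiny MNIST-sized classifier (stand-in architecture and data).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x_batch, y_batch = torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,))
adversarial_training_step(model, opt, x_batch, y_batch)
```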
7

Physiologically Motivated Methods For Audio Pattern Classification

Ravindran, Sourabh 20 November 2006 (has links)
Human-like performance by machines in tasks of speech and audio processing has remained an elusive goal. In an attempt to bridge the gap in performance between humans and machines, there has been an increased effort to study and model physiological processes. However, the widespread use of biologically inspired features proposed in the past has been hampered mainly by either the lack of robustness across a range of signal-to-noise ratios or the formidable computational costs. In physiological systems, sensor processing occurs in several stages. It is likely that signal features and biological processing techniques evolved together and are complementary or well matched. It is precisely for this reason that modeling the feature extraction processes should go hand in hand with modeling of the processes that use these features. This research presents a front-end feature extraction method for audio signals inspired by the human peripheral auditory system. New developments in the field of machine learning are leveraged to build classifiers that maximize the performance gains afforded by these features. The structure of the classification system is similar to what might be expected in physiological processing. Further, the feature extraction and classification algorithms can be efficiently implemented using a low-power cooperative analog-digital signal processing platform. The usefulness of the features is demonstrated for the tasks of audio classification, speech versus non-speech discrimination, and speech recognition. The low-power nature of the classification system makes it ideal for use in applications such as hearing aids, hand-held devices, and surveillance through acoustic scene monitoring.
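The exact front end is not given in the abstract; the sketch below is a generic peripheral-auditory-style pipeline (bandpass filterbank, half-wave rectification, envelope smoothing, logarithmic compression) meant only to illustrate the kind of staged processing described above, with arbitrary band edges and filter orders.

```python
# Generic auditory-inspired feature extractor (illustrative assumptions throughout).
import numpy as np
from scipy.signal import butter, lfilter

def auditory_features(signal, sample_rate=16000, n_bands=8):
    edges = np.logspace(np.log10(100), np.log10(sample_rate / 2 - 100), n_bands + 1)
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(2, [lo, hi], btype="bandpass", fs=sample_rate)
        band = lfilter(b, a, signal)
        env = np.maximum(band, 0.0)                      # half-wave rectification
        b_lp, a_lp = butter(2, 30.0, btype="low", fs=sample_rate)
        env = lfilter(b_lp, a_lp, env)                   # envelope smoothing
        feats.append(np.log(np.mean(env ** 2) + 1e-10))  # compressed band energy
    return np.array(feats)

print(auditory_features(np.random.randn(16000)))         # one feature per band
```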
8

Noise Robustness of Convolutional Autoencoders and Neural Networks for LPI Radar Classification / Brustålighet hos faltningsbaserade neurala nätverk för klassificering av LPI radar

Norén, Gustav January 2020 (has links)
This study evaluates the noise robustness of convolutional autoencoders and neural networks for classification of Low Probability of Intercept (LPI) radar modulation type. Specifically, a number of different neural network architectures are tested in four different synthetic noise environments. Tests in Gaussian noise show that performance decreases with decreasing Signal-to-Noise Ratio (SNR). Training a network on all SNRs in the dataset achieved a peak performance of 70.8 % at SNR=-6 dB with a denoising autoencoder and convolutional classifier setup. Tests indicate that the models have a difficult time generalizing to SNRs lower than what is provided in the training data, performing roughly 10-20 % worse than when those SNRs are included in the training data. If intermediate SNRs are removed from the training data, the models can generalize and perform similarly to tests where intermediate noise levels are included in the training data. When the test data are generated with different parameters than the training data, performance is underwhelming, with a peak performance of 22.0 % at SNR=-6 dB. The last tests use telecom signals as additive noise instead of Gaussian noise; these tests are performed when the LPI and telecom signals appear at different frequencies. The models perform well in such cases, with a peak performance of 80.3 % at an intermediate noise level. This study also contributes a different, and more realistic, way of generating data than what is prevalent in the literature, as well as a network that performs well without the need for signal preprocessing. Without preprocessing, a peak performance of 64.9 % was achieved at SNR=-6 dB. It is customary to generate data such that each sample always includes the start of its signal's period, which increases performance by around 20 % across all tests; in a real application, however, it is not certain that the start of a received signal can be determined.
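A compact sketch of the denoising-autoencoder-plus-classifier arrangement evaluated above is shown below; the layer sizes, kernel widths, segment length, and number of modulation classes are placeholders, not the thesis' architectures.

```python
# 1-D convolutional denoising autoencoder feeding a convolutional classifier (sketch).
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(2, 16, 9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(16, 32, 9, stride=2, padding=4), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(32, 16, 9, stride=2, padding=4, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(16, 2, 9, stride=2, padding=4, output_padding=1))

    def forward(self, x):            # x: (batch, 2, 1024) I/Q samples
        return self.decoder(self.encoder(x))

classifier = nn.Sequential(          # modulation-type classifier on the denoised output
    nn.Conv1d(2, 32, 9, stride=4, padding=4), nn.ReLU(),
    nn.Conv1d(32, 64, 9, stride=4, padding=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 6))   # e.g. 6 modulation classes

dae = DenoisingAutoencoder()
noisy = torch.randn(8, 2, 1024)                      # stand-in noisy segments
logits = classifier(dae(noisy))                      # (8, 6) class scores
```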
9

TOWARDS EFFICIENT AND ROBUST DEEP LEARNING: HANDLING DATA NON-IDEALITY AND LEVERAGING IN-MEMORY COMPUTING

Sangamesh D Kodge (19958580) 05 November 2024 (has links)
Deep learning has achieved remarkable success across various domains, largely relying on assumptions of ideal data conditions—such as balanced distributions, accurate labeling, and sufficient computational resources—that rarely hold in real-world applications. This thesis addresses the significant challenges posed by data non-idealities, including privacy concerns, label noise, non-IID (Independent and Identically Distributed) data, and adversarial threats, which can compromise model performance and security. Additionally, we explore the computational limitations inherent in traditional architectures by introducing in-memory computing techniques to mitigate the memory bottleneck in deep neural network implementations.

We propose five novel contributions to tackle these challenges and enhance the efficiency and robustness of deep learning models. First, we introduce a gradient-free machine unlearning algorithm to ensure data privacy by effectively forgetting specific classes without retraining. Second, we propose a corrective machine unlearning technique, SAP, that improves robustness against label noise using Scaled Activation Projections. Third, we present the Neighborhood Gradient Mean (NGM) method, a decentralized learning approach that optimizes performance on non-IID data with minimal computational overhead. Fourth, we develop TREND, an ensemble design strategy that leverages transferability metrics to enhance adversarial robustness. Finally, we explore an in-memory computing solution, IMAC, that enables energy-efficient and low-latency multiplication and accumulation operations directly within 6T SRAM arrays.

These contributions collectively advance the state-of-the-art in handling data non-idealities and computational efficiency in deep learning, providing robust, scalable, and privacy-preserving solutions suitable for real-world deployment across diverse environments.
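The abstract only names the Neighborhood Gradient Mean (NGM) method; the sketch below is a generic decentralized-learning loop in which each node averages its gradient with its ring neighbours before stepping, intended only to illustrate the non-IID decentralized setting, not NGM itself.

```python
# Generic decentralized gradient averaging over a ring of nodes (toy regression).
import numpy as np

def local_gradient(w, X, y):
    """Least-squares gradient on one node's local (non-IID) data."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def decentralized_step(weights, data, lr=0.01):
    """Each node averages its gradient with its two ring neighbours, then steps."""
    n = len(weights)
    grads = [local_gradient(weights[i], *data[i]) for i in range(n)]
    new_weights = []
    for i in range(n):
        neighbour_mean = (grads[i] + grads[(i - 1) % n] + grads[(i + 1) % n]) / 3.0
        new_weights.append(weights[i] - lr * neighbour_mean)
    return new_weights

# Toy setup: 4 nodes, each holding a differently skewed slice of one regression task.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
data = []
for i in range(4):
    X = rng.normal(0.5 * i, 1.0, size=(50, 3))        # feature shift makes the data non-IID
    data.append((X, X @ true_w + 0.1 * rng.normal(size=50)))
weights = [np.zeros(3) for _ in range(4)]
for _ in range(300):
    weights = decentralized_step(weights, data)
```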
