51

Nonstationary Techniques For Signal Enhancement With Applications To Speech, ECG, And Nonuniformly-Sampled Signals

Sreenivasa Murthy, A January 2012 (has links) (PDF)
For time-varying signals such as speech and audio, short-time analysis becomes necessary to compute specific signal attributes and to keep track of their evolution. The standard technique is the short-time Fourier transform (STFT), using which one decomposes a signal in terms of windowed Fourier bases. An advancement over the STFT is wavelet analysis, in which a function is represented in terms of shifted and dilated versions of a localized function called the wavelet. A specific modeling approach, particularly in the context of speech, is based on short-time linear prediction or short-time Wiener filtering of noisy speech. In most nonstationary signal processing formalisms, the key idea is to analyze the properties of the signal locally, either by first truncating the signal and then performing a basis expansion (as in the case of the STFT), or by choosing compactly-supported basis functions (as in the case of wavelets). We retain the same motivation as these approaches, but use polynomials to model the signal on a short-time basis ("short-time polynomial representation"). To emphasize the local nature of the modeling aspect, we refer to it as "local polynomial modeling (LPM)." We pursue two main threads of research in this thesis: (i) short-time approaches for speech enhancement; and (ii) LPM for enhancing smooth signals, with applications to ECG, noisy nonuniformly-sampled signals, and voiced/unvoiced segmentation in noisy speech.

Improved iterative Wiener filtering for speech enhancement: A constrained iterative Wiener filter solution for speech enhancement was proposed by Hansen and Clements. Sreenivas and Kirnapure improved the performance of the technique by imposing codebook-based constraints in the process of parameter estimation. The key advantage is that the optimal parameter search space is confined to the codebook. These nonstationary signal enhancement solutions assume stationary noise. However, in practical applications, noise is not stationary, and hence updating the noise statistics becomes necessary. We present a new approach to perform reliable noise estimation based on spectral subtraction. We first estimate the signal spectrum and perform signal subtraction to estimate the noise power spectral density. We further smooth the estimated noise spectrum to ensure reliability. The key contributions are: (i) adaptation of the technique to nonstationary noises; (ii) a new initialization procedure for faster convergence and higher accuracy; (iii) experimental determination of the optimal LP-parameter space; and (iv) objective criteria and speech recognition tests for performance comparison.

Optimal local polynomial modeling and applications: We next address the problem of fitting a piecewise-polynomial model to a smooth signal corrupted by additive noise. Since the signal is smooth, it can be represented using low-order polynomial functions provided that they are locally adapted to the signal. We choose the mean-square error as the criterion of optimality. Since the model is local, it preserves the temporal structure of the signal and can also handle nonstationary noise. We show that there is a trade-off between the adaptability of the model to local signal variations and robustness to noise (bias-variance trade-off), which we solve using a stochastic optimization technique known as the intersection of confidence intervals (ICI) technique. The key trade-off parameter is the duration of the window over which the optimum LPM is computed.
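As an illustration of the local modeling idea, the following is a minimal sketch of polynomial smoothing over a fixed-length sliding window; the window length and model order are illustrative, and the thesis's actual method additionally optimizes the window length per sample via the ICI rule, which is not shown here.

```python
import numpy as np

def lpm_smooth(x, L=31, order=2):
    """Fit a low-order polynomial over a sliding window of length L (odd) and
    take the fitted value at the window centre as the local signal estimate."""
    assert L % 2 == 1, "use an odd window length so the centre sample is defined"
    half = L // 2
    n = np.arange(-half, half + 1)            # local time axis, centred at 0
    s_hat = x.astype(float).copy()
    for k in range(half, len(x) - half):
        coeffs = np.polyfit(n, x[k - half:k + half + 1], order)  # least-squares fit
        s_hat[k] = np.polyval(coeffs, 0)                         # value at window centre
    return s_hat
```

A larger L averages out more noise but biases the fit where the signal varies quickly, which is precisely the bias-variance trade-off referred to above.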
Within the LPM framework, we address three problems: (i) signal reconstruction from noisy uniform samples; (ii) signal reconstruction from noisy nonuniform samples; and (iii) classification of speech signals into voiced and unvoiced segments. The generic signal model is x(tn) = s(tn) + d(tn), 0 ≤ n ≤ N − 1. In problems (i) and (iii) above, tn = nT (uniform sampling); in (ii), the samples are taken at nonuniform instants. The signal s(t) is assumed to be smooth; i.e., it should admit a local polynomial representation. The problem in (i) and (ii) is to estimate s(t) from x(tn); i.e., we are interested in optimal signal reconstruction on a continuous domain starting from uniform or nonuniform samples. We show that, in both cases, the bias and variance take general forms governed by L, the length of the window over which the polynomial fitting is performed: the bias involves a function f of s(t), which typically comprises the higher-order derivatives of s(t) (the order itself depending on the order of the polynomial), the variance involves a function g of the noise variance, and the mean square error (MSE) combines the two (standard forms are sketched at the end of this abstract). It is clear that the bias and variance have complementary characteristics with respect to L. Directly optimizing the MSE would give a value of L that involves the functions f and g. The function g may be estimated, but f is not known since s(t) is unknown. Hence, it is not practical to compute the minimum-MSE (MMSE) solution. Therefore, we obtain an approximate result by solving the bias-variance trade-off in a probabilistic sense using the ICI technique. We also propose a new approach to optimally select the ICI technique parameters, based on a new cost function that is the sum of the probability of false alarm and the area covered by the confidence interval. In addition, we address issues related to optimal model-order selection, the search space for window lengths, the accuracy of noise estimation, etc.

The next issue addressed is that of voiced/unvoiced segmentation of the speech signal. Speech segments show different spectral and temporal characteristics depending on whether the segment is voiced or unvoiced, and most speech processing techniques process the two types of segments differently. The challenge lies in making detection techniques offer robust performance in the presence of noise. We propose a new technique for voiced/unvoiced classification by taking into account the fact that voiced segments have a certain degree of regularity, whereas unvoiced segments do not possess any smoothness. In order to capture the regularity in voiced regions, we employ the LPM. The key idea is that regions where the LPM is inaccurate are more likely to be unvoiced than voiced. Within this framework, we formulate a hypothesis testing problem based on the accuracy of the LPM fit and devise a test statistic for performing V/UV classification. Since the technique is based on LPM, it is capable of adapting to nonstationary noises. We present Monte Carlo results to demonstrate the accuracy of the proposed technique.
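The explicit bias, variance, and MSE expressions are not reproduced in this abstract; under standard local polynomial regression assumptions they take a form along the following lines, where p is the polynomial order (the exact constants and exponents in the thesis may differ):

```latex
\operatorname{bias}\{\hat{s}(t)\} \;\propto\; L^{\,p+1}\, f\!\big(s(t)\big),
\qquad
\operatorname{var}\{\hat{s}(t)\} \;\propto\; \frac{g(\sigma^2)}{L},
\qquad
\mathrm{MSE}(L) \;=\; \operatorname{bias}^2 + \operatorname{var}
\;\approx\; L^{\,2(p+1)} f^2\!\big(s(t)\big) + \frac{g(\sigma^2)}{L}.
```

The bias grows with L while the variance shrinks with L, so the MSE-optimal window length depends on the unknown f; this is why the ICI rule is used in place of direct minimization.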
52

Zvyšování robustnosti systémů pro rozpoznávání mluvčích pomocí diskriminativních technik / Improving Robustness of Speaker Recognition using Discriminative Techniques

Novotný, Ondřej January 2021 (has links)
This thesis deals with the use of discriminative techniques in speaker recognition in order to make such systems more robust to influences that negatively affect their performance, including noise, reverberation, and the transmission channel. The thesis is divided into two main parts. The first part gives a theoretical introduction to speaker recognition, describing the individual steps of a recognition system, from the extraction of acoustic features and the extraction of vector representations of recordings to the computation of the final recognition score. Particular emphasis is placed on techniques for extracting a vector representation of a recording, where we describe two different paradigms, i-vectors and x-vectors. The second part focuses on discriminative techniques for increasing robustness. The techniques are organized to follow the path of a recording through the recognition system. We first consider signal preprocessing with a neural network for denoising and speech enhancement, a universal technique that is independent of the recognition system used afterwards. We then focus on the use of the discriminative approach in feature extraction and in the extraction of vector representations of recordings. The thesis also covers the transition from the generative paradigm to a fully discriminative approach in speaker recognition systems. All techniques are experimentally verified and their benefits assessed. Several of the proposed approaches prove useful both for the generative approach in the form of i-vectors and for discriminative x-vectors, and yield significant improvements. For completeness, further robustness-related techniques, such as score normalization and multi-condition training, are also included. Finally, the thesis examines the robustness of discriminative systems from the perspective of the data used for their training.
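As one concrete example of the score-normalization techniques mentioned above, the following is a minimal sketch of symmetric score normalization (s-norm) over a cohort of impostor embeddings; the cosine scoring and the cohort choice are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def cosine_score(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def s_norm(raw_score, enroll_emb, test_emb, cohort_embs):
    """Normalize a raw verification score using cohort statistics on both sides."""
    e_scores = np.array([cosine_score(enroll_emb, c) for c in cohort_embs])
    t_scores = np.array([cosine_score(test_emb, c) for c in cohort_embs])
    z = (raw_score - e_scores.mean()) / (e_scores.std() + 1e-8)   # z-norm part
    t = (raw_score - t_scores.mean()) / (t_scores.std() + 1e-8)   # t-norm part
    return 0.5 * (z + t)
```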
53

Computational auditory scene analysis and robust automatic speech recognition

Narayanan, Arun 14 November 2014 (has links)
No description available.
54

Ενίσχυση σημάτων μουσικής υπό το περιβάλλον θορύβου / Enhancement of Music Signals in Noisy Environments

Παπανικολάου, Παναγιώτης 20 October 2010 (has links)
This thesis applies noise-reduction algorithms to music signals and draws conclusions about the performance of each algorithm for each musical genre. The main aims are to clarify the basic problems of sound enhancement and to present the various algorithms developed for solving these problems. After a brief introduction to the basic concepts of sound enhancement, we examine and analyze representative algorithms that have been proposed in the literature for speech enhancement. These algorithms can be divided into three main classes: spectral subtractive algorithms, statistical-model-based algorithms, and subspace algorithms. In order to evaluate the performance of these algorithms, we use objective quality measures, the results of which allow us to compare the performance of each algorithm. Using four different objective measures, we conduct experiments that yield a set of indicative values with which we can make both within-class and across-class algorithm comparisons. From these comparisons we draw conclusions about the choice of parameters for each algorithm and about the suitability of each algorithm for specific noise conditions and musical genres.
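For reference, a minimal sketch of the first algorithm class (magnitude spectral subtraction) is given below; the noise estimate here is simply taken from the first few frames, whereas the algorithms compared in the thesis refine this step in various ways.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, nperseg=512, noise_frames=10, floor=0.02):
    """Subtract an average noise magnitude (estimated from the leading frames)
    from the noisy magnitude spectrum and resynthesize with the noisy phase."""
    _, _, X = stft(noisy, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(X), np.angle(X)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)   # crude noise estimate
    clean_mag = np.maximum(mag - noise_mag, floor * mag)            # subtract and floor
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return enhanced
```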
55

Fusion pour la séparation de sources audio / Fusion for audio source separation

Jaureguiberry, Xabier 16 June 2015 (has links)
Underdetermined blind source separation is a complex mathematical problem that can be satisfactorily solved for some practical applications, provided that the right separation method has been selected and carefully tuned. In order to automate this selection process, we propose in this thesis to resort to the principle of fusion, which has been widely used in the related field of classification yet is still only marginally exploited in source separation. Fusion consists in combining several methods to solve a given problem instead of selecting a single one. To do so, we introduce a general fusion framework in which a source estimate is expressed as a linear combination of estimates of this same source given by different separation algorithms, each estimate being weighted by a fusion coefficient. For a given task, the fusion coefficients can then be learned on a representative training dataset by minimizing a cost function related to the separation objective. To go further, we also propose two ways to adapt the fusion coefficients to the mixture to be separated. The first expresses the fusion of several non-negative matrix factorization (NMF) models in a Bayesian fashion similar to Bayesian model averaging. The second aims at learning time-varying fusion coefficients with deep neural networks. All proposed methods have been evaluated on two distinct corpora: one dedicated to speech enhancement, the other to singing voice extraction. Experimental results show that fusion outperforms simple selection in all considered cases, with the best results obtained by adaptive time-varying fusion with neural networks.
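As a sketch of the static variant of this fusion framework, the snippet below expresses a source estimate as a linear combination of the estimates produced by several separation algorithms, with fusion coefficients learned on training material; the unconstrained least-squares fit is an assumption made here for simplicity, whereas the thesis learns the coefficients by minimizing a cost related to the separation objective and also proposes adaptive variants.

```python
import numpy as np

def learn_fusion_weights(estimates, reference):
    """estimates: (n_algorithms, n_samples) source estimates on training data;
    reference: (n_samples,) clean reference source."""
    A = estimates.T                                     # one column per algorithm
    w, *_ = np.linalg.lstsq(A, reference, rcond=None)   # least-squares fusion coefficients
    return w

def fuse(estimates, w):
    """Weighted linear combination of the individual source estimates."""
    return estimates.T @ w
```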
56

Evaluation of Methods for Sound Source Separation in Audio Recordings Using Machine Learning

Gidlöf, Amanda January 2023 (has links)
Sound source separation is a popular and active research area, especially with modern machine learning techniques. In this thesis, the focus is on single-channel separation of two speakers into individual streams, and specifically considering the case where two speakers are also accompanied by background noise. There are different methods to separate speakers and in this thesis three different methods are evaluated: the Conv-TasNet, the DPTNet, and the FaSNetTAC.  The methods were used to train models to perform the sound source separation. These models were evaluated and validated through three experiments. Firstly, previous results for the chosen separation methods were reproduced. Secondly, appropriate models applicable for NFC's datasets and applications were created, to fulfill the aim of this thesis. Lastly, all models were evaluated on an independent dataset, similar to datasets from NFC. The results were evaluated using the metrics SI-SNRi and SDRi. This thesis provides recommended models and methods suitable for NFC applications, especially concluding that the Conv-TasNet and the DPTNet are reasonable choices.
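For reference, a minimal sketch of the evaluation metrics named above, SI-SNR and its improvement over the unprocessed mixture (SI-SNRi), following their usual definitions; this is not code from the thesis.

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB."""
    est = est - est.mean()
    ref = ref - ref.mean()
    target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref   # projection onto reference
    noise = est - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps))

def si_snri(est, mixture, ref):
    """Improvement over using the unprocessed mixture as the estimate."""
    return si_snr(est, ref) - si_snr(mixture, ref)
```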
57

Improving Speech Intelligibility Without Sacrificing Environmental Sound Recognition

Johnson, Eric Martin 27 September 2022 (has links)
No description available.
