21

Melhoria da qualidade de sinais de fala degradados por ruído através da utilização de sinais sintetizados. / Speech Enhancement using synthetized signals.

Rogério Carlos Vieira Maciel 14 July 2003 (has links)
The present work discusses a new method to enhance speech signals degraded by additive white noise in high-noise environments (segmental SNR varying from 10 to 3 dB). The approach presented here is based upon a weighted sum involving a speech signal obtained from spectral subtraction and a synthesized speech signal, which is produced according to the concepts of the digital model of speech production (LPC analysis and synthesis). A spectral subtraction-based pre-processor, modified specifically for this purpose, was implemented for LPC coefficient and pitch period estimation, which significantly improves the overall quality of the synthesized speech signal. The weighted combination of these two speech signals allows the reconstruction of spectral regions lost due to the effects of spectral subtraction, as well as the elimination of musical noise. Tests conducted with phonetically balanced sentences read by several speakers show that the proposed method offers better results than spectral subtraction, producing a more natural and clear enhanced speech signal, without the common artifact of musical noise.
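The core operation described in this abstract can be sketched in a few lines. The following is an illustrative reconstruction rather than the author's implementation; the spectral floor, the weight alpha, and the externally supplied noise spectrum and LPC-synthesized frame are assumptions.

```python
# Minimal sketch of the idea: magnitude spectral subtraction on one frame,
# then a weighted sum with an LPC-synthesized frame (supplied by the caller).
import numpy as np

def spectral_subtraction(noisy_frame, noise_mag, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from one frame."""
    spec = np.fft.rfft(noisy_frame)
    mag, phase = np.abs(spec), np.angle(spec)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)   # spectral floor
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy_frame))

def enhance_frame(noisy_frame, noise_mag, synth_frame, alpha=0.6):
    """Weighted sum of the spectrally subtracted and LPC-synthesized frames.

    noise_mag: noise magnitude spectrum estimated during speech pauses.
    synth_frame: frame synthesized from LPC coefficients and pitch period.
    """
    ss_frame = spectral_subtraction(noisy_frame, noise_mag)
    return alpha * ss_frame + (1.0 - alpha) * synth_frame
```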
22

Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation

ITAKURA, Fumitada, TAKEDA, Kazuya, HUY DAT, Tran 01 March 2008 (has links)
No description available.
23

DSP Techniques for Performance Enhancement of Digital Hearing Aid

Udayashankara, V 12 1900 (has links)
Hearing impairment is the number one chronic disability affecting people in the world. Many people have great difficulty in understanding speech with background noise. This is especially true for a large number of elderly people and for sensorineural impaired persons. Several investigations on speech intelligibility have demonstrated that subjects with sensorineural loss may need a 5-15 dB higher signal-to-noise ratio than normal hearing subjects. While most defects in the transmission chain up to the cochlea can nowadays be successfully rehabilitated by means of surgery, the great majority of the remaining inoperable cases are sensorineural hearing impaired. Recent statistics of hearing impaired patients applying for a hearing aid reveal that 20% of the cases are due to conductive losses, more than 50% are due to sensorineural losses, and the remaining 30% are of mixed origin. Presenting speech to the hearing impaired in an intelligible form remains a major challenge in hearing-aid research today.

Even though various methods have been suggested in the literature for the minimization of noise from contaminated speech signals, they fail to give good SNR and intelligibility improvements for moderate-to-severe sensorineural loss subjects. So far, the power and capability of Newton's method, nonlinear adaptive filtering methods and feedback type artificial neural networks have not been exploited for this purpose. Hence we resort to the application of all these methods for improving SNR and intelligibility for sensorineural loss subjects. Digital hearing aids frequently employ filter banks; one of the major drawbacks of this technique is the computational complexity, which requires a larger number of multiplications and therefore increases power consumption. This thesis therefore presents a new approach to speech enhancement for the hearing impaired, as well as the construction of a filter bank for a digital hearing aid with a minimum number of multiplications. The following are covered in this thesis.

One of the most important applications of adaptive systems is noise cancellation using adaptive filters. The ANC setup requires two input signals (viz., primary and reference). The primary input consists of the sum of the desired signal and noise which is uncorrelated with it. The reference input consists of another noise which is correlated in some unknown way with the noise of the primary input. The primary signal is obtained by placing an omnidirectional microphone just above one ear on the head of a KEMAR manikin, and the reference signal is obtained by placing a hypercardioid microphone at the center of the vertebral column on the back. Conventional speech enhancement techniques use linear schemes for enhancing speech signals; so far, nonlinear adaptive filtering techniques have not been used in hearing aid applications. The motivation behind the use of a nonlinear model is that it gives better noise suppression than a linear model, because the medium through which signals reach the microphone may be highly nonlinear. Hence the use of linear schemes, though motivated by computational simplicity and mathematical tractability, may be suboptimal. We therefore propose the use of nonlinear models to enhance speech signals for the hearing impaired, employing both a linear LMS and a nonlinear second-order Volterra LMS scheme.

Studies conducted for different environmental noises, including babble, cafeteria and low frequency noise, show that the second-order Volterra LMS performs better than the linear LMS algorithm. Measures such as signal-to-noise ratio (SNR), time plots, and intelligibility tests are used for performance comparison. We also propose an ANC scheme which uses Newton's method to enhance speech signals. The main problem associated with LMS-based ANC is slow convergence, which degrades performance in hearing aid applications; Newton-type methods were chosen because they are high-performance adaptive-filtering methods that often converge and track faster than the LMS method. We propose two models to enhance speech signals: a conventional linear model and a nonlinear model using a second-order Volterra function. Developing a Newton-type algorithm for the linear model results in the familiar recursive least squares (RLS) algorithm. The performance of both the linear and nonlinear Newton algorithms is evaluated for babble, cafeteria and low frequency noise, using SNR, time plots and intelligibility tests for comparison. The results show that Newton's method using the Volterra nonlinearity performs better than the RLS method.

In addition to the ANC-based schemes, we also develop speech enhancement for the hearing impaired using a feedback type neural network (FBNN), mainly because it is a parallel algorithm that can be implemented directly in hardware. We translate the speech enhancement problem into a neural network (NN) framework by forming an appropriate energy function, and propose both linear and nonlinear FBNNs for enhancing speech signals. Simulated studies on different environmental noises reveal that the FBNN using the Volterra nonlinearity is superior to the linear FBNN in enhancing speech signals; SNR, time plots and intelligibility tests are again used for performance comparison.

The design of an effective hearing aid is a challenging problem for sensorineural hearing impaired people. For persons with sensorineural losses it is necessary that the frequency response be optimally fitted into their residual auditory area. Digital filters enhance the performance of hearing aids in ways that are either difficult or impossible to realize with analog techniques. The major problem in a digital hearing aid is reducing power consumption, and multiplication is one of the most power consuming operations in digital filtering. Hence a serious effort has been made to design a filter bank with a minimum number of multiplications, thereby minimizing power consumption. This is achieved by using interpolated and complementary FIR filters, which gives significant savings in the number of arithmetic operations. The thesis concludes by summarizing the results of the analysis and suggesting the scope for further investigation.
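A second-order Volterra LMS noise canceller of the kind described above can be sketched as follows. This is a generic textbook construction, not the thesis code; the memory length, step sizes, and toy signals are assumptions chosen only to make the sketch self-contained and runnable.

```python
# Second-order Volterra LMS adaptive noise canceller (ANC) sketch.
import numpy as np

def volterra_lms_anc(primary, reference, memory=8, mu1=0.01, mu2=0.001):
    """ANC: primary = speech + noise, reference = correlated noise only.
    Returns the error signal, which approximates the clean speech."""
    N = len(primary)
    h1 = np.zeros(memory)                    # linear kernel
    h2 = np.zeros((memory, memory))          # quadratic (Volterra) kernel
    out = np.zeros(N)
    for n in range(memory, N):
        x = reference[n - memory:n][::-1]    # most recent sample first
        xx = np.outer(x, x)                  # quadratic input terms
        y = h1 @ x + np.sum(h2 * xx)         # Volterra filter output
        e = primary[n] - y                   # error = enhanced speech sample
        h1 += mu1 * e * x                    # LMS update, linear kernel
        h2 += mu2 * e * xx                   # LMS update, quadratic kernel
        out[n] = e
    return out

# Toy usage: a sinusoidal "speech" buried in noise that reaches the primary
# microphone through a mildly nonlinear path (illustrative assumption).
rng = np.random.default_rng(0)
n1 = rng.standard_normal(20000) * 0.5                     # reference noise
n0 = 0.8 * n1 + 0.2 * n1 ** 2                             # nonlinear path
speech = 0.5 * np.sin(2 * np.pi * 0.01 * np.arange(20000))
enhanced = volterra_lms_anc(speech + n0, n1)
```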
24

Υλοποίηση αλγορίθμων ακουστικής επεξεργασίας σημάτων σε επεξεργαστή ειδικού σκοπού / Implementation of acoustic signal processing algorithms on a special-purpose processor

Κωστάκης, Βάιος 09 October 2014 (has links)
In this thesis, a method of digital signal processing for acoustic signals was developed, compatible with real-time processing. First, the operations featured by special-purpose digital signal processors are reviewed. Frequency domain analysis and the coherence function are then studied in depth. For the purposes of this thesis, a noise reduction algorithm for speech signals was implemented that exploits the coherence function and takes two microphone signals as inputs. The algorithm was implemented and tested offline in mathematical software, as well as in real time on a special-purpose digital signal processor.
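A two-microphone coherence-based suppressor along the lines sketched in this abstract might look as follows. The frame length, overlap, smoothing factor, and gain floor are illustrative assumptions, not values from the thesis; the estimated magnitude-squared coherence is simply used as a spectral gain (high for a coherent point source such as speech, low for diffuse noise).

```python
# Illustrative two-microphone coherence-based noise suppressor.
import numpy as np

def coherence_suppressor(x1, x2, frame=512, hop=256, alpha=0.8):
    win = np.hanning(frame)
    bins = frame // 2 + 1
    p11 = np.full(bins, 1e-8)                # auto PSD, mic 1
    p22 = np.full(bins, 1e-8)                # auto PSD, mic 2
    p12 = np.zeros(bins, dtype=complex)      # cross PSD
    out = np.zeros(len(x1))
    for start in range(0, len(x1) - frame, hop):
        X1 = np.fft.rfft(win * x1[start:start + frame])
        X2 = np.fft.rfft(win * x2[start:start + frame])
        p11 = alpha * p11 + (1 - alpha) * np.abs(X1) ** 2
        p22 = alpha * p22 + (1 - alpha) * np.abs(X2) ** 2
        p12 = alpha * p12 + (1 - alpha) * X1 * np.conj(X2)
        msc = np.abs(p12) ** 2 / (p11 * p22)  # magnitude-squared coherence
        gain = np.clip(msc, 0.05, 1.0)        # floor to limit distortion
        y = np.fft.irfft(gain * X1, n=frame)
        out[start:start + frame] += y * win   # overlap-add (constant scaling)
    return out
```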
25

Auditory domain speech enhancement

Yang, Xiaofeng 04 June 2008 (has links)
Many speech enhancement algorithms suffer from musical noise - an estimation residue consisting of music-like varying tones. To reduce this annoying noise, some speech enhancement algorithms require post-processing. However, a lack of auditory perception theories about musical noise limits the effectiveness of musical noise reduction methods. Scientists now have some understanding of the human auditory system, thanks to advances in hearing research across multiple disciplines - anatomy, physiology, psychology, and neurophysiology. Auditory models, such as the gammatone filter bank and the Meddis inner hair cell model, have been developed to simulate the acoustic-to-neuron transduction process. These auditory models generate neuron firing signals known as the cochleagram, and cochleagram analysis is a powerful tool for investigating musical noise. We use auditory perception theories in our musical noise investigations. Some auditory perception theories (e.g., volley theory and auditory scene analysis) suggest that speech perception is an auditory grouping process. Temporal properties of neuron firing signals, such as period and rhythm, play important roles in the grouping process, which generates a foreground speech stream, a background noise stream, and possibly additional streams. We assume that musical noise results from grouping into the background stream those neuron firing signals whose temporal properties differ from the ones grouped into the foreground stream. Based on this hypothesis, we believe that a musical noise reduction method should increase the probability of grouping the enhanced neuron firing signals into the foreground speech stream, or decrease the probability of grouping them into the background stream. We propose a post-processing musical noise reduction method for the auditory Wiener filter speech enhancement method, in which we employ a proposed complex gammatone filter bank for the cochlear decomposition. The results of a subjective listening test of our speech enhancement system show that the proposed musical noise reduction method is effective. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2008-05-28 16:11:28.374
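For readers unfamiliar with the cochleagram front end mentioned above, a minimal gammatone filter bank can be sketched as follows. This is a standard textbook construction (4th-order gammatone filters with Glasberg-Moore ERB bandwidths, implemented as truncated impulse responses), not the complex gammatone filter bank proposed in the thesis; the centre frequencies, impulse-response length, and the crude half-wave rectification in place of a hair cell model are assumptions.

```python
# Simple gammatone filter bank and cochleagram-style output.
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth (Glasberg & Moore) in Hz."""
    return 24.7 + 0.108 * fc

def gammatone_ir(fc, fs, duration=0.05, order=4, b=1.019):
    t = np.arange(int(duration * fs)) / fs
    env = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
    ir = env * np.cos(2 * np.pi * fc * t)
    return ir / np.max(np.abs(ir))

def cochleagram(x, fs, centre_freqs):
    """Half-wave rectified outputs of the gammatone channels."""
    channels = []
    for fc in centre_freqs:
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")
        channels.append(np.maximum(y, 0.0))   # crude hair-cell stage
    return np.vstack(channels)

# Usage: 16 channels spaced between 100 Hz and 4 kHz on a log scale.
fs = 16000
x = np.random.default_rng(1).standard_normal(fs)   # 1 s of noise
cgram = cochleagram(x, fs, np.geomspace(100, 4000, 16))
```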
26

Short-time Multichannel Noise Power Spectral Density Estimators for Acoustic Signals

Blanchette, Jonathan 30 April 2014 (has links)
The estimation of power spectral densities is a critical step in many speech enhancement algorithms. The demand for multi-channel speech enhancement systems is high, with applications in teleconferencing, cellular phones, and hearing aids. The first objective of the thesis is to develop a general multi-channel framework to solve for the diffuse noise power spectral densities whenever the spatial correlation or coherence matrix is pre-estimated and the number of speakers is less than the number of microphones. The second objective is to develop closed-form analytical solutions. The performance of the developed algorithms is evaluated against pre-existing algorithms using prescribed performance measures.
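As a hedged illustration of the estimation problem described above, and not one of the estimators derived in the thesis, the diffuse-noise PSD at a single frequency bin can be obtained by a simple least-squares fit when a single speaker's steering vector d and the noise spatial coherence matrix Gamma are assumed known; the microphone spacing, frequency, and sinc coherence model in the example are assumptions.

```python
# Generic least-squares estimate of speech and diffuse-noise PSDs at one bin,
# assuming the model R_y = phi_s * d d^H + phi_v * Gamma.
import numpy as np

def diffuse_noise_psd(R_y, d, Gamma):
    """Fit R_y to the model above and return (phi_s, phi_v)."""
    A = np.column_stack([np.outer(d, d.conj()).ravel(), Gamma.ravel()])
    coeffs, *_ = np.linalg.lstsq(A, R_y.ravel(), rcond=None)
    phi_s, phi_v = np.real(coeffs)
    return phi_s, phi_v

# Example: 4 microphones, spherically isotropic noise approximated by a sinc
# spatial coherence at 1 kHz with 5 cm spacing (illustrative values).
M = 4
d = np.ones(M, dtype=complex)                     # broadside steering vector
dist = np.abs(np.subtract.outer(np.arange(M), np.arange(M))) * 0.05
Gamma = np.sinc(2 * 1000 * dist / 343.0)          # c = 343 m/s
R_y = 2.0 * np.outer(d, d.conj()) + 0.5 * Gamma
print(diffuse_noise_psd(R_y, d, Gamma))           # approximately (2.0, 0.5)
```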
27

Compensation for Nonlinear Distortion in Noise for Robust Speech Recognition

Harvilla, Mark J. 01 October 2014 (has links)
The performance, reliability, and ubiquity of automatic speech recognition systems have flourished in recent years due to steadily increasing computational power and technological innovations such as hidden Markov models, weighted finite-state transducers, and deep learning methods. One problem which plagues speech recognition systems, especially those that operate offline and have been trained on specific in-domain data, is the deleterious effect of noise on the accuracy of speech recognition. Historically, robust speech recognition research has focused on traditional noise types such as additive noise, linear filtering, and reverberation. This thesis describes the effects of nonlinear dynamic range compression on automatic speech recognition and develops a number of novel techniques for characterizing and counteracting it. Dynamic range compression is any function which reduces the dynamic range of an input signal. Dynamic range compression is a widely-used tool in audio engineering and is almost always a component of a practical telecommunications system. Despite its ubiquity, this thesis is the first work to comprehensively study and address the effect of dynamic range compression on speech recognition. More specifically, this thesis treats the problem of dynamic range compression in three ways: (1) blind amplitude normalization methods, which counteract dynamic range compression when its parameter values allow the function to be mathematically inverted, (2) blind amplitude reconstruction techniques, i.e., declipping, which attempt to reconstruct clipped segments of the speech signal that are lost through non-invertible dynamic range compression, and (3) matched-training techniques, which attempt to select the pre-trained acoustic model with the closest set of compression parameters. All three of these methods rely on robust estimation of the dynamic range compression distortion parameters. Novel algorithms for the blind prediction of these parameters are also introduced. The algorithms' quality is evaluated in terms of the degree to which they decrease speech recognition word error rate, as well as in terms of the degree to which they increase a given speech signal's signal-to-noise ratio. In all evaluations, the possibility of independent additive noise following the application of dynamic range compression is assumed.
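To make the distinction drawn above concrete, the sketch below contrasts an invertible power-law compressor, which blind amplitude normalization can undo once its exponent is estimated, with hard clipping, which is non-invertible and motivates declipping. The exponent and clipping level are assumptions, and none of this is the thesis's own code.

```python
# Invertible vs. non-invertible dynamic range compression (illustration only).
import numpy as np

def power_law_compress(x, p=0.5):
    """Invertible DRC: compress magnitudes with exponent 0 < p < 1."""
    return np.sign(x) * np.abs(x) ** p

def power_law_expand(y, p=0.5):
    """Exact inverse of the compressor above, given (an estimate of) p."""
    return np.sign(y) * np.abs(y) ** (1.0 / p)

def hard_clip(x, level=0.3):
    """Non-invertible DRC: samples beyond +/- level are lost."""
    return np.clip(x, -level, level)

def clipped_mask(y, level=0.3, tol=1e-6):
    """Flag samples that a declipping algorithm would need to reconstruct."""
    return np.abs(np.abs(y) - level) < tol

x = np.sin(2 * np.pi * 0.01 * np.arange(1000))
assert np.allclose(power_law_expand(power_law_compress(x)), x)
y = hard_clip(x)
print(clipped_mask(y).sum(), "samples clipped")
```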
28

Incorporating Auditory Models in Speech/Audio Applications

January 2011 (has links)
abstract: Following the success in incorporating perceptual models in audio coding algorithms, their application in other speech/audio processing systems is expanding. In general, all perceptual speech/audio processing algorithms involve minimization of an objective function that directly or indirectly incorporates properties of human perception. This dissertation primarily investigates the problems associated with directly embedding an auditory model in the objective function formulation and proposes possible solutions to overcome the high complexity issues for use in real-time speech/audio algorithms. Specific problems addressed in this dissertation include: 1) the development of approximate but computationally efficient auditory model implementations that are consistent with the principles of psychoacoustics, and 2) the development of a mapping scheme that allows synthesizing a time/frequency domain representation from its equivalent auditory model output. The first problem is aimed at addressing the high computational complexity involved in solving perceptual objective functions that require repeated application of the auditory model to evaluate different candidate solutions. In this dissertation, frequency pruning and detector pruning algorithms are developed that efficiently implement the various auditory model stages. The performance of the pruned model is compared to that of the original auditory model for different types of test signals in the SQAM database; experimental results indicate only a 4-7% relative error in loudness while attaining up to an 80-90% reduction in computational complexity. Similarly, a hybrid algorithm is developed specifically for use with sinusoidal signals, which employs the proposed auditory pattern combining technique together with a look-up table of representative auditory patterns. The second problem concerns obtaining an estimate of the auditory representation that minimizes a perceptual objective function and transforming the auditory pattern back to its equivalent time/frequency representation; this avoids the repeated application of the auditory model stages to test different candidate time/frequency vectors when minimizing perceptual objective functions. In this dissertation, a constrained mapping scheme is developed by linearizing certain auditory model stages, which ensures obtaining a time/frequency mapping corresponding to the estimated auditory representation. This paradigm was successfully incorporated in a perceptual speech enhancement algorithm and a sinusoidal component selection task. / Dissertation/Thesis / Ph.D. Electrical Engineering 2011
30

Time-domain Deep Neural Networks for Speech Separation

Sun, Tao 24 May 2022 (has links)
No description available.
