  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

MICROPHONE ARRAY SYSTEM FOR SPEECH ENHANCEMENT IN LAPTOPS

THUPALLI, NAVEEN KUMAR January 2012 (has links)
Speech received in distant-talking environments — laptops, teleconferencing, video conferencing and hands-free telephony — is generally degraded by additive noise, which contaminates the signal and reduces its quality and intelligibility at the receiver. To be useful, the speech signal must be extracted from the noise so that the user is presented with clean speech. In such conditions, microphone arrays are a preferred means of improving the quality of the captured signals, and advances in laptop technology and microphone array processing have made it possible to improve speech intelligibility during communication. This thesis therefore targets the reduction of additive noise from the original speech through the design and use of different algorithms. A multi-channel microphone array is combined with two speech enhancement methods — a Wiener beamformer and a generalized sidelobe canceller (GSC) — for laptops in a noisy environment. Both systems were implemented, processed and evaluated in Matlab, with SNR and SNR improvement (SNRI) as the main objective quality measures. The systems were tested with two speech signals, one treated as the main speech signal and the other as noise, together with an additional random noise, all sampled at 16 kHz. Three different source positions were considered, at input SNRs of 0, 5, 10, 15, 20 and 25 dB. Simulation results show that the noise is attenuated to a great extent, but variations in SNR and SNRI were observed depending on the points of origin of the signals: when the distance between the main speech source and the microphone is large compared to that of the noise sources, SNR and SNRI vary noticeably. This indicates that the positions of the sources play a major role in maintaining speech quality at the receiver end.
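The evaluation above reports SNR and SNRI for Wiener and GSC beamformers. As a rough, hedged illustration of why an array helps at all, the minimal delay-and-sum sketch below (Python/NumPy; not the thesis's Wiener/GSC code, and all signal names and delays are invented for the example) aligns the channels toward the source so that speech adds coherently while uncorrelated noise averages down.

```python
import numpy as np

def delay_and_sum(mic_signals, delays):
    """Steer the array: undo each channel's integer sample delay, then average."""
    out = np.zeros(len(mic_signals[0]))
    for sig, d in zip(mic_signals, delays):
        out += np.roll(sig, -d)  # crude integer-delay alignment
    return out / len(mic_signals)

def snr_db(clean, estimate):
    """SNR of an enhanced estimate against the clean reference, in dB."""
    noise = estimate - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))
```

With M microphones carrying independent noise, the array gain of this sketch is roughly 10·log10(M) dB; this is the baseline that adaptive beamformers such as the Wiener beamformer and the GSC improve upon.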
/ D.No 4-22, Gandla street, papanaidupeta-517526 chittoor district,Andhra pradesh India naveenkumarthupalli@gmail.com
12

Sound enhancement in noisy environments

Αθανασίου, Γεωργία 08 July 2011 (has links)
The aim of this diploma thesis is to study the processes through which sound enhancement is performed in noisy environments. It introduces the basic concepts of speech enhancement, analyses noise removal techniques, and studies the principles and methods for evaluating enhanced noisy signals, all complemented by the corresponding Matlab code.
13

Speech detection, enhancement and compression for voice communications

Cho, Yong Duk January 2001 (has links)
Speech signal processing for voice communications can be characterised in terms of silence compression, noise reduction, and speech compression. The limited channel bandwidth of voice communication systems requires efficient compression of speech and silence signals while retaining voice quality. Silence compression, by means of both voice activity detection (VAD) and comfort noise generation, can provide transparent speech quality while substantially lowering the transmission bit-rate, since pause regions between talk spurts carry no voice information. This thesis therefore proposes a smoothed likelihood-ratio VAD, designed on the basis of a behavioural analysis and improvement of a statistical model-based voice activity detector. Input speech may be contaminated by noise, which can make voice communication fatiguing and less intelligible. This problem can be alleviated by noise reduction as a preprocessor for speech coding. Noise characteristics in speech enhancement are typically adapted during the pause regions classified by a voice activity detector; however, VAD errors can lead to over- or under-estimation of the noise statistics. This thesis therefore proposes mixed decision-based noise adaptation, an integration of soft and hard decision-based methods defined by the speech presence uncertainty and the VAD result, respectively. At low bit-rates, the sinusoidal model has been widely applied to speech coding because it exploits the phase redundancy of speech signals well. Its performance, however, can be severely degraded by mis-estimation of the pitch. This thesis therefore proposes a robust pitch estimation technique based on the autocorrelation of spectral amplitudes. Another important parameter in sinusoidal speech coding is the spectral magnitude of the LP-residual signal.
It is, however, not easy to directly quantise the magnitudes because the dimensions of the spectral vectors are variable from frame to frame depending on the pitch. To alleviate this problem, this thesis proposes mel-scale-based dimension conversion, which converts the spectral vectors to a fixed dimension based on mel-scale warping. A predictive coding scheme is also employed in order to exploit the inter-frame redundancy between the spectral vectors. Experimental results show that each proposed technique is suitable for enhancing speech quality for voice communications. Furthermore, an improved speech coder incorporating the proposed techniques is developed. The vocoder gives speech quality comparable to TIA/EIA IS-127 for noisy speech whilst operating at lower than half the bit-rate of the reference coder. Key words: voice activity detection, speech enhancement, pitch, spectral magnitude quantisation, low bit-rate coding.
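The smoothed likelihood-ratio VAD proposed in this thesis can be caricatured with a frame-level Gaussian statistic. The sketch below (Python/NumPy) is only a toy stand-in: the per-frame statistic, the smoothing constant `alpha` and the threshold `thresh` are illustrative assumptions, not the thesis's values, and a real detector works per frequency bin.

```python
import numpy as np

def smoothed_lr_vad(frames, noise_var, alpha=0.9, thresh=1.0):
    """Toy smoothed likelihood-ratio VAD under Gaussian speech/noise models.

    frames: 2-D array (num_frames, frame_len) of time-domain samples.
    noise_var: noise variance estimated from a known speech-free region.
    Returns a boolean speech/non-speech decision per frame.
    """
    decisions = []
    llr_smooth = 0.0
    for frame in frames:
        # a-posteriori SNR of the frame under the noise-only hypothesis
        gamma = np.mean(frame ** 2) / noise_var
        # frame-level Gaussian log-likelihood-ratio statistic
        # (zero when gamma == 1, i.e. when the frame looks like pure noise)
        llr = gamma - np.log(gamma) - 1.0
        # recursive smoothing suppresses spurious single-frame triggers
        llr_smooth = alpha * llr_smooth + (1.0 - alpha) * llr
        decisions.append(llr_smooth > thresh)
    return np.array(decisions)
```

The smoothing makes the detector trade a couple of frames of onset latency for far fewer false alarms, which is the behavioural improvement the thesis analyses.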
14

Neural Enhancement Strategies for Robust Speech Processing

Nawar, Mohamed Nabih Ali Mohamed 10 March 2023 (has links)
In real-world scenarios, speech signals are often contaminated with environmental noise and reverberation, which degrade speech quality and intelligibility. The development of deep learning algorithms has recently marked milestones in speech-based research fields, e.g. speech recognition and spoken language understanding. As one of the crucial topics in speech processing, speech enhancement aims to restore clean speech signals from noisy ones. In past decades, many conventional statistics-based speech enhancement algorithms were proposed, but their performance is limited in non-stationary noisy conditions. The rise of deep learning-based approaches has led to revolutionary advances in speech enhancement performance. In this context, speech enhancement is formulated as a supervised learning problem, which tackles the open challenges left by the conventional approaches. Deep learning speech enhancement approaches are generally categorized into frequency-domain and time-domain approaches. In particular, we experiment with the Wave-U-Net model, a solid time-domain approach to speech enhancement. First, we attempt to improve the performance of back-end speech-based classification tasks in noisy conditions: we propose a pipeline that integrates the Wave-U-Net (later modified into the Dilated Encoder Wave-U-Net) as a pre-processing stage for noise elimination, followed by a temporal convolution network (TCN) for the intent classification task. Both models are trained independently of each other. The reported experimental results show that the modified Wave-U-Net model not only improves speech quality and intelligibility, measured in terms of the PESQ and STOI metrics, but also improves the back-end classification accuracy.
Later, it was observed that the disjoint training approach often introduces signal distortion in the output of the speech enhancement module, which can deteriorate the back-end performance. Motivated by this, we introduce a set of fully time-domain joint training pipelines that combine the Wave-U-Net model with the TCN intent classifier; the architectures differ in the interconnections between the front-end and back-end. All architectures are trained with a loss function that combines the MSE loss for the front-end with the cross-entropy loss for the classification task. Based on our observations, we claim that the joint-training architecture that equally balances both components' contributions yields better classification accuracy. Recently, the release of large-scale pre-trained feature extraction models has considerably simplified the development of speech classification and recognition algorithms. However, environmental noise and reverberation still negatively affect performance, making robustness in noisy conditions mandatory in real-world applications. One way to mitigate the noise effect is to integrate a speech enhancement front-end that removes artifacts from the desired speech signals. Unlike state-of-the-art enhancement approaches that operate either on speech spectrograms or directly on time-domain signals, we study how enhancement can be applied directly to speech embeddings extracted using the Wav2Vec and WavLM models. We investigate a variety of training approaches, considering different flavours of joint and disjoint training of the speech enhancement front-end and of the classification/recognition back-end.
We perform exhaustive experiments on the Fluent Speech Commands and Google Speech Commands datasets, contaminated with noises from the Microsoft Scalable Noisy Speech Dataset, as well as on LibriSpeech, contaminated with noises from the MUSAN dataset, considering intent classification, keyword spotting, and speech recognition tasks respectively. Results show that enhancing the speech embedding is a viable and computationally effective approach, and provide insights about the most promising training approaches.
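The equally balanced joint-training loss described in this abstract (front-end MSE plus back-end cross-entropy) can be written compactly. The NumPy fragment below illustrates only the weighting scheme; the array shapes, the weight name `w_enh` and the interface are invented for the example and make no claim to match the thesis's training code.

```python
import numpy as np

def joint_loss(enhanced, clean, logits, label, w_enh=0.5):
    """Balanced joint-training loss: front-end MSE + back-end cross-entropy.

    enhanced/clean: 1-D signal arrays (front-end output and reference).
    logits: unnormalised class scores from the back-end classifier.
    label: index of the true class.
    w_enh: weight of the enhancement term (0.5 balances both parts).
    """
    mse = np.mean((enhanced - clean) ** 2)
    # numerically stable softmax cross-entropy
    z = logits - np.max(logits)
    ce = -(z[label] - np.log(np.sum(np.exp(z))))
    return w_enh * mse + (1.0 - w_enh) * ce
```

In a real pipeline both terms are backpropagated through the shared front-end, which is what couples the enhancement module to the classifier.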
15

Methods for Objective and Subjective Video Quality Assessment and for Speech Enhancement

Shahid, Muhammad January 2014 (has links)
The overwhelming trend of the usage of multimedia services has raised consumers' awareness of quality. Both service providers and consumers are interested in the delivered level of perceptual quality. The perceptual quality of an original video signal can be degraded by compression and by transmission over a lossy network. Video quality assessment (VQA) has to be performed in order to gauge the level of video quality. Generally, it can be performed by following subjective methods, where a panel of humans judges the quality of the video, or objective methods, where a computational model yields an estimate of the quality. Objective methods, specifically No-Reference (NR) or Reduced-Reference (RR) methods, are preferable because they are practical for implementation in real-time scenarios. This doctoral thesis begins with a review of existing approaches proposed in the area of NR image and video quality assessment, in which recently proposed methods of visual quality assessment are classified into three categories. This is followed by chapters describing studies on the development of NR and RR methods as well as on conducting subjective VQA experiments. In the case of NR methods, the required features are extracted from the coded bitstream of a video; in the case of RR methods, additional pixel-based information is used. Specifically, NR methods are developed with the help of suitable regression techniques using artificial neural networks and least-squares support vector machines. Subsequently, in a later study, linear regression techniques are used to elaborate the interpretability of NR and RR models with respect to the selection of perceptually significant features. The presented subjective experiments are performed using laboratory-based and crowdsourcing platforms.
In the laboratory-based experiments, the focus has been on using standardized methods in order to generate datasets that can be used to validate objective VQA methods. The subjective experiments performed through crowdsourcing investigate non-standard methods for determining perceptual preference among various adaptation scenarios in the context of adaptive streaming of high-definition videos. Lastly, the use of an adaptive gain equalizer in the modulation frequency domain for speech enhancement has been examined. To this end, two methods of demodulating speech signals, namely spectral center-of-gravity carrier estimation and convex optimization, have been studied.
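The spectral center-of-gravity carrier estimate mentioned in the last sentence can be illustrated, under heavy simplification, as the first moment of a magnitude spectrum. The thesis applies this per subband in the modulation frequency domain, which this single-signal NumPy sketch glosses over; the function name and interface are invented for the example.

```python
import numpy as np

def spectral_center_of_gravity(signal, fs):
    """First moment of the magnitude spectrum: a simple carrier-frequency estimate."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # magnitude-weighted mean frequency, in Hz
    return np.sum(freqs * spectrum) / np.sum(spectrum)
```

For a narrowband subband envelope, this weighted mean tracks the dominant modulation frequency and so can serve as the carrier estimate for demodulation.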
16

Quality enhancement of speech signals degraded by noise through the use of synthesized signals. / Speech enhancement using synthesized signals.

Maciel, Rogério Carlos Vieira 14 July 2003 (has links)
This work discusses a new method for enhancing the quality of speech signals degraded by high-intensity additive white noise (segmental SNR ranging from 10 to 3 dB). The technique is based on a weighted sum of a signal obtained by spectral subtraction and a synthesized signal produced according to the digital model of speech production (LPC analysis and synthesis). For the estimation of the LPC coefficients and the pitch period, a pre-processor based on spectral subtraction (modified specifically for this purpose) was implemented, which significantly improves the overall quality of the synthesized speech signal. The weighted combination of the two signals allows the reconstruction of spectral regions lost through the application of spectral subtraction, as well as the elimination of musical noise. Tests conducted with phonetically balanced sentences read by several speakers show that the proposed method offers better results than spectral subtraction alone, producing a clearer and more natural enhanced speech signal without the common artifact of musical noise.
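The spectral-subtraction stage that this method builds on can be sketched as plain magnitude subtraction with a spectral floor. This hedged NumPy version uses invented frame length and floor values, and omits both the thesis's modifications to the subtraction rule and the LPC synthesis branch it is combined with.

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, frame_len=256, floor=0.01):
    """Magnitude spectral subtraction with a spectral floor to limit musical noise.

    noisy: noisy speech samples.
    noise_est: per-bin noise magnitude estimate (length frame_len // 2 + 1),
    e.g. averaged over known speech-free frames.
    """
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, frame_len):
        frame = noisy[start:start + frame_len]
        spec = np.fft.rfft(frame)
        mag = np.abs(spec) - noise_est               # subtract the noise magnitude
        mag = np.maximum(mag, floor * np.abs(spec))  # spectral floor instead of zeroing
        # resynthesize with the noisy phase, which is kept unchanged
        out[start:start + frame_len] = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame_len)
    return out
```

Zeroing negative magnitudes outright is what produces musical noise; the floor, and in this thesis the weighted mix with an LPC-synthesized signal, are two ways of masking those spectral holes.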
18

Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation

ITAKURA, Fumitada, TAKEDA, Kazuya, HUY DAT, Tran 01 March 2008 (has links)
No description available.
19

DSP Techniques for Performance Enhancement of Digital Hearing Aid

Udayashankara, V 12 1900 (has links)
Hearing impairment is the most common chronic disability affecting people in the world. Many people have great difficulty understanding speech in background noise; this is especially true for elderly people and for persons with sensorineural impairment. Several investigations of speech intelligibility have demonstrated that subjects with sensorineural loss may need a 5-15 dB higher signal-to-noise ratio than normal-hearing subjects. While most defects in the transmission chain up to the cochlea can nowadays be successfully rehabilitated by means of surgery, the great majority of the remaining inoperable cases are sensorineural. Recent statistics on hearing-impaired patients applying for a hearing aid reveal that 20% of cases are due to conductive losses, more than 50% to sensorineural losses, and the remaining 30% are of mixed origin. Presenting speech to the hearing impaired in an intelligible form remains a major challenge in hearing-aid research today. Even though various methods have been suggested in the literature for removing noise from contaminated speech signals, they fail to give good SNR and intelligibility improvement for subjects with moderate-to-severe sensorineural loss. So far, the power and capability of Newton's method, nonlinear adaptive filtering methods and feedback-type artificial neural networks have not been exploited for this purpose; hence we apply all of these methods to improving SNR and intelligibility for sensorineural-loss subjects. Digital hearing aids frequently employ filter banks, and one of the major drawbacks of this technique is its computational complexity: the large number of multiplications required increases power consumption.
This thesis therefore presents new approaches to speech enhancement for the hearing impaired, together with the construction of a filter bank for a digital hearing aid using a minimum number of multiplications. One of the most important applications of adaptive systems is noise cancellation using adaptive filters. The adaptive noise cancellation (ANC) setup requires two input signals, primary and reference. The primary input is the sum of the desired signal and a noise uncorrelated with it; the reference input is a noise correlated in some unknown way with the noise in the primary input. The primary signal is obtained by placing an omnidirectional microphone just above one ear on the head of a KEMAR manikin, and the reference signal by placing a hypercardioid microphone at the centre of the vertebral column on the back. Conventional speech enhancement techniques use linear schemes, and nonlinear adaptive filtering techniques have so far not been used in hearing-aid applications. The motivation for a nonlinear model is that it can give better noise suppression than a linear one, because the medium through which signals reach the microphone may be highly nonlinear; linear schemes, though motivated by computational simplicity and mathematical tractability, may therefore be suboptimal. Hence we propose nonlinear models to enhance speech signals for the hearing impaired, using both a linear LMS and a nonlinear second-order Volterra LMS scheme. Studies conducted for different environmental noises, including babble, cafeteria and low-frequency noise, show that the second-order Volterra LMS performs better than the linear LMS algorithm. We use measures such as signal-to-noise ratio (SNR), time plots and intelligibility tests for performance comparison.
We also propose an ANC scheme that uses Newton's method to enhance speech signals. The main problem with LMS-based ANC is slow convergence, which makes its performance poor for hearing-aid applications; Newton-type methods are high-performance adaptive-filtering methods that often converge and track faster than the LMS method. We propose two models: a conventional linear model and a nonlinear model using a second-order Volterra function. Developing a Newton-type algorithm for the linear model results in the familiar recursive least squares (RLS) algorithm. The performance of both the linear and the nonlinear Newton algorithm is evaluated for babble, cafeteria and low-frequency noise, with SNR, time plots and intelligibility tests used for performance comparison. The results show that Newton's method with the Volterra nonlinearity performs better than the RLS method. In addition to the ANC-based schemes, we develop speech enhancement for the hearing impaired using a feedback-type neural network (FBNN), the main attraction being a parallel algorithm that can be implemented directly in hardware. We translate the speech enhancement problem into a neural-network framework by forming an appropriate energy function, and propose both linear and nonlinear FBNNs for enhancing speech signals. Simulated studies on different environmental noises reveal that the FBNN using the Volterra nonlinearity is superior to the linear FBNN; SNR, time plots and intelligibility tests are again used for performance comparison. The design of an effective hearing aid is a challenging problem for sensorineural hearing-impaired people: for persons with sensorineural losses, the frequency response must be optimally fitted into their residual auditory area.
Digital filters enhance the performance of hearing aids in ways that are difficult or impossible to realize using analog techniques. A major problem in digital hearing aids is power consumption, and multiplication is one of the most power-consuming operations in digital filtering. A serious effort has therefore been made to design a filter bank with a minimum number of multiplications, thereby minimizing power consumption. This is achieved using interpolated and complementary FIR filters, which give significant savings in the number of arithmetic operations. The thesis concludes by summarizing the results of the analysis and suggesting the scope for further investigation.
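Of the adaptive schemes surveyed in this abstract, the linear LMS noise canceller is the baseline and is simple enough to sketch. The version below (Python/NumPy, with an illustrative filter order and step size) shows only the primary/reference ANC structure; the second-order Volterra, Newton/RLS and neural-network variants studied in the thesis are not reproduced.

```python
import numpy as np

def lms_noise_canceller(primary, reference, order=8, mu=0.01):
    """Classic LMS adaptive noise canceller.

    primary: desired speech plus correlated noise (e.g. the ear microphone).
    reference: correlated noise only (e.g. the back microphone).
    Returns the error signal, which converges toward the cleaned speech.
    """
    w = np.zeros(order)
    out = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = reference[n - order + 1:n + 1][::-1]  # taps ref[n] .. ref[n-order+1]
        y = w @ x                                  # adaptive estimate of the noise
        e = primary[n] - y                         # error = enhanced speech sample
        w += mu * e * x                            # LMS weight update
        out[n] = e
    return out
```

Because the reference carries no speech, minimizing the error power drives the filter to model only the noise path, leaving the speech in the error signal; the slow convergence of exactly this update is what motivates the Newton/RLS variants above.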
20

Implementation of acoustic signal processing algorithms on a special-purpose processor

Κωστάκης, Βάιος 09 October 2014 (has links)
In this diploma thesis, a method of digital signal processing for acoustic signals, compatible with real-time processing, was developed. First, the operation of special-purpose digital signal processors is reviewed, and frequency-domain analysis and the coherence function are studied in depth. For the purposes of the thesis, a noise reduction algorithm for speech signals was implemented that exploits the coherence function and takes two microphone signals as inputs. The algorithm was implemented and tested both offline in mathematical software and in real time on a special-purpose digital signal processor.
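The coherence function at the heart of the algorithm above can be estimated, in a simplified offline form, by Welch-style averaging over frames. This NumPy sketch uses non-overlapping rectangular frames and is not the real-time DSP implementation; the frame length is an illustrative choice.

```python
import numpy as np

def coherence(x, y, frame_len=256):
    """Magnitude-squared coherence between two microphone signals,
    averaged over non-overlapping frames (Welch-style, rectangular window)."""
    sxx = np.zeros(frame_len // 2 + 1)
    syy = np.zeros(frame_len // 2 + 1)
    sxy = np.zeros(frame_len // 2 + 1, dtype=complex)
    for start in range(0, len(x) - frame_len + 1, frame_len):
        X = np.fft.rfft(x[start:start + frame_len])
        Y = np.fft.rfft(y[start:start + frame_len])
        sxx += np.abs(X) ** 2        # auto power spectrum of x
        syy += np.abs(Y) ** 2        # auto power spectrum of y
        sxy += X * np.conj(Y)        # cross power spectrum
    return np.abs(sxy) ** 2 / (sxx * syy)
```

Components that are coherent across the two microphones (speech from a common source) give values near 1, while diffuse uncorrelated noise averages toward 1/num_frames, so the estimate can drive a per-bin suppression gain.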
