1

Voice inactivity ranking for enhancement of speech on microphone arrays

Sharma, Siddhant 26 January 2022 (has links)
Motivated by the problem of improving the performance of speech enhancement algorithms in non-stationary acoustic environments with low SNR, a framework is proposed for identifying signal frames of noisy speech that are unlikely to contain voice activity. Such voice-inactive frames can then be incorporated into an adaptation strategy to improve the performance of existing speech enhancement algorithms. This adaptive approach is applicable to single-channel as well as multi-channel algorithms for noisy speech. In both cases, the adaptive versions of the enhancement algorithms are observed to improve SNR levels by 20 dB, as indicated by PESQ and WER criteria. In advanced speech enhancement algorithms, it is often of interest to identify some regions of the signal that have a high likelihood of being noise only, i.e., no speech present. This is in contrast to advanced speech recognition, speaker recognition, and pitch tracking algorithms, in which we are interested in identifying all regions that have a high likelihood of containing speech, as well as regions that have a high likelihood of not containing speech. In other terms, this means minimizing the false positive and false negative rates, respectively. In the context of speech enhancement, the identification of some speech-absent regions prompts the minimization of false positives while setting an acceptable tolerance on false negatives, as determined by the performance of the enhancement algorithm. Typically, Voice Activity Detectors (VADs) are used to identify speech-absent regions for the application of speech enhancement. In recent years, a myriad of Deep Neural Network (DNN) based approaches have been proposed to improve the performance of VADs at low SNR levels by training on combinations of speech and noise. Training on such an exhaustive dataset is combinatorially explosive. For this dissertation, we propose a voice inactivity ranking framework, where the identification of voice-inactive frames is performed using a machine learning (ML) approach that only uses clean speech utterances for training and is robust to high levels of noise. In the proposed framework, input frames of noisy speech are ranked by a ‘voice inactivity score’ to acquire definitely speech inactive (DSI) frame sequences. These DSI regions serve as a noise estimate and are used adaptively by the underlying speech enhancement algorithm to enhance speech from a speech mixture. The proposed voice-inactivity ranking framework was used to perform speech enhancement in single-channel and multi-channel systems. In the context of microphone arrays, the proposed framework was used to determine parameters for spatial filtering using adaptive beamformers. We achieved an average Word Error Rate (WER) improvement of 50% at SNR levels below 0 dB compared to the noisy signal, which is 7 ± 2.5% more than a framework in which a state-of-the-art VAD decision was used for spatial filtering. For monaural signals, we propose a multi-frame multiband spectral-subtraction (MF-MBSS) speech enhancement system utilizing the voice inactivity framework to compute and update the noise statistics on overlapping frequency bands. The proposed MF-MBSS not only achieved an average PESQ improvement of 16% (with a maximum improvement of 56%) when compared to state-of-the-art spectral subtraction, but also a 5 ± 1.5% improvement in the Word Error Rate (WER) of the spatially filtered output signal in non-stationary acoustic environments.
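A minimal sketch of the ranking-and-subtraction idea described above, assuming a hypothetical frame-level `score_fn` (for illustration, anything from negative frame energy to a trained ML score) in place of the dissertation's actual ranking model:

```python
import numpy as np

def rank_voice_inactivity(frames, score_fn):
    """Rank analysis frames by a voice-inactivity score (higher = more likely noise-only)."""
    scores = np.array([score_fn(f) for f in frames])
    order = np.argsort(scores)[::-1]            # most speech-inactive frames first
    return order, scores

def spectral_subtraction(noisy_stft, dsi_frames, floor=0.01):
    """Subtract a noise spectrum estimated from DSI (definitely speech inactive) frames.
    noisy_stft: complex array of shape (n_bins, n_frames); dsi_frames: column indices."""
    noise_psd = np.mean(np.abs(noisy_stft[:, dsi_frames]) ** 2, axis=1, keepdims=True)
    noisy_psd = np.abs(noisy_stft) ** 2
    gain = np.maximum(1.0 - noise_psd / np.maximum(noisy_psd, 1e-12), floor)
    return np.sqrt(gain) * noisy_stft           # reuse the noisy phase
```

The adaptive element would lie in refreshing `dsi_frames` with the latest highly ranked frames, so that the noise estimate can track a non-stationary background.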
2

Optimal and Adaptive Subband Beamforming / Optimal och Adaptiv Delbandsbeamforming

Grbic, Nedelko January 2001 (has links)
The increased use of personal communication devices, personal computers, and wireless cellular telephones enables the development of new inter-personal communication systems. The convergence of computer and telephony technologies creates a demand for convenient hands-free communication. In such systems, users wish to conduct a conversation in much the same way as a normal person-to-person conversation. The advantages of hands-free telephones are safety, convenience, and greater flexibility. In many countries and regions, hand-held telephony in cars is prohibited by legislation. Placing the microphone far away from the user introduces a number of disadvantages, resulting in substantial speech distortion and poor sound quality. These disturbances are mainly caused by room reverberation and background noise. Furthermore, acoustic feedback generated at the near-end side is a problem for the far-end talker, who will hear his or her own voice echoed with a 100-200 ms delay, making conversation substantially more difficult. Digital filtering may be used to obtain sound quality similar to that of hand-held telephony. Three major tasks must be addressed in order to improve the quality of hands-free communication systems: noise suppression, room reverberation suppression, and acoustic feedback cancellation of the hands-free loudspeaker. The filtering operation must perform the above-mentioned tasks without causing severe near-end speech distortion. A properly designed broadband microphone array is able to perform all of these tasks, i.e., speech enhancement, echo cancellation, and reverberation suppression, in a concise and effective manner, because the spatial domain may be exploited as well as the temporal domain. This thesis deals with the specification and design of beamformers used to extract the source signal information. A new subband adaptive beamforming algorithm is proposed in which many of the drawbacks of conventional adaptive beamforming are eliminated. Evaluation in a car hands-free situation shows the benefits of the proposed method. Blind signal separation is discussed, and a new structure based on frequency-domain inverse channel identification and time-domain separation is proposed. Further, filter-bank properties and design are discussed, together with performance limitations of subband beamforming structures. / The thesis deals with the specification and design of microphone arrays for extracting speech information. A new adaptive subband beamforming algorithm is proposed in which many of the drawbacks of conventional adaptive beamformers are eliminated. An evaluation in a car hands-free system confirms the advantages of the proposed method. Blind signal separation is discussed, and a new structure is proposed, based on inverse channel identification performed in the frequency domain with continuous separation performed in the time domain. Filter-bank properties and design methods are discussed, together with the limitations inherent in subband beamforming structures.
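To make the subband idea concrete, the sketch below applies a conventional MVDR beamformer independently in each subband of an analysis filter bank. It is a textbook baseline rather than the specific adaptive algorithm proposed in the thesis, and the per-band steering vectors are assumed to be given:

```python
import numpy as np

def mvdr_weights(cov, steering):
    """MVDR weights for one subband: w = R^-1 d / (d^H R^-1 d)."""
    r_inv_d = np.linalg.solve(cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)

def subband_beamform(X, steering_per_band, diag_load=1e-3):
    """X: subband signals of shape (n_mics, n_bands, n_frames) from an analysis filter bank."""
    n_mics, n_bands, n_frames = X.shape
    out = np.zeros((n_bands, n_frames), dtype=complex)
    for k in range(n_bands):
        Xk = X[:, k, :]                                    # one subband, all mics
        cov = Xk @ Xk.conj().T / n_frames                  # spatial covariance estimate
        cov += diag_load * np.trace(cov).real / n_mics * np.eye(n_mics)
        w = mvdr_weights(cov, steering_per_band[:, k])
        out[k] = w.conj() @ Xk                             # beamformed subband output
    return out
```

The subband outputs would then be passed through the corresponding synthesis filter bank to reconstruct the enhanced full-band signal.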
3

Signal enhancement based on multivariable adaptive noise cancellation

Hung, Chih-Pin January 1995 (has links)
No description available.
4

Direction of Arrival Estimation and Localization of Multiple Speech Sources in Enclosed Environments

Swartling, Mikael January 2012 (has links)
Speech communication is gaining in popularity in many different contexts as technology evolves. With the introduction of mobile electronic devices such as cell phones and laptops, and fixed electronic devices such as video- and teleconferencing systems, more people are communicating, which leads to an increasing demand for new services and better speech quality. Methods to enhance speech recorded by microphones often operate blindly, without prior knowledge of the signals. With the addition of multiple microphones to allow for spatial filtering, many blind speech enhancement methods must also operate blindly in the spatial domain. When attempting to improve the quality of spoken communication, it is often necessary to reliably determine the locations of the speakers. A dedicated source localization method on top of the speech enhancement methods can assist them by providing spatial information about the sources. This thesis addresses the problem of speech-source localization, with a focus on localization in the presence of multiple concurrent speech sources. The primary work consists of methods to estimate the direction of arrival of multiple concurrent speech sources from an array of sensors, and a method to correct ambiguities when estimating the spatial locations of multiple speech sources from multiple arrays of sensors. The thesis also improves the well-known SRP-based methods with higher-order statistics, and presents an analysis of how SRP-PHAT performs when the sensor array geometry is not fully calibrated. The thesis concludes with two envelope-domain methods for tonal pattern detection and for tonal disturbance detection and cancellation, which can further increase the usability of the proposed localization methods. The main contribution of the thesis is a complete methodology for spatially locating multiple speech sources in enclosed environments. New methods and improvements to the combined solution are presented for direction-of-arrival estimation, location estimation, and location ambiguity correction, as well as a sensor array calibration sensitivity analysis.
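As background to the SRP-PHAT discussion, a minimal sketch of the underlying GCC-PHAT delay estimate for a single source and one microphone pair; the thesis itself addresses multiple concurrent sources and full arrays:

```python
import numpy as np

def gcc_phat(sig_a, sig_b, fs, max_tau=None):
    """Estimate the time delay between two microphone signals with PHAT weighting."""
    n = len(sig_a) + len(sig_b)
    A = np.fft.rfft(sig_a, n=n)
    B = np.fft.rfft(sig_b, n=n)
    R = A * np.conj(B)
    R /= np.maximum(np.abs(R), 1e-12)              # PHAT: keep only the phase
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

def doa_from_delay(tau, mic_distance, c=343.0):
    """Far-field direction of arrival (radians) for a pair of mics spaced mic_distance apart."""
    return np.arcsin(np.clip(tau * c / mic_distance, -1.0, 1.0))
```

An SRP-PHAT search would sum such PHAT-weighted correlations over all microphone pairs for each candidate direction or location and pick the maximum of the resulting steered response power.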
5

Human perception in speech processing

Grancharov, Volodya January 2006 (has links)
The emergence of heterogeneous networks and the rapid increase of Voice over IP (VoIP) applications provide important opportunities for the telecommunications market. These opportunities come at the price of increased complexity in monitoring the quality of service (QoS) and the need to adapt transmission systems to changing environmental conditions. This thesis contains three papers concerned with quality assessment and enhancement of speech communication systems in adverse environments. In paper A, we introduce a low-complexity, non-intrusive algorithm for monitoring speech quality over the network. In the proposed algorithm, speech quality is predicted from a set of features that capture important structural information from the speech signal. Papers B and C describe improvements to conventional pre- and post-processing speech enhancement techniques. In paper B, we demonstrate that the causal Kalman filter implementation is in conflict with key properties of human perception and propose solutions to the problem. In paper C, we propose adapting the conventional postfilter parameters to changes in the noise conditions. A perceptually motivated distortion measure is used in the optimization of the postfilter parameters. Significant improvement over the nonadaptive system is obtained.
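A minimal sketch of the causal Kalman filtering setup examined in paper B, with speech modeled as a first-order autoregressive process; the AR coefficient and noise variances are placeholders, and the perceptually motivated modifications proposed in the paper are not included:

```python
import numpy as np

def causal_kalman_denoise(noisy, a=0.95, q=1e-3, r=1e-2):
    """Causal scalar Kalman filter: state model x_t = a*x_{t-1} + w_t, observation
    y_t = x_t + v_t, with process noise variance q and measurement noise variance r."""
    x_est, p = 0.0, 1.0
    out = np.empty(len(noisy))
    for t, y in enumerate(noisy):
        x_pred = a * x_est                 # predict
        p_pred = a * a * p + q
        k = p_pred / (p_pred + r)          # update
        x_est = x_pred + k * (y - x_pred)
        p = (1.0 - k) * p_pred
        out[t] = x_est
    return out
```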
6

Speech segregation under reverberant conditions

Shamsoddini, Ahmad January 1997 (has links)
No description available.
7

Perceptual Binaural Speech Enhancement in Noisy Environments

Dong, Rong 02 1900 (has links)
Speech enhancement in multi-speaker babble remains an enormous challenge. In this study, we developed a binaural speech enhancement system to extract information pertaining to a target speech signal embedded in a noisy background, for use in future hearing-aid systems. The principle underlying the proposed system is to simulate the perceptual auditory segregation process carried out in the normal human auditory system. Based on spatial location, pitch, and onset cues, the system can identify and enhance the time-frequency regions that constitute the target speech. The proposed system is capable of dealing with a wide variety of noise intrusions, including competing speech signals and multi-speaker babble. It also works under mild reverberation conditions. Systematic evaluation shows that the system achieves a substantial improvement in the intelligibility of the target signal while largely suppressing the unwanted background. / Thesis / Master of Applied Science (MASc)
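A toy sketch of the kind of time-frequency masking such a system builds on, using only an interaural level difference (ILD) cue; the actual system combines spatial, pitch, and onset cues, so this is illustrative only:

```python
import numpy as np

def ild_mask(left_stft, right_stft, target_ild_db=0.0, tol_db=3.0):
    """Keep time-frequency cells whose interaural level difference matches the target
    direction (roughly 0 dB for a frontal talker); zero out everything else."""
    eps = 1e-12
    ild = 20.0 * np.log10((np.abs(left_stft) + eps) / (np.abs(right_stft) + eps))
    return (np.abs(ild - target_ild_db) < tol_db).astype(float)

def apply_mask(left_stft, right_stft, mask):
    """Apply the mask to both ears so the enhanced output remains binaural."""
    return left_stft * mask, right_stft * mask
```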
8

Gaze Strategies and Audiovisual Speech Enhancement

Yi, Astrid 31 December 2010 (has links)
Quantitative relationships were established between speech intelligibility and gaze patterns when subjects listened to sentences spoken by a single talker at different auditory SNRs while viewing one or more talkers. When the auditory SNR was reduced and subjects moved their eyes freely, the main gaze strategy involved looking closer to the mouth. The natural tendency to move closer to the mouth was found to be consistent with a gaze strategy that helps subjects improve their speech intelligibility in environments that include multiple talkers. With a single talker and a fixed point of gaze, subjects' speech intelligibility was found to be optimal for fixations that were distributed within 10 degrees of the center of the mouth. Lower performance was observed at larger eccentricities, and this decrease in performance was investigated by mapping the reduced acuity in the peripheral region to various levels of spatial degradation.
9

Multi-Sensor Noise Suppression and Bandwidth Extension for Enhancement of Speech

Hu, Rongqiang 17 January 2006 (has links)
Speech enhancement has been an active research area for decades and continues to be an important problem, all the more so given the proliferation of portable devices with audio input capabilities. In the presence of noise, both the quality and the intelligibility of speech signals are significantly degraded. The proposed research comprises three frameworks for improving the quality and intelligibility of degraded speech: 1) a single-channel noise suppression system based on perceptual speech detection; 2) a multi-sensor noise suppression system for acoustically harsh environments based on non-air-conductive sensors; and 3) a speech bandwidth extension system for telephone speech. Significant improvements in both speech intelligibility and quality from the proposed frameworks are indicated by extensive experiments, including MOS, DRT, speech recognition tasks, and log spectral distortion.
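To illustrate the bandwidth extension component, a toy spectral-folding baseline that mirrors the 0-4 kHz telephone band into the missing 4-8 kHz band after upsampling; the gain and method here are placeholders, not the system proposed in the thesis:

```python
import numpy as np
from scipy.signal import resample_poly

def spectral_folding_bwe(narrowband, fs_in=8000, fs_out=16000, gain=0.3):
    """Upsample telephone-band speech and fill the empty high band by mirroring
    (folding) the low band at a reduced level -- a classical baseline only."""
    wideband = resample_poly(narrowband, fs_out, fs_in)   # interpolate to the wide rate
    spec = np.fft.rfft(wideband)
    half = len(spec) // 2
    low = spec[1:half + 1][::-1].copy()                   # mirror image of the low band
    spec[half:half + len(low)] = gain * low               # fold it into the high band
    return np.fft.irfft(spec, n=len(wideband))
```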
