  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Enhancement of Speech in Highly Nonstationary Noise Conditions using Harmonic Reconstruction

Liu, Xin 01 January 2009 (has links)
The quality and intelligibility of single-channel speech degraded by additive noise remain a challenging problem when only the noisy speech is available. An accurate estimate of the noise spectrum is important for the effective performance of speech enhancement algorithms, especially in nonstationary noise environments. This thesis addresses both issues. First, a speech enhancement algorithm using harmonic features is introduced. A spectral weighting function is derived by constrained optimization to suppress noise in the frequency domain. Two design parameters are included in the suppression gain, namely the frequency-dependent noise-flooring parameter (FDNFP) and the gain factor. The FDNFP controls the level of admissible residual noise in the enhanced speech, while further enhancement is achieved by adaptive comb filtering using the gain factor with a peak-picking algorithm. Second, a noise estimation algorithm is proposed for nonstationary noise conditions. The speech presence probability is updated by introducing a time-frequency-dependent threshold. The frequency-dependent smoothing factor for noise estimation is computed from the estimated speech presence probability in each frequency bin. This algorithm adapts quickly to nonstationary noise environments and preserves more information in weak speech phonemes. The performance of the proposed speech enhancement algorithm is evaluated in terms of Perceptual Evaluation of Speech Quality (ITU-PESQ) scores, Modified Bark Spectral Distortion (MBSD) measures, composite objective measures, and listening tests. Our listening tests indicate that 16 listeners on average preferred our harmonic-enhanced speech over any of the three other approaches about 73% of the time. The performance of the proposed noise estimation algorithm combined with the proposed speech enhancement method in nonstationary noise environments is also tested in terms of ITU-PESQ scores and MBSD measures. 
Experimental results indicate that the proposed noise estimation algorithm, when integrated with the harmonic enhancement method, outperforms spectral subtraction, the signal subspace method, a perceptually based enhancement method with a constant noise-flooring parameter, and our original harmonic speech enhancement method in highly nonstationary noise environments.
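The abstract above combines a suppression gain with a noise-flooring parameter that bounds the admissible residual noise per frequency bin. A minimal sketch of that idea, using a generic Wiener-like rule rather than the thesis's constrained-optimization gain (the function name and the scalar `alpha` gain factor are illustrative assumptions):

```python
import numpy as np

def suppression_gain(noisy_psd, noise_psd, floor, alpha=1.0):
    # Wiener-like suppression rule with a per-bin noise floor; the
    # floor plays the role of a frequency-dependent noise-flooring
    # parameter (FDNFP), controlling the admissible residual noise.
    snr_post = np.maximum(noisy_psd / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
    gain = snr_post / (snr_post + alpha)
    return np.maximum(gain, floor)   # never attenuate below the floor

# Toy usage: bins 0 and 2 are speech-dominated, bins 1 and 3 noise-dominated.
noisy = np.array([10.0, 0.5, 8.0, 0.4])
noise = np.array([0.5, 0.5, 0.4, 0.4])
floor = np.full(4, 0.1)
g = suppression_gain(noisy, noise, floor)
```

Speech-dominated bins pass nearly unattenuated, while noise-dominated bins are pinned at the floor instead of being driven to zero, which is what keeps the residual noise perceptually uniform.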
2

Voice inactivity ranking for enhancement of speech on microphone arrays

Sharma, Siddhant 26 January 2022 (has links)
Motivated by the problem of improving the performance of speech enhancement algorithms in non-stationary acoustic environments with low SNR, a framework is proposed for identifying signal frames of noisy speech that are unlikely to contain voice activity. Such voice-inactive frames can then be incorporated into an adaptation strategy to improve the performance of existing speech enhancement algorithms. This adaptive approach is applicable to single-channel as well as multi-channel algorithms for noisy speech. In both cases, the adaptive versions of the enhancement algorithms are observed to improve SNR levels by 20 dB, as indicated by PESQ and WER criteria. In advanced speech enhancement algorithms, it is often of interest to identify some regions of the signal that have a high likelihood of being noise only, i.e., containing no speech. This is in contrast to advanced speech recognition, speaker recognition, and pitch tracking algorithms, in which we are interested in identifying all regions that have a high likelihood of containing speech, as well as regions that have a high likelihood of not containing speech. In other words, this means minimizing the false negative and false positive rates, respectively. In the context of speech enhancement, the identification of some speech-absent regions calls for minimizing false positives while setting an acceptable tolerance on false negatives, as determined by the performance of the enhancement algorithm. Typically, Voice Activity Detectors (VADs) are used to identify speech-absent regions for speech enhancement. In recent years, a myriad of Deep Neural Network (DNN) based approaches have been proposed to improve the performance of VADs at low SNR levels by training on combinations of speech and noise. Training on such an exhaustive dataset is combinatorially explosive. 
For this dissertation, we propose a voice inactivity ranking framework, where the identification of voice-inactive frames is performed using a machine learning (ML) approach that uses only clean speech utterances for training and is robust to high levels of noise. In the proposed framework, input frames of noisy speech are ranked by a ‘voice inactivity score’ to acquire definitely speech inactive (DSI) frame sequences. These DSI regions serve as a noise estimate and are used adaptively by the underlying speech enhancement algorithm to enhance speech from a speech mixture. The proposed voice-inactivity ranking framework was used to perform speech enhancement in single-channel and multi-channel systems. In the context of microphone arrays, the proposed framework was used to determine parameters for spatial filtering using adaptive beamformers. We achieved an average Word Error Rate (WER) improvement of 50% at SNR levels below 0 dB compared to the noisy signal, which is 7 ± 2.5% more than a framework in which a state-of-the-art VAD decision was used for spatial filtering. For monaural signals, we propose a multi-frame multiband spectral-subtraction (MF-MBSS) speech enhancement system utilizing the voice inactivity framework to compute and update the noise statistics on overlapping frequency bands. In non-stationary acoustic environments, the proposed MF-MBSS not only achieved an average PESQ improvement of 16% (with a maximum improvement of 56%) compared to state-of-the-art spectral subtraction, but also a 5 ± 1.5% improvement in the Word Error Rate (WER) of the spatially filtered output signal.
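The ranking-then-estimation loop described above can be sketched in a few lines. This toy version scores frames by log energy instead of the dissertation's clean-speech-trained ML model (the scoring proxy, function name, and frame layout are all assumptions), but it shows the pipeline: rank frames by an inactivity score, take the top-ranked frames as DSI, and average their spectra into a noise estimate.

```python
import numpy as np

def rank_voice_inactivity(frames, n_dsi=3):
    # Score each frame by log energy (a crude inactivity proxy standing
    # in for the ML score), rank ascending, and treat the quietest
    # frames as "definitely speech inactive" (DSI). Their averaged
    # power spectrum serves as the noise estimate for the enhancer.
    energies = np.log(np.mean(frames ** 2, axis=1) + 1e-12)
    order = np.argsort(energies)                  # quietest frames first
    dsi = order[:n_dsi]
    noise_psd = np.mean(np.abs(np.fft.rfft(frames[dsi], axis=1)) ** 2, axis=0)
    return dsi, noise_psd

# Toy usage: 10 low-level noise frames, two of which carry loud "speech".
rng = np.random.default_rng(0)
frames = rng.normal(0.0, 0.01, size=(10, 64))
frames[2] *= 100.0
frames[5] *= 100.0
dsi, noise_psd = rank_voice_inactivity(frames)
```

The loud frames never enter the DSI set, so the noise estimate stays uncontaminated by speech even though no explicit speech/non-speech decision threshold was set.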
3

Optimal and Adaptive Subband Beamforming / Optimal och Adaptiv Delbandsbeamforming

Grbic, Nedelko January 2001 (has links)
The increased use of personal communication devices, personal computers, and wireless cellular telephones enables the development of new inter-personal communication systems. The merging of computer and telephony technologies brings a demand for convenient hands-free communication. In such systems, the users wish to conduct a conversation in much the same way as in a normal person-to-person conversation. The advantages of hands-free telephones are safety, convenience, and greater flexibility. In many countries and regions, hand-held telephony in cars is prohibited by legislation. Placing the microphone far away from the user introduces a number of disadvantages, resulting in substantial speech distortion and poor sound quality. These disturbances are mainly caused by room reverberation and background noise. Furthermore, acoustic feedback generated at the near-end side is a problem for the far-end talker, who will hear his/her own voice echoed with a 100-200 ms delay, making conversation substantially more difficult. Digital filtering may be used to obtain sound quality similar to that of hand-held telephony. Three major tasks must be addressed in order to improve the quality of hands-free communication systems: noise suppression, room reverberation suppression, and acoustic feedback cancellation of the hands-free loudspeaker. The filtering operation must perform the above-mentioned tasks without causing severe near-end speech distortion. A properly designed broadband microphone array is able to perform all the given tasks, i.e., speech enhancement, echo cancellation, and reverberation suppression, in a concise and effective manner. This is because the spatial domain may be exploited as well as the temporal domain. This thesis deals with the problem of specification and design of beamformers used to extract the source-signal information. 
A new subband adaptive beamforming algorithm is proposed in which many of the drawbacks embedded in conventional adaptive beamforming are eliminated. Evaluation in a car hands-free situation shows the benefits of the proposed method. Blind signal separation is discussed, and a new structure based on frequency-domain inverse channel identification and time-domain separation is proposed. Further, filter-bank properties and design are discussed together with performance limitations in subband beamforming structures. / The thesis deals with the specification and design of microphone arrays for extracting speech information. A new adaptive subband beamforming algorithm is proposed in which many of the drawbacks of conventional adaptive beamformers are eliminated. An evaluation in a car with a hands-free system confirms the advantages of the proposed method. Blind signal separation is discussed, and a new structure is proposed, based on inverse channel identification performed in the frequency domain with continuous separation performed in the time domain. Filter-bank properties and design methods are discussed together with the limitations of subband beamforming structures.
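The spatial-filtering idea underlying any beamformer, adaptive or not, is that coherent speech adds in phase across aligned channels while uncorrelated noise averages down. A minimal delay-and-sum sketch illustrates this; it is a far simpler baseline than the thesis's subband adaptive beamformer, with known integer-sample delays assumed for clarity:

```python
import numpy as np

def delay_and_sum(signals, delays):
    # Align each channel by its known integer-sample delay, then average.
    # The source adds coherently; independent channel noise is reduced
    # by roughly a factor of the channel count in power.
    n = min(len(s) - d for s, d in zip(signals, delays))
    aligned = np.stack([np.asarray(s)[d:d + n] for s, d in zip(signals, delays)])
    return aligned.mean(axis=0)

# Toy usage: one sinusoidal source reaching four microphones with
# different delays, each channel corrupted by independent noise.
rng = np.random.default_rng(1)
t = np.arange(400)
clean = np.sin(0.07 * t)
delays = [0, 3, 5, 8]
chans = [np.concatenate([np.zeros(d), clean]) + rng.normal(0, 0.5, 400 + d)
         for d in delays]
out = delay_and_sum(chans, delays)
```

The adaptive subband approach in the thesis replaces the fixed averaging with per-subband adaptive weights, but the alignment-and-combine structure is the same.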
4

Signal enhancement based on multivariable adaptive noise cancellation

Hung, Chih-Pin January 1995 (has links)
No description available.
5

Direction of Arrival Estimation and Localization of Multiple Speech Sources in Enclosed Environments

Swartling, Mikael January 2012 (has links)
Speech communication is gaining popularity in many different contexts as technology evolves. With the introduction of mobile electronic devices such as cell phones and laptops, and fixed electronic devices such as video- and teleconferencing systems, more people are communicating, which leads to an increasing demand for new services and better speech quality. Methods to enhance speech recorded by microphones often operate blindly, without prior knowledge of the signals. With the addition of multiple microphones to allow for spatial filtering, many blind speech enhancement methods have to operate blindly in the spatial domain as well. When attempting to improve the quality of spoken communication, it is often necessary to be able to reliably determine the location of the speakers. A dedicated source localization method on top of the speech enhancement methods can assist them by providing spatial information about the sources. This thesis addresses the problem of speech-source localization, with a focus on localization in the presence of multiple concurrent speech sources. The primary work consists of methods to estimate the direction of arrival of multiple concurrent speech sources from an array of sensors, and a method to correct the ambiguities that arise when estimating the spatial locations of multiple speech sources from multiple arrays of sensors. The thesis also improves the well-known SRP-based methods with higher-order statistics, and presents an analysis of how SRP-PHAT performs when the sensor array geometry is not fully calibrated. The thesis is concluded by two envelope-domain methods for tonal pattern detection and tonal disturbance detection and cancellation, which can be useful to further increase the usability of the proposed localization methods. The main contribution of the thesis is a complete methodology for spatially locating multiple speech sources in enclosed environments. 
New methods and improvements to the combined solution are presented for the direction-of-arrival estimation, the location estimation and the location ambiguity correction, as well as a sensor array calibration sensitivity analysis.
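The SRP-PHAT family mentioned above is built on the phase transform: whitening the cross-spectrum of two sensor signals so that only phase (i.e., time-delay) information shapes the correlation peak. A minimal GCC-PHAT time-delay estimator sketches that building block (this is the standard formulation, not the thesis's higher-order-statistics variant):

```python
import numpy as np

def gcc_phat(x, y, fs=1.0):
    # Generalized cross-correlation with phase transform (PHAT).
    # Whitening the cross-spectrum sharpens the correlation peak,
    # which is the weighting that underlies SRP-PHAT localization.
    # Returns the lag (in seconds) by which y is delayed relative to x.
    n = len(x) + len(y)
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    R = Y * np.conj(X)
    R /= np.maximum(np.abs(R), 1e-12)      # PHAT: keep phase only
    cc = np.fft.irfft(R, n)
    max_shift = n // 2
    cc = np.concatenate([cc[-max_shift:], cc[:max_shift + 1]])
    return (np.argmax(cc) - max_shift) / fs

# Toy usage: a white-noise source reaching the second sensor 5 samples late.
rng = np.random.default_rng(2)
x = rng.normal(size=512)
y = np.concatenate([np.zeros(5), x[:-5]])
tau = gcc_phat(x, y)
```

An SRP-PHAT localizer then sums such whitened correlations over all sensor pairs for each candidate source position and picks the maximum of the resulting steered response power.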
6

Human perception in speech processing

Grancharov, Volodya January 2006 (has links)
The emergence of heterogeneous networks and the rapid increase of Voice over IP (VoIP) applications provide important opportunities for the telecommunications market. These opportunities come at the price of increased complexity in the monitoring of quality of service (QoS) and the need to adapt transmission systems to changing environmental conditions. This thesis contains three papers concerned with quality assessment and enhancement of speech communication systems in adverse environments. In paper A, we introduce a low-complexity, non-intrusive algorithm for monitoring speech quality over the network. In the proposed algorithm, speech quality is predicted from a set of features that capture important structural information from the speech signal. Papers B and C describe improvements in conventional pre- and post-processing speech enhancement techniques. In paper B, we demonstrate that the causal Kalman filter implementation is in conflict with key properties of human perception and propose solutions to the problem. In paper C, we propose adapting the conventional postfilter parameters to changes in the noise conditions. A perceptually motivated distortion measure is used in the optimization of the postfilter parameters. Significant improvement over the nonadaptive system is obtained.
7

A Speech Enhancement System Based on Statistical and Acoustic-Phonetic Knowledge

Sudirga, RENITA 25 August 2009 (has links)
Noise reduction aims to improve the quality of noisy speech by suppressing the background noise in the signal. However, there is always a tradeoff between noise reduction and signal distortion: more noise reduction is accompanied by more signal distortion. An evaluation of the intelligibility of speech processed by several noise reduction algorithms in [23] showed that most noise reduction algorithms were not successful in improving the intelligibility of noisy speech. In this thesis, we aim to utilize acoustic-phonetic knowledge to enhance the intelligibility of noise-reduced speech. Acoustic phonetics studies the characteristics of speech and the acoustic cues that are important for speech intelligibility. We considered the following questions: which noise reduction algorithm should we use, which acoustic cues should be targeted, and how should this information be incorporated into the design of the noise reduction system? A Bayesian noise reduction method similar to the one proposed by Ephraim and Malah in [16] is employed. We first evaluate the goodness-of-fit of several parametric PDF models to the empirical speech data. For classified speech, we find that the Rayleigh and the Gamma (with a fixed shape parameter of 5) model the speech spectral amplitude equally well. The Gamma-MAP and Gamma-MMSE estimators are derived. The subjective and objective performances of these estimators are then compared. We also propose to apply a class-based cue enhancement, similar to that performed in [21]. The processing directly manipulates the acoustic cues known to be important for speech intelligibility. We assume that the system has the sound-class information of the input speech. The scheme aims to enhance the interclass and intraclass distinction of speech sounds. 
The intelligibility of speech processed by the proposed system is then compared to the intelligibility of speech processed by the Rayleigh-MMSE estimator [16]. The intelligibility evaluation shows that the proposed scheme enhances the detection of plosive and fricative sounds. However, it does not help in the intraclass discrimination of plosive sounds, and more tests need to be done to evaluate whether the intraclass discrimination of fricatives is improved. The proposed scheme degrades the detection of nasal and affricate sounds. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2009-08-24 21:32:48.966
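The Bayesian estimators this thesis derives are driven by an a priori SNR estimate, and the standard way to obtain one in the Ephraim-Malah framework the abstract cites as [16] is the decision-directed rule. A small sketch of that rule (per-bin arrays and the function name are illustrative; the formula itself is the published one):

```python
import numpy as np

def decision_directed_xi(noisy_mag, noise_psd, prev_clean_mag, alpha=0.98):
    # Decision-directed a priori SNR estimate: a weighted mix of the
    # SNR implied by the previous frame's clean-speech estimate and
    # the current instantaneous (a posteriori - 1) SNR.
    gamma = noisy_mag ** 2 / np.maximum(noise_psd, 1e-12)   # a posteriori SNR
    xi = (alpha * prev_clean_mag ** 2 / np.maximum(noise_psd, 1e-12)
          + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
    return xi

# Toy usage: one frequency bin with unit noise power.
xi = decision_directed_xi(np.array([2.0]), np.array([1.0]), np.array([1.0]))
```

Whatever spectral-amplitude prior is assumed (Rayleigh or Gamma, as compared in the thesis), the resulting MAP or MMSE gain is a function of this xi together with the a posteriori SNR.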
8

Single-Microphone Speech Dereverberation: Modulation Domain Processing and Quality Assessment

ZHENG, CHENXI 25 July 2011 (has links)
In a reverberant enclosure, acoustic speech signals are degraded by reflections from walls, ceilings, and objects. Restoring speech quality and intelligibility from reverberated speech has received increasing interest over the past few years. Although multi-channel dereverberation methods provide some improvement in speech quality and intelligibility, single-channel dereverberation remains an open challenge. Two types of advanced single-channel dereverberation methods, namely acoustic-domain spectral subtraction and modulation-domain filtering, provide small improvements in speech quality and intelligibility. In this thesis, we study single-channel dereverberation algorithms. First, an upper bound on time-frequency masking (TFM) performance for dereverberation is obtained using ideal time-frequency masking (ITFM). ITFM has access to both the clean and reverberated speech signals in estimating the binary-mask matrix. ITFM implements binary masking in the short-time Fourier transform (STFT) domain, preserving only those spectral components less corrupted by reverberation. The experimental results show that single-channel ITFM outperforms four existing multi-channel dereverberation methods and suggest that large potential improvements could be obtained using TFM for speech dereverberation. Second, a novel modulation-domain spectral subtraction method is proposed for dereverberation. This method estimates the modulation-domain long reverberation spectral variance (LRSV) from the time-domain LRSV using a statistical room impulse response (RIR) model and implements spectral subtraction in the modulation domain. On the one hand, unlike acoustic-domain spectral subtraction, our method implements spectral subtraction in the modulation domain, which has been shown to play an important role in speech perception. 
On the other hand, unlike modulation-domain filtering, which uses a time-invariant filter, our method takes into account the changes of the reverberated speech spectral variance over time and implements spectral subtraction adaptively. Objective and informal subjective tests show that our proposed method outperforms two existing state-of-the-art single-channel dereverberation algorithms. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2011-07-20 03:18:30.021
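The ideal time-frequency mask used above as an upper bound can be sketched compactly: with oracle access to the clean signal, keep only the STFT cells where clean energy dominates the reverberant residual. The threshold and function name below are illustrative assumptions; the oracle structure is the point.

```python
import numpy as np

def ideal_binary_mask(clean_stft, reverb_stft, threshold_db=0.0):
    # Oracle mask in the spirit of ITFM: a cell survives only when the
    # clean (direct-path) energy exceeds the reverberant-residual
    # energy by the threshold. Because it needs the clean signal, it
    # bounds what a practical TFM system could achieve.
    residual = reverb_stft - clean_stft
    ratio_db = 10.0 * np.log10((np.abs(clean_stft) ** 2 + 1e-12)
                               / (np.abs(residual) ** 2 + 1e-12))
    return (ratio_db > threshold_db).astype(float)

# Toy usage: the second column is heavily corrupted by (simulated) reverberation.
clean = np.ones((2, 2), dtype=complex)
reverb = clean + np.array([[0.1, 2.0], [0.1, 2.0]])
mask = ideal_binary_mask(clean, reverb)
```

Applying the mask to the reverberant STFT and resynthesizing yields the upper-bound enhanced signal against which practical single-channel methods are compared.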
9

Speech segregation under reverberant conditions

Shamsoddini, Ahmad January 1997 (has links)
No description available.
10

Perceptual Binaural Speech Enhancement in Noisy Environments

Dong, Rong 02 1900 (has links)
Speech enhancement in multi-speaker babble remains an enormous challenge. In this study, we developed a binaural speech enhancement system to extract information pertaining to a target speech signal embedded in a noisy background, for use in future hearing-aid systems. The principle underlying the proposed system is to simulate the perceptual auditory segregation process carried out in the normal human auditory system. Based on spatial location, pitch, and onset cues, the system can identify and enhance the time-frequency regions that constitute the target speech. The proposed system is capable of dealing with a wide variety of noise intrusions, including competing speech signals and multi-speaker babble. It also works under mild reverberation conditions. Systematic evaluation shows that the system achieves a substantial improvement in the intelligibility of the target signal while largely suppressing the unwanted background signal. / Thesis / Master of Applied Science (MASc)
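One of the spatial cues mentioned above, interaural phase difference, can on its own drive a crude time-frequency selection: keep cells whose phase difference between the two ears is near zero, consistent with a frontal target. This is only a toy stand-in for the thesis's richer integration of spatial, pitch, and onset cues; the function name and phase threshold are assumptions.

```python
import numpy as np

def frontal_ipd_mask(left_stft, right_stft, max_phase=0.5):
    # Keep time-frequency cells whose interaural phase difference (IPD)
    # is near zero, i.e. consistent with a source directly ahead.
    # Off-axis interferers produce nonzero IPD and are masked out.
    ipd = np.angle(left_stft * np.conj(right_stft))
    return (np.abs(ipd) < max_phase).astype(float)

# Toy usage: first cell arrives in phase (frontal target), second cell
# with a quarter-cycle interaural lag (interferer off to one side).
left = np.array([1.0 + 0.0j, 1.0j])
right = np.array([1.0 + 0.0j, 1.0 + 0.0j])
mask = frontal_ipd_mask(left, right)
```

A full binaural segregation system would combine such a spatial mask with pitch- and onset-based evidence before resynthesizing the target.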
