In recent years, linear prediction voice encoders have become very efficient in terms of execution time and channel bandwidth usage while providing, in the absence of impulsive noise, natural-sounding synthetic speech signals. This good performance has been achieved through maximum likelihood parameter estimation of an auto-regressive model of order ten that best fits the speech signal under the assumption that the signal and the noise are Gaussian stochastic processes. However, this method breaks down in the presence of impulsive noise, which is common in practice, resulting in harsh or unintelligible audio signals. In this paper, we propose a robust estimator of correlation, the Phase-Phase correlator, that is able to cope with impulsive noise. Utilizing this correlator, we develop a Robust Mixed Excitation Linear Prediction encoder that provides improved audio quality for voiced, unvoiced, and transition speech segments. This is achieved by applying a statistical test to robust Mahalanobis distances to identify the outliers in the corrupted speech signal, which are then replaced with filtered signals. Simulation results reveal that the proposed method outperforms, in variance, bias, and breakdown point, three other robust approaches based on the arcsin law, the polarity coincidence correlator, and the median-of-ratio estimator, without sacrificing the encoder bandwidth efficiency or the compression gain, while remaining compatible with real-time applications. Furthermore, in the presence of impulsive noise, the perceptual speech quality of the proposed encoder also exceeds the state of the art in terms of mean opinion score. / Doctor of Philosophy / Impulsive noise is a natural phenomenon in everyday experience. It can be likened to a discontinuity or a drastic change in the natural progression of events.
In this research, the disruptive events of interest occur in signals such as speech, power transmission, stock market data, and communication systems. Sudden power outages due to lightning, maintenance, or other catastrophic events are some of the reasons why we may experience performance degradation in our electronic devices. Another example of impulsive noise arises when we play an old, damaged vinyl record, which produces annoying clicking sounds. At the time instant of each click, the true music, speech, or other audible waveform is completely destroyed. A further example is a sudden crash in the stock market: a sudden dive in the market can invalidate the regression and future predictions. Unfortunately, in the presence of impulsive noise, classical methods are unable to filter out the impulsive corruption.
The filtering objective of this dissertation is specific, but not limited, to speech signal processing. Specifically, we investigate different filter models to determine the best method of eliminating impulsive noise in speech. Note that the optimal filter model differs across time-series signals such as speech, stock market data, and power systems. In our studies, we have shown that our speech filtering method outperforms state-of-the-art algorithms. Another major contribution of our research is a speech compression algorithm that is robust to impulsive noise. In digital signal processing, compression means representing a signal with less data while still conveying the same message as the original signal. For example, human speech spans a range of approximately 60 Hz to 3500 Hz; in other words, speech occupies approximately 4000 Hz of frequency space. The challenge, then, is whether we can compress speech into one half of that space, or even less. This is a very attractive proposition because frequency space is limited, yet wireless service providers want to serve as many users as possible without sacrificing quality and, ultimately, to maximize the bottom line. Encoding impulse-corrupted speech produces harsh synthesized audio. We have shown that if the encoding is done with the proposed method, the synthesized audio quality is far superior to the state of the art.
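To illustrate why sign-based correlation estimators tolerate impulses better than the classical sample correlation, the following is a minimal sketch of one of the baseline approaches named in the abstract: the polarity coincidence correlator combined with the arcsin law. It is not the Phase-Phase correlator proposed in the dissertation. The function names, sample size, true correlation value, and the impulse model (about 2% of samples shifted by a large constant) are all assumptions chosen for illustration only.

```python
import math
import random


def sample_correlation(x, y):
    """Classical Pearson correlation coefficient (not robust to outliers)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pcc_correlation(x, y):
    """Polarity coincidence correlator inverted through the arcsine law.

    Assumes zero-mean, jointly Gaussian samples, for which
    E[sign(x) * sign(y)] = (2 / pi) * arcsin(rho).
    """
    # +1 when the two samples share a sign, -1 otherwise.
    agree = sum(1 if (a >= 0) == (b >= 0) else -1 for a, b in zip(x, y))
    return math.sin(0.5 * math.pi * agree / len(x))


def make_pairs(n, rho, rng):
    """Draw n zero-mean, unit-variance Gaussian pairs with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(u)
        ys.append(rho * u + math.sqrt(1.0 - rho ** 2) * v)
    return xs, ys


rng = random.Random(0)   # fixed seed so the illustration is repeatable
rho_true = 0.8           # hypothetical ground-truth correlation
x, y = make_pairs(20000, rho_true, rng)

# Hypothetical impulse model: shift roughly 2% of the y samples by +/-100.
y_noisy = []
for b in y:
    if rng.random() < 0.02:
        b = b + rng.choice((-1, 1)) * 100.0
    y_noisy.append(b)

r_classic = sample_correlation(x, y_noisy)
r_pcc = pcc_correlation(x, y_noisy)
print(f"true rho = {rho_true}, classical = {r_classic:.3f}, PCC = {r_pcc:.3f}")
```

Under these assumptions, the impulses inflate the variance seen by the classical estimator and pull its estimate toward zero, whereas the sign-based estimator only sees a small fraction of flipped signs and stays close to the true value. The price is lower statistical efficiency on clean Gaussian data, which is the kind of trade-off the variance, bias, and breakdown-point comparison in the abstract refers to.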
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/97221 |
Date | 08 November 2019 |
Creators | Azad, Abul K. |
Contributors | Electrical Engineering, Mili, Lamine M., Clancy, Thomas Charles III, Zaghloul, Amir I., MacKenzie, Allen B., Ramakrishnan, Naren |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |