Global ETD Search

1	Spectro-Temporal Features For Robust Automatic Speech Recognition
Suryanarayana, Venkata K 01 1900
The speech signal is inherently characterized by its variations in time, which are reflected as variations in frequency. These spectro-temporal changes arise from changes in the vocal tract, intonation, co-articulation, and the successive articulation of different phonetic sounds. In this thesis we aim to improve speech recognition performance through better feature parameters based on a non-stationary model of speech. One effective means of modeling a general non-stationary signal is the AM-FM model, which can be extended to speech through a sub-band analysis that mimics auditory analysis. We explore new methods for estimating AM and FM parameters from non-uniform samples of the signal. The non-uniform sampling approach, together with adaptive window estimation, provides an important advantage because it yields a multi-resolution analysis. We develop several new methods that use zero-crossing (ZC) intervals, local extrema intervals, and the signal derivative at ZCs as different sample measures of the signal, and we evaluate their effectiveness for instantaneous frequency (IF) and instantaneous envelope (IE) estimation. For automatic speech recognition (ASR), we explore auditorily motivated spectro-temporal information: the signal is passed through an auditory filter bank, and the features are derived from the instantaneous energy in each band, computed with the non-linear energy operator over a longer window. The temporal correlation in the signal is exploited by applying the DCT and keeping only the lower few coefficients, which capture the trend of the energy in each band. The DCT coefficients from the different frequency bands are concatenated, and further spectral decorrelation is achieved by applying the KLT (Karhunen-Loeve Transform) to the concatenated feature vector.
Changes in the vocal tract are well captured by changes in the formant structure. To emphasize these details for ASR, we define a temporal formant using the AM-FM decomposition of sub-band speech. Uniform, wideband, non-overlapping filters are used for the sub-band decomposition, and the temporal formant is defined from the AM-FM parameters of each sub-band signal. The temporal evolution of a formant is represented by the lower-order DCT coefficients of the temporal formant in each band, and its use for ASR is explored. To make ASR performance robust to noisy environmental conditions, we use a hybrid approach that enhances the speech signal using statistical models of the speech and the noise. The use of Gaussian mixture models (GMMs) for statistical speech enhancement has been shown to be effective. It is found that spectro-temporal features derived from the enhanced speech further improve ASR performance.
Speech Recognition; Speech Signal Processing; Automatic Speech Recognition (ASR); Robust Speech Recognition; AM-FM Modeling; Computer Science
2	Model-driven Time-varying Signal Analysis and its Application to Speech Processing
January 2016
This work examines two main areas of model-based time-varying signal processing, with emphasis on speech processing applications. The first area concentrates on improving speech intelligibility and on increasing the applicability of the proposed methodologies to clinical practice in speech-language pathology. The second area concentrates on signal expansions matched to physically based models but without requiring independent basis functions; the significance of this work is demonstrated with speech vowels. A fully automated Vowel Space Area (VSA) computation method is proposed that can be applied to any type of speech. It is shown that the VSA provides an efficient and reliable measure and is correlated with speech intelligibility. A clinical tool that incorporates the automated VSA is proposed for evaluation and treatment, to be used by speech-language pathologists. Two exploratory studies are performed on two databases, analyzing mean formant trajectories in healthy speech for a wide range of speakers, dialects, and coarticulation contexts. It is shown that phonemes crowded in formant space can often have distinct trajectories, possibly due to accurate perception. A theory for analyzing time-varying signal models with amplitude modulation and frequency modulation is developed. Examples are provided that demonstrate other possible signal model decompositions with independent basis functions and corresponding physical interpretations. The use of the Hilbert transform (HT) and the analytic form of a signal is motivated, and a proof is provided showing that a signal can still preserve desirable mathematical properties without the use of the HT. A visualization of the Hilbert spectrum is proposed to aid interpretation. A signal demodulation method is proposed and used to develop a modified Empirical Mode Decomposition (EMD) algorithm.
Dissertation/Thesis: Doctoral Dissertation, Electrical Engineering, 2016
Applied mathematics; Electrical engineering; Speech therapy; AM-FM Modeling; Empirical Mode Decomposition; Hilbert Spectral Analysis; Instantaneous Frequency; Latent Signal Analysis; Time-Frequency Analysis
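The first abstract (Suryanarayana) estimates instantaneous frequency from zero-crossing (ZC) intervals. A minimal sketch of the simplest such estimator follows, assuming that each pair of successive zero crossings spans half a period, so the local frequency is 1/(2Δt). The function name `zero_crossing_if` and the linear interpolation of crossing times are illustrative choices, not the thesis's method, which also uses local extrema intervals, derivatives at ZCs, and adaptive windows.

```python
import numpy as np


def zero_crossing_if(x: np.ndarray, fs: float) -> tuple[np.ndarray, np.ndarray]:
    """Estimate instantaneous frequency (Hz) from zero-crossing intervals.

    Assumption: successive zero crossings are half a period apart, so the
    local frequency is 1 / (2 * interval). Each estimate is placed at the
    midpoint between its two crossings (a non-uniform time grid).
    """
    x = np.asarray(x, dtype=float)
    # Indices n where the sign differs between x[n] and x[n + 1].
    idx = np.nonzero(np.signbit(x[:-1]) != np.signbit(x[1:]))[0]
    # Refine each crossing time by linear interpolation between the two samples.
    frac = x[idx] / (x[idx] - x[idx + 1])
    t_zc = (idx + frac) / fs
    intervals = np.diff(t_zc)
    freqs = 1.0 / (2.0 * intervals)
    times = 0.5 * (t_zc[:-1] + t_zc[1:])
    return times, freqs
```

For a pure tone every estimate lands close to the tone frequency; applied to a sub-band speech signal, the estimates trace the IF at the non-uniform instants defined by the crossings.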
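The feature pipeline described in the first abstract (a non-linear energy operator per band, then the low-order DCT of each band's energy trend, then concatenation across bands) can be sketched as follows. This assumes the widely used discrete Teager-Kaiser operator psi[n] = x[n]^2 - x[n-1]*x[n+1] as the non-linear energy operator and takes already band-pass-filtered signals as input. The auditory filter bank itself is not implemented, and the sub-frame averaging as well as the names `band_trend_features`, `sub_len`, and `n_coeffs` are hypothetical.

```python
import numpy as np


def teager_energy(x: np.ndarray) -> np.ndarray:
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def dct2_ortho(v: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II computed directly from its definition."""
    n = len(v)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ v)


def band_trend_features(bands: np.ndarray, sub_len: int, n_coeffs: int) -> np.ndarray:
    """Concatenate the low-order DCT coefficients of each band's energy trajectory.

    bands: shape (n_bands, n_samples), one long analysis window of
    band-pass-filtered signals (hypothetical input format).
    """
    feats = []
    for band in bands:
        psi = teager_energy(band)
        n_sub = len(psi) // sub_len
        # Average the Teager energy over short sub-frames to form a trajectory.
        traj = psi[: n_sub * sub_len].reshape(n_sub, sub_len).mean(axis=1)
        # Keep only the lowest DCT coefficients, i.e. the trend of the energy.
        feats.append(dct2_ortho(traj)[:n_coeffs])
    return np.concatenate(feats)
```

For a constant-amplitude tone A*cos(Ωn) the Teager energy is exactly A²sin²(Ω), so only the zeroth DCT coefficient of each band is non-zero; a changing envelope moves energy into the higher coefficients.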
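The final step of the first abstract's pipeline, a KLT of the concatenated feature vectors, amounts to projecting onto the eigenvectors of the feature covariance. This sketch assumes the KLT is estimated from a matrix of training feature vectors (one per row) and that the `n_keep` leading components are retained; the function names and the truncation are illustrative, not taken from the thesis.

```python
import numpy as np


def fit_klt(train: np.ndarray, n_keep: int) -> tuple[np.ndarray, np.ndarray]:
    """Estimate a KLT basis from training vectors stored one per row.

    Returns the mean vector and, as columns, the n_keep eigenvectors of the
    sample covariance with the largest eigenvalues.
    """
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_keep]
    return mean, eigvecs[:, order]


def apply_klt(features: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project mean-removed feature vectors onto the KLT basis."""
    return (np.asarray(features, dtype=float) - mean) @ basis
```

On the training data the projected components are uncorrelated and ordered by decreasing variance, which is what "spectral decorrelation" asks of this step.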
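Both abstracts rely on AM-FM parameters, and the second discusses the Hilbert transform and the analytic signal. A minimal sketch of classical analytic-signal demodulation follows: the instantaneous envelope is |z[n]| and the IF is the slope of the unwrapped phase. This is the standard FFT-based construction (the same approach `scipy.signal.hilbert` uses), not the HT-free demodulation proposed in the second thesis, and the function names are hypothetical.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Analytic signal z = x + j*H{x}, built by suppressing negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0  # keep the DC bin unchanged
    if n % 2 == 0:
        h[n // 2] = 1.0  # keep the Nyquist bin unchanged
        h[1 : n // 2] = 2.0  # double the positive frequencies
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def am_fm_demodulate(x: np.ndarray, fs: float) -> tuple[np.ndarray, np.ndarray]:
    """Return the instantaneous envelope |z| and the IF in Hz."""
    z = analytic_signal(x)
    envelope = np.abs(z)
    phase = np.unwrap(np.angle(z))
    # First difference of the phase, so the IF has one fewer sample than x.
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
    return envelope, inst_freq
```

Because the FFT construction treats the segment as periodic, the envelope and IF estimates are least reliable near the segment edges when the signal does not wrap around smoothly.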
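The second abstract proposes a fully automated Vowel Space Area (VSA) measure. That automated procedure is not described in the abstract, so the sketch below only shows the conventional corner-vowel definition of VSA: the area of the quadrilateral formed by the mean (F1, F2) values of /i/, /æ/, /ɑ/ and /u/, computed with the shoelace formula. The dictionary keys, the vowel ordering, and the function names are assumptions.

```python
import numpy as np

# Hypothetical keys; the order traces the quadrilateral /i/ -> /ae/ -> /a/ -> /u/.
CORNER_VOWELS = ("i", "ae", "a", "u")


def polygon_area(points) -> float:
    """Shoelace formula for a simple polygon whose vertices are listed in order."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return float(0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1))))


def vowel_space_area(mean_formants: dict) -> float:
    """VSA in Hz^2 from mean (F1, F2) pairs, in Hz, of the four corner vowels."""
    return polygon_area([mean_formants[v] for v in CORNER_VOWELS])
```

The vertex order matters: the shoelace formula assumes the vertices trace the polygon boundary without self-intersection, which the /i/, /æ/, /ɑ/, /u/ ordering does for typical formant values.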
