Nonlinear Interactive Source-filter Model For Voiced Speech

The linear source-filter model (LSFM) has been the primary model for speech processing since 1960, when G. Fant presented the acoustic theory of speech production. It assumes that the source of voiced speech sounds, the glottal flow, is independent of the filter, the vocal tract. However, acoustic simulations based on physical speech production models show that the filter has significant effects on the source, especially when the fundamental frequency (F0) of the source harmonics approaches the first formant frequency (F1) of the vocal tract filter, owing to the nonlinear coupling between the two. In this thesis, nonlinear interactive source-filter models (ISFMs) are proposed for voiced speech as an alternative to the linear source-filter model.
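For reference, the classical linear model factors voiced speech into a cascade of independent systems; in a standard textbook z-domain formulation (assumed here, not necessarily the notation of the thesis),

    S(z) = G(z) \, V(z) \, R(z),

where S(z) is the speech spectrum, G(z) the glottal source, V(z) the vocal tract filter, and R(z) the lip radiation. The interaction studied in the thesis violates exactly this independence of G(z) and V(z).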
The thesis has two parts. In the first part, a framework for the coupling of the source and the filter is presented. Two interactive system models are then proposed under the assumptions that the glottal flow is a quasi-steady Bernoulli flow and that the acoustics in the vocal tract are linear. In these models, the glottal area, rather than the glottal flow, is used as the source of voiced speech, and the relation between the glottal flow, the glottal area and the vocal tract is determined by the quasi-steady Bernoulli flow equation. It is shown theoretically that the linear source-filter model is an approximation of the nonlinear models. Estimating the ISFM parameters from the speech signal alone is a nonlinear blind deconvolution problem; it is solved by a robust method developed from the acoustical interpretation of the systems. Experimental results show that the ISFMs reproduce the source-filter coupling effects seen in physical simulations and that the parameter estimation method always yields stable models that perform better than the LSFM. In addition, a framework for incorporating the source-filter interaction into the classical source-filter model is presented. The Rosenberg source model is extended to an interactive source for voiced speech, and its performance is evaluated on a large speech database. Experiments conducted on the vowels in the database show that the interactive Rosenberg model consistently outperforms its noninteractive version.
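As a sketch of the kind of coupling involved (the standard quasi-steady Bernoulli form, not necessarily the exact expression used in the thesis): neglecting viscous losses, the transglottal pressure drop \Delta p(t) and the glottal flow u_g(t) through a glottis of area a_g(t) are related by

    u_g(t) \approx a_g(t) \sqrt{ \dfrac{2\,\Delta p(t)}{\rho} },

with \rho the air density. Because \Delta p(t) depends on the acoustic pressure in the vocal tract, which in turn depends on u_g(t), the source and the filter are coupled through this nonlinear relation, while the glottal area a_g(t) remains the independent input.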
In the second part of the thesis, the LSFM and the ISFMs are compared in a system identification setting using not only the speech signal but also high-speed endoscopic video (HSV) of the vocal folds. Here, HSV and speech serve as reference input-output data for the analysis and comparison of the models. First, a new robust HSV processing algorithm is developed and applied to the HSV images to extract the glottal area. Then, the system parameters are estimated with a modified version of the method proposed in the first part. The experimental results show that the speech signal can contain harmonics of the fundamental frequency of the glottal area beyond those present in the glottal area signal itself. The proposed nonlinear interactive source-filter models can generate such harmonic components in speech and produce more realistic speech sounds than the LSFM.
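A brief illustration of why the nonlinearity can create such components (a small-ripple expansion assumed for illustration, not taken from the thesis): if the transglottal pressure carries a small ripple at the fundamental, \Delta p(t) = P_0\,(1 + \varepsilon \cos \omega_0 t) with \varepsilon \ll 1, then

    \sqrt{1 + \varepsilon \cos \omega_0 t} \approx 1 + \tfrac{\varepsilon}{2} \cos \omega_0 t - \tfrac{\varepsilon^2}{16} \left( 1 + \cos 2\omega_0 t \right) + \dots,

so the flow u_g(t) = a_g(t)\sqrt{2\,\Delta p(t)/\rho} acquires energy at 2\omega_0 (and, through the product with a_g(t), at higher multiples) even when a_g(t) itself contains only the fundamental.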

Identifier: oai:union.ndltd.org:METU/oai:etd.lib.metu.edu.tr:http://etd.lib.metu.edu.tr/upload/12615159/index.pdf
Date: 01 October 2012
Creators: Koc, Turgay
Contributors: Ciloglu, Tolga
Publisher: METU
Source Sets: Middle East Technical Univ.
Language: English
Detected Language: English
Type: Ph.D. Thesis
Format: text/pdf
Rights: Access forbidden for 1 year
