1 |
An acoustically-driven vocal tract model for stop consonant production
Story, Brad H., Bunton, Kate 03 1900
The purpose of this study was to further develop a multi-tier model of the vocal tract area function in which the modulations of shape to produce speech are generated by the product of a vowel substrate and a consonant superposition function. The new approach consists of specifying input parameters for a target consonant as a set of directional changes in the resonance frequencies of the vowel substrate. Using calculations of acoustic sensitivity functions, these "resonance deflection patterns" are transformed into time-varying deformations of the vocal tract shape without any direct specification of location or extent of the consonant constriction along the vocal tract. The configurations of constrictions and expansions generated by this process were shown to be physiologically realistic and to produce speech sounds that are easily identifiable as the target consonants. This model is a useful enhancement for area function-based synthesis and can serve as a tool for understanding how the vocal tract is shaped by a talker during speech production. (C) 2016 Elsevier B.V. All rights reserved.
|
2 |
Nonlinear Interactive Source-filter Model For Voiced Speech
Koc, Turgay 01 October 2012
The linear source-filter model (LSFM) has been used as a primary model for speech processing since 1960, when G. Fant presented the acoustic theory of speech production. It assumes that the source of voiced speech sounds, the glottal flow, is independent of the filter, the vocal tract. However, acoustic simulations based on physical speech production models show that the filter has significant effects on the source because of the nonlinear coupling between them, especially when the fundamental frequency (F0) of the source approaches the first formant frequency (F1) of the vocal tract filter. In this thesis, nonlinear interactive source-filter models (ISFMs) are proposed for voiced speech as an alternative to the linear source-filter model.
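The independence assumption of the linear source-filter model can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the source is an idealized impulse train, the vocal tract is reduced to a single two-pole resonator standing in for F1, and the function names and parameter values are hypothetical.

```python
import math

def impulse_train(f0, fs, n_samples):
    """Idealized voiced source: one unit impulse per pitch period."""
    period = round(fs / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, freq, bandwidth, fs):
    """Two-pole resonator standing in for a single formant (assumed filter)."""
    r = math.exp(-math.pi * bandwidth / fs)           # pole radius from bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    a2 = -r * r
    y = []
    y1 = y2 = 0.0                                     # previous two outputs
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y

fs = 16000                                            # sampling rate in Hz (assumed)
source = impulse_train(f0=120.0, fs=fs, n_samples=800)
speech = resonator(source, freq=500.0, bandwidth=80.0, fs=fs)

# Linearity: the filter never feeds back into the source, so scaling the
# source scales the output by the same factor.
doubled = resonator([2.0 * s for s in source], freq=500.0, bandwidth=80.0, fs=fs)
```

In this sketch the source is computed once and never sees the filter, which is exactly the property the interactive models relax.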
This thesis has two parts. In the first part, a framework for the coupling of the source and the filter is presented. Then, two interactive system models are proposed, assuming that the glottal flow is a quasi-steady Bernoulli flow and that the acoustics of the vocal tract are linear. In these models, the glottal area, instead of the glottal flow, is used as the source for voiced speech, and the relation between the glottal flow, the glottal area, and the vocal tract is determined by the quasi-steady Bernoulli flow equation. It is shown theoretically that the linear source-filter model is an approximation of the nonlinear models. Estimating the parameters of the ISFMs from the speech signal alone is a nonlinear blind deconvolution problem. The problem is solved by a robust method developed from the acoustical interpretation of the systems. Experimental results show that the ISFMs produce the source-filter coupling effects seen in the physical simulations and that the parameter estimation method always produces stable models that perform better than the LSFM. In addition, a framework for incorporating the source-filter interaction into the classical source-filter model is presented. The Rosenberg source model is extended to an interactive source for voiced speech, and its performance is evaluated on a large speech database. The results of the experiments conducted on vowels in the database show that the interactive Rosenberg model always performs better than its noninteractive version.
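A rough sketch of how a quasi-steady Bernoulli relation couples glottal area, glottal flow, and the vocal tract is given below. It is not the thesis's formulation. It assumes that the pressure drop across the glottis is rho*u^2/(2*A^2), that the vocal tract load is a purely resistive (memoryless) pressure R*u rather than a linear acoustic system, and that a Rosenberg-style trigonometric pulse shape can be borrowed for the glottal area. All names and numeric values are hypothetical.

```python
import math

RHO = 1.14  # air density in kg/m^3 (assumed value)

def rosenberg_area(t, t_open, t_close, a_max):
    """Rosenberg-style trigonometric pulse, used here as glottal area (assumption)."""
    if 0.0 <= t <= t_open:
        return 0.5 * a_max * (1.0 - math.cos(math.pi * t / t_open))
    if t_open < t <= t_open + t_close:
        return a_max * math.cos(math.pi * (t - t_open) / (2.0 * t_close))
    return 0.0

def bernoulli_flow(area, p_sub, load_resistance):
    """Solve p_sub = RHO*u^2/(2*area^2) + load_resistance*u for the flow u >= 0."""
    if area <= 0.0:
        return 0.0                                    # closed glottis, no flow
    k = RHO / (2.0 * area * area)                     # Bernoulli coefficient
    disc = load_resistance ** 2 + 4.0 * k * p_sub
    return (-load_resistance + math.sqrt(disc)) / (2.0 * k)

P_SUB = 800.0     # subglottal pressure in Pa (assumed)
R_LOAD = 5.0e5    # resistive vocal tract load in Pa*s/m^3 (assumed)
A_MAX = 2.0e-5    # peak glottal area in m^2 (assumed)

times = [i * 1.0e-4 for i in range(80)]               # one 8 ms period
areas = [rosenberg_area(t, 4.0e-3, 1.5e-3, A_MAX) for t in times]
flow_uncoupled = [bernoulli_flow(a, P_SUB, 0.0) for a in areas]
flow_coupled = [bernoulli_flow(a, P_SUB, R_LOAD) for a in areas]
```

With no load the flow is simply proportional to the area, so the source is independent of the filter. With a load, the flow per unit area depends on the area itself, which is a simple picture of the source-filter interaction the abstract describes.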
In the second part of the thesis, the LSFM and the ISFMs are compared by using not only the speech signal but also high-speed endoscopic video (HSV) of the vocal folds in a system identification approach. In this case, HSV and speech are used as reference input-output data for the analysis and comparison of the models. First, a new robust HSV processing algorithm is developed and applied to HSV images to extract the glottal area. Then, system parameters are estimated by using a modified version of the method proposed in the first part. The experimental results show that the speech signal can contain harmonics of the fundamental frequency of the glottal area other than those contained in the glottal area signal. The proposed nonlinear interactive source-filter models can generate these harmonic components in speech and produce more realistic speech sounds than the LSFM.
|