Time-Varying Modeling of Glottal Source and Vocal Tract and Sequential Bayesian Estimation of Model Parameters for Speech Synthesis
January 2018
abstract: Speech is generated by articulators acting on a phonatory source. Identifying this phonatory source and recovering the articulatory geometry are individually challenging, ill-posed problems, known as speech separation and articulatory inversion, respectively. There is a trade-off between the decomposition and the recovered articulatory geometry, because the mapping between articulatory configurations and the resulting speech is not unique. Moreover, measurements obtained only from a microphone offer no invasive insight into the articulators, which makes an already difficult problem harder. A joint, non-invasive estimation strategy that couples articulatory and phonatory knowledge would lead to better articulatory speech synthesis. This thesis studies such a joint estimation strategy for speech separation and articulatory geometry recovery. Unlike previous periodic/aperiodic decomposition methods, which use stationary speech models within a frame, the proposed method performs a non-stationary speech decomposition. A parametric glottal source model and an articulatory vocal tract response are represented in a dynamic state-space formulation, and the unknown parameters of these speech generation components are estimated with sequential Monte Carlo methods under specific assumptions. The proposed approach is compared with other glottal inverse filtering methods, including iterative adaptive inverse filtering, state-space inverse filtering, and the quasi-closed phase method.
Dissertation/Thesis: Master's Thesis, Electrical Engineering, 2018
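The abstract does not state which glottal source model, vocal tract parameterization, or sequential Monte Carlo variant the thesis uses. As a minimal sketch of the general idea only, the code below assumes a hypothetical toy model: a single time-varying AR(1) coefficient a_t, standing in for a slowly changing vocal tract parameter, follows a random walk and is tracked sample by sample with a bootstrap (sampling-importance-resampling) particle filter. The function names `simulate_tvar1` and `bootstrap_particle_filter` and all numeric settings are illustrative assumptions, not the thesis's method.

```python
import math
import random


def simulate_tvar1(n, a_start=0.2, a_end=0.9, noise_std=0.1, seed=0):
    """Hypothetical toy data: y_t = a_t * y_{t-1} + v_t, with a_t ramping linearly."""
    rng = random.Random(seed)
    ys, a_true = [0.0], []
    for t in range(1, n):
        a_t = a_start + (a_end - a_start) * t / (n - 1)
        a_true.append(a_t)
        ys.append(a_t * ys[-1] + rng.gauss(0.0, noise_std))
    return ys, a_true


def bootstrap_particle_filter(ys, num_particles=200, drift_std=0.01,
                              noise_std=0.1, seed=1):
    """Track the hidden coefficient a_t with a bootstrap (SIR) particle filter.

    State model (assumed):       a_t = a_{t-1} + w_t,        w_t ~ N(0, drift_std^2)
    Observation model (assumed): y_t = a_t * y_{t-1} + v_t,  v_t ~ N(0, noise_std^2)
    Returns the posterior-mean estimate of a_t for t = 1 .. len(ys) - 1.
    """
    rng = random.Random(seed)
    # Diffuse initial prior over a stable AR(1) coefficient.
    particles = [rng.uniform(-1.0, 1.0) for _ in range(num_particles)]
    estimates = []
    for t in range(1, len(ys)):
        # 1. Propagate each particle through the random-walk state model.
        particles = [p + rng.gauss(0.0, drift_std) for p in particles]
        # 2. Weight by the Gaussian likelihood of y_t (log space for stability).
        log_w = [-0.5 * ((ys[t] - p * ys[t - 1]) / noise_std) ** 2 for p in particles]
        peak = max(log_w)
        weights = [math.exp(lw - peak) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        # 3. Posterior-mean estimate of a_t.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # 4. Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=num_particles)
    return estimates


if __name__ == "__main__":
    ys, a_true = simulate_tvar1(2000)
    est = bootstrap_particle_filter(ys)
    for t in (100, 1000, 1998):
        print(f"t={t + 1}: true a={a_true[t]:.3f}, estimate={est[t]:.3f}")
```

Because the estimate is updated at every sample rather than fitted once per frame, it can follow a parameter that drifts within what would otherwise be a single analysis frame, which is the kind of non-stationarity the abstract contrasts with frame-wise stationary models. A realistic system would replace the scalar state with glottal source and vocal tract parameters and the AR(1) likelihood with a speech production model.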
Modeling Speech Sound Radiation With Different Degrees of Realism for Articulatory Synthesis
Birkholz, Peter; Ossmann, Steffen; Blandin, Rémi; Wilbrandt, Alexander; Krug, Paul Konstantin; Fleischer, Mario
11 June 2024
Articulatory synthesis is based on modeling various physical phenomena of speech production, including sound radiation from the mouth. For sound radiation, the most common approach is to approximate it as a simple spherical source whose strength equals the mouth volume velocity. However, because this approximation is valid only at very low frequencies and does not account for diffraction by the head and torso, we simulated two alternative radiation characteristics that are potentially more realistic: the radiation from a vibrating piston in a spherical baffle, and the radiation from the mouth of a detailed model of the human head and torso. Using the articulatory speech synthesizer VocalTractLab, a corpus of 10 sentences was synthesized with the different radiation characteristics combined with three different phonation types. The synthesized sentences were compared acoustically with natural recordings of the same sentences in terms of their long-term average spectra (LTAS), and were evaluated for naturalness and intelligibility. Intelligibility was not affected by the type of radiation characteristic. However, the more similar a synthetic sentence's LTAS was to that of real speech, the more natural it was perceived to be. Hence, naturalness was determined not directly by the realism of the radiation characteristic, but by the combined spectral effect of the radiation characteristic and the voice source. While the more realistic radiation models do not per se improve synthesis quality, they provide new insights into the study of speech production and articulatory synthesis.
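Of the radiation models named above, only the simple spherical source is easy to state compactly. For a point source whose strength is the mouth volume velocity U(t), the far-field pressure at distance r is p(r, t) = ρ0 · (dU/dt)(t - r/c) / (4πr), so the radiated spectrum is the source spectrum tilted upward by roughly 6 dB per octave. The sketch below approximates the derivative with a first difference and computes an LTAS as the mean Hann-windowed power spectrum over frames. The helper names, frame settings, white-noise stand-in for the source signal, and the RMS dB distance used to compare two LTAS curves are illustrative assumptions; the abstract does not specify the paper's LTAS procedure or comparison metric, and the piston-in-sphere and head-and-torso models are not reproduced here.

```python
import cmath
import math
import random


def spherical_source_radiation(volume_velocity):
    """Simple spherical-source model: radiated pressure is proportional to dU/dt.

    The derivative is approximated by a first difference; the constant
    rho0 / (4 * pi * r) and the propagation delay r / c are omitted.
    """
    u = volume_velocity
    return [u[0]] + [u[n] - u[n - 1] for n in range(1, len(u))]


def ltas_db(signal, frame_len=64, hop=32):
    """Long-term average spectrum in dB: mean Hann-windowed power spectrum over frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # bins 0 .. Nyquist
    power = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        for k in range(n_bins):
            # Direct DFT of bin k: O(N^2) per frame, acceptable for a short demo.
            x_k = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            power[k] += abs(x_k) ** 2
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    return [10 * math.log10(p / n_frames + 1e-12) for p in power]


def rms_db_distance(ltas_a, ltas_b):
    """Hypothetical comparison metric: RMS difference between two LTAS curves in dB."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ltas_a, ltas_b)) / len(ltas_a))


def white_noise(n, seed=0):
    """Stand-in source signal (assumption): Gaussian white noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    source = white_noise(4096)
    flat = ltas_db(source)
    radiated = ltas_db(spherical_source_radiation(source))
    # Spectral tilt added by the radiation model: Nyquist bin relative to bin 1.
    tilt = (radiated[-1] - flat[-1]) - (radiated[1] - flat[1])
    print(f"tilt added by radiation: {tilt:.1f} dB")
    print(f"RMS LTAS distance: {rms_db_distance(flat, radiated):.1f} dB")
```

The tilt printed by the demo illustrates why the radiation characteristic and the voice source interact: both shape the LTAS, so a source with a different spectral slope can offset or compound the slope added by the radiation model.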