• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • Tagged with
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Time-Varying Modeling of Glottal Source and Vocal Tract and Sequential Bayesian Estimation of Model Parameters for Speech Synthesis

January 2018 (has links)
abstract: Speech is generated by articulators acting on a phonatory source. Identification of this phonatory source and articulatory geometry are individually challenging and ill-posed problems, called speech separation and articulatory inversion, respectively. There exists a trade-off between decomposition and recovered articulatory geometry due to multiple possible mappings between an articulatory configuration and the speech produced. However, if measurements are obtained only from a microphone sensor, they lack any invasive insight and add additional challenge to an already difficult problem. A joint non-invasive estimation strategy that couples articulatory and phonatory knowledge would lead to better articulatory speech synthesis. In this thesis, a joint estimation strategy for speech separation and articulatory geometry recovery is studied. Unlike previous periodic/aperiodic decomposition methods that use stationary speech models within a frame, the proposed model presents a non-stationary speech decomposition method. A parametric glottal source model and an articulatory vocal tract response are represented in a dynamic state space formulation. The unknown parameters of the speech generation components are estimated using sequential Monte Carlo methods under some specific assumptions. The proposed approach is compared with other glottal inverse filtering methods, including iterative adaptive inverse filtering, state-space inverse filtering, and the quasi-closed phase method. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2018
2

Modeling Speech Sound Radiation With Different Degrees of Realism for Articulatory Synthesis

Birkholz, Peter, Ossmann, Steffen, Blandin, Rémi, Wilbrandt, Alexander, Krug, Paul Konstantin, Fleischer, Mario 11 June 2024 (has links)
Articulatory synthesis is based on modeling various physical phenomena of speech production, including sound radiation from the mouth. With regard to sound radiation, the most common approach is to approximate it in terms of a simple spherical source of strength equal to the mouth volume velocity. However, because this approximation is only valid at very low frequencies and does not account for the diffraction by the head and the torso, we simulated two alternative radiation characteristics that are potentially more realistic: the radiation from a vibrating piston in a spherical baffle, and the radiation from the mouth of a detailed model of the human head and torso. Using the articulatory speech synthesizer VocalTractLab, a corpus of 10 sentences was synthesized with the different radiation characteristics combined with three different phonation types. The synthesized sentences were acoustically compared with natural recordings of the same sentences in terms of their long-term average spectra (LTAS), and evaluated in terms of their naturalness and intelligibility. The intelligibility was not affected by the type of radiation characteristic. However, it was found that the more similar their LTAS was to real speech, the more natural the synthetic sentences were perceived to be. Hence, the naturalness was not directly determined by the realism of the radiation characteristic, but by the combined spectral effect of the radiation characteristic and the voice source. While the more realistic radiation models do not per se improve synthesis quality, they provide new insights in the study of speech production and articulatory synthesis.

Page generated in 0.0603 seconds