The objective of the proposed research is to develop a probabilistic model of speech production that exploits the multiplicity of mapping between the vocal tract area functions (VTAF) and speech spectra. Two thrusts are developed. In the first, a latent variable model that captures uncertainty in estimating the VTAF from speech data is investigated. The latent variable model uses this uncertainty to generate many-to-one mapping between observations of the VTAF and speech spectra. The second uses the probabilistic model of speech production to improve the performance of traditional speech algorithms, such as enhancement, acoustic model adaptation, etc.
In this thesis, we propose to model the process of speech production with a probability map. This proposed model treats speech production as a probabilistic process with many-to-one mapping between VTAF and speech spectra. The thesis not only outlines a statistical framework to generate and train these probabilistic models from speech, but also demonstrates its power and flexibility with such applications as enhancing speech from both perceptual and recognition perspectives.
Identifer | oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/42739 |
Date | 22 August 2011 |
Creators | Kalgaonkar, Kaustubh |
Publisher | Georgia Institute of Technology |
Source Sets | Georgia Tech Electronic Thesis and Dissertation Archive |
Detected Language | English |
Type | Dissertation |
Page generated in 0.0019 seconds