In this thesis, we describe two approaches to automatically synthesizing emotional speech for a target speaker based on hidden Markov models (HMMs) trained on his/her neutral speech.
In the interpolation-based method, the basic idea is model interpolation between the neutral model of the target speaker and an emotional model selected from a candidate pool. Both the selection of the emotional model and the computation of the interpolation weights are determined by a model-distance measure, for which we propose a monophone-based Mahalanobis distance (MBMD).
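To make the interpolation step concrete, the following is a minimal sketch assuming Gaussian state means with diagonal covariances; the function names, the averaging over monophones, and the way a weight would be derived from the distance are illustrative assumptions rather than the exact procedure in the thesis.

import numpy as np

def mahalanobis(mu_a, mu_b, var):
    # Mahalanobis distance between two mean vectors under a shared
    # diagonal covariance given by the variance vector `var`.
    diff = mu_a - mu_b
    return np.sqrt(np.sum(diff * diff / var))

def mbmd(neutral_means, emotional_means, neutral_vars):
    # Monophone-based distance: average the Mahalanobis distance over
    # the corresponding monophone models of the two voices.
    dists = [mahalanobis(mn, me, v)
             for mn, me, v in zip(neutral_means, emotional_means, neutral_vars)]
    return float(np.mean(dists))

def interpolate_means(mu_neutral, mu_emotional, weight):
    # Linear interpolation of state means; `weight` in [0, 1] controls
    # how strongly the selected emotional model is expressed.
    return (1.0 - weight) * mu_neutral + weight * mu_emotional

In this sketch, the candidate emotional model with the smallest MBMD to the target speaker's neutral model would be selected, and the interpolation weight would be set as a function of that distance.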
In the parallel model combination (PMC) based method, the basic idea is to model the mismatch between the neutral model and the emotional model. We train a linear regression model to describe this mismatch and then combine the target speaker's neutral model with the linear regression model.
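A minimal sketch of the mismatch-regression idea follows, assuming paired state-mean vectors from a reference voice's neutral and emotional models; the per-dimension linear fit and the function names are assumptions for illustration, not the thesis implementation.

import numpy as np

def fit_mismatch_regression(neutral_means, emotional_means):
    # Least-squares fit of emotional = a * neutral + b for each feature
    # dimension, from (num_states, dim) arrays of paired state means.
    dim = neutral_means.shape[1]
    a = np.empty(dim)
    b = np.empty(dim)
    for d in range(dim):
        a[d], b[d] = np.polyfit(neutral_means[:, d], emotional_means[:, d], deg=1)
    return a, b

def apply_mismatch(target_neutral_means, a, b):
    # Combine the target speaker's neutral model with the learned
    # regression to predict emotional state means.
    return a * target_neutral_means + b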
We evaluate both approaches on synthesized emotional speech expressing anger, happiness, and sadness through several subjective listening tests. Experimental results show that the implemented system is able to synthesize speech that conveys the emotional expressiveness of the target speaker.
Identifier | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0830110-111455
Date | 30 August 2010
Creators | Yang, Chih-Yung
Contributors | Chia-Ping Chen, Hsin-Min Wang, Jui-Feng Yeh, Chung-Hsien Wu
Publisher | NSYSU
Source Sets | NSYSU Electronic Thesis and Dissertation Archive
Language | English
Detected Language | English
Type | text
Format | application/pdf
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0830110-111455
Rights | not_available, Copyright information available at source archive