In this thesis, we implement a Chinese speech synthesis system based on hidden Markov models (HMMs), which can synthesize speech of reasonable quality from a small training corpus. We also exploit the flexibility of the parametric speech representation in this model to synthesize emotional speech. We apply two methods, model interpolation and model adaptation, to synthesize speech ranging from neutral to a particular emotion without any emotional speech from the target speaker. In model interpolation, we use a monophone-based Mahalanobis distance to select, from a pool of speakers, the emotional models closest to the target speaker, and then estimate interpolation weights to synthesize emotional speech. In model adaptation, we collect abundant data to train an average voice model for each individual emotion. These models are then adapted to emotion-specific models of the target speaker with constrained maximum likelihood linear regression (CMLLR). In addition, we design Latin-square listening tests to reduce systematic bias in the subjective evaluation, making the results more credible and fair. We synthesize emotional speech expressing happiness, anger, and sadness, and use the Latin-square design to evaluate performance in three respects: similarity, naturalness, and emotional expressiveness. Based on the results, we make a comprehensive comparison of the two methods for emotional speech synthesis and draw conclusions.
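The model-interpolation pipeline (distance-based speaker selection followed by weighted blending) can be sketched as follows. Several details are assumptions rather than the thesis's own method. Each voice is represented as a hypothetical dictionary of per-monophone diagonal Gaussians. The distance is computed between the target's neutral models and each pool speaker's neutral models, using the target's variances. The weights are simply normalized inverse distances, and means and variances are interpolated linearly.

```python
import numpy as np

# Assumed layout (hypothetical, not from the thesis): a voice maps each
# monophone label to a diagonal Gaussian given as (mean, variance) vectors.
Voice = dict[str, tuple[np.ndarray, np.ndarray]]


def monophone_mahalanobis(target: Voice, candidate: Voice) -> float:
    """Mean over monophones of the Mahalanobis distance between the
    candidate's and the target's means, measured with the target's
    diagonal variances (an assumed choice of metric)."""
    dists = []
    for phone, (mu_t, var_t) in target.items():
        mu_c, _ = candidate[phone]
        diff = mu_c - mu_t
        dists.append(np.sqrt(np.sum(diff * diff / var_t)))
    return float(np.mean(dists))


def interpolation_weights(distances: list[float], eps: float = 1e-8) -> np.ndarray:
    """Normalized inverse-distance weights (a stand-in heuristic)."""
    inv = 1.0 / (np.asarray(distances, dtype=float) + eps)
    return inv / inv.sum()


def interpolate(voices: list[Voice], weights: np.ndarray) -> Voice:
    """Linear interpolation of means and variances, phone by phone."""
    return {
        phone: (
            sum(w * v[phone][0] for w, v in zip(weights, voices)),
            sum(w * v[phone][1] for w, v in zip(weights, voices)),
        )
        for phone in voices[0]
    }


def select_and_interpolate(target_neutral: Voice, pool_neutral: list[Voice],
                           pool_emotional: list[Voice], k: int = 2):
    """Rank pool speakers by neutral-model distance to the target, keep the
    k nearest, and blend their emotional models (assumed procedure)."""
    nearest = sorted(
        (monophone_mahalanobis(target_neutral, spk), idx)
        for idx, spk in enumerate(pool_neutral)
    )[:k]
    weights = interpolation_weights([d for d, _ in nearest])
    chosen = [pool_emotional[idx] for _, idx in nearest]
    return interpolate(chosen, weights), [idx for _, idx in nearest], weights
```

Inverse-distance weighting was chosen here only because it needs no training data. Any weight-estimation rule that returns nonnegative weights summing to one can replace `interpolation_weights` without changing the rest of the sketch.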
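The model-adaptation step relies on CMLLR, in which a single affine transform is shared by the mean and covariance of every Gaussian in a regression class. The sketch below only applies a given transform and checks the standard equivalence with a feature-space transform. Estimating (A, b) from adaptation data with EM is not shown, and the matrices in any usage are illustrative values, not results from the thesis.

```python
import numpy as np


def apply_cmllr(mean, cov, A, b):
    """Adapt one Gaussian with a constrained MLLR transform.

    Mean and covariance share the same matrix A:
        mu' = A mu + b,   Sigma' = A Sigma A^T
    """
    return A @ mean + b, A @ cov @ A.T


def gaussian_logpdf(x, mean, cov):
    """Log density of a full-covariance Gaussian."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + maha)


def feature_space_logpdf(o, mean, cov, A, b):
    """Same likelihood computed by mapping the observation back with the
    inverse transform instead of adapting the model:
        log N(o; A mu + b, A Sigma A^T)
          = log N(A^-1 (o - b); mu, Sigma) - log|det A|
    """
    x = np.linalg.solve(A, o - b)
    _, logabsdet = np.linalg.slogdet(A)
    return gaussian_logpdf(x, mean, cov) - logabsdet
```

Because the covariance is constrained to use the same A as the mean, adaptation can equivalently be carried out on the observations, which is why CMLLR is often implemented as a feature-space transform.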
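A Latin-square listening test can be illustrated with a cyclic square: listener group i hears sentence block j synthesized by system (i + j) mod n. Every group therefore hears every system exactly once, and every sentence block is heard under every system, so differences between sentences or listeners do not systematically favor one system. This is a generic construction. The group, block, and system labels are hypothetical, and the layout actually used in the thesis may differ.

```python
def cyclic_latin_square(n: int) -> list[list[int]]:
    """n x n square whose cell (i, j) holds condition (i + j) mod n."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]


def assign_conditions(groups, blocks, systems):
    """Map each (listener group, sentence block) pair to one system.

    Labels are hypothetical; the square guarantees each system appears
    once per group (row) and once per sentence block (column).
    """
    n = len(systems)
    if len(groups) != n or len(blocks) != n:
        raise ValueError("need as many groups and blocks as systems")
    square = cyclic_latin_square(n)
    return {
        (g, s): systems[square[i][j]]
        for i, g in enumerate(groups)
        for j, s in enumerate(blocks)
    }
```

For example, `assign_conditions(["G1", "G2", "G3"], ["S1", "S2", "S3"], ["interpolation", "adaptation", "reference"])` assigns group G2 the interpolation system for block S3, since (1 + 2) mod 3 = 0.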
Identifier | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0719112-143512 |
Date | 19 July 2012 |
Creators | Hsu, Chih-Yu |
Contributors | Hsin-Min Wang, Chia-Ping Chen, Chung-Hsien Wu |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | Cholon |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0719112-143512 |
Rights | user_define, Copyright information available at source archive |