About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

一個廉價的漢字語音合成器 (An inexpensive Chinese-character speech synthesizer). / Yi ge lian jia de Han zi yu yin he cheng qi.

January 1984 (has links)
胡承慈 (Hu Chengci). / Large-character photocopy (大字複印本). / Thesis (M.A.)--Division of Electronic Computing, Graduate School, The Chinese University of Hong Kong (香港中文大學研究院電子計算學部). / Includes bibliographical references (leaves 26-27 (2nd group)). / Acknowledgements --- p.I / ABSTRACT --- p.II / Abstract (Chinese) --- p.III / Chapter 1 --- Introduction / Chapter 1.1 --- Sound as an output medium --- p.1 / Chapter 1.2 --- Methods of sound output --- p.1 / Chapter 1.3 --- Phoneme synthesis and the VOTRAX SC01 speech synthesis chip --- p.3 / Chapter 1.4 --- Mandarin output --- p.3 / Chapter 2 --- Analysis of Chinese / Chapter 2.1 --- Analysis of Mandarin --- p.4 / Chapter 2.1.1 --- Mandarin phonemes --- p.4 / Chapter 2.1.1.1 --- Monophthongs --- p.4 / Chapter 2.1.1.2 --- Consonants --- p.5 / Chapter 2.1.2 --- Combinations of Mandarin phonemes / Chapter 2.1.2.1 --- Diphthongs --- p.6 / Chapter 2.1.2.2 --- Nasalized vowels --- p.6 / Chapter 2.1.3 --- Mandarin syllables --- p.7 / Chapter 2.2 --- Chinese words --- p.9 / Chapter 2.3 --- Chinese characters --- p.9 / Chapter 3 --- Hardware and software design / Chapter 3.1 --- Hardware design --- p.10 / Chapter 3.2 --- Software design --- p.11 / Chapter 3.2.1 --- Character encoding and the phoneme address table --- p.12 / Chapter 3.2.2 --- The phoneme string table --- p.13 / Chapter 3.2.3 --- Synthesizer operating program --- p.14 / Chapter 3.2.4 --- Synthesizer management program and phoneme editor --- p.15 / Chapter 4 --- Implementation and findings / Chapter 4.1 --- Hardware implementation --- p.15 / Chapter 4.2 --- Software implementation --- p.18 / Chapter 4.2.1 --- Building the tables --- p.18 / Chapter 4.2.2 --- Building the programs --- p.23 / Chapter 5 --- Conclusion --- p.24 / Appendix A --- References --- p.26 / Appendix B --- Table of Chinese consonant phonemes --- p.28 / Appendix C --- Table of Chinese vowel phonemes --- p.29 / Appendix D --- SC01 phoneme table --- p.31 / Appendix E --- Software usage / Appendix E.1 --- Using the synthesizer operating program --- p.34 / Appendix E.2 --- Using the synthesizer management program --- p.41 / Appendix E.3 --- Using the phoneme editor --- p.45 / Appendix F --- Chinese-English glossary / Appendix F.1 --- Sounds and sound-production methods --- p.53 / Appendix F.2 --- Hardware --- p.53 / Appendix F.3 --- Software --- p.53 / Appendix G --- Synthesizer hardware circuit diagrams --- p.55
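Sections 3.2.1-3.2.2 of this record describe a table-driven design: a character code indexes a phoneme-address table, which points into a table of phoneme strings sent to the VOTRAX SC01 chip. A guessed sketch of that lookup follows; every code, table entry, and the write_sc01 port are hypothetical, since the record gives no encoding details.

```python
# Hypothetical table-driven lookup in the spirit of sections 3.2.1-3.2.2.
# Character codes, phoneme codes and the output port are invented for
# illustration; the thesis's actual tables are not given in this record.
PHONEME_ADDRESS = {0xB0A1: 0}            # character code -> index into string table
PHONEME_STRINGS = [[0x2E, 0x1B, 0x3F]]   # per-character lists of SC01 phoneme codes

def speak_character(char_code, write_sc01):
    for code in PHONEME_STRINGS[PHONEME_ADDRESS[char_code]]:
        write_sc01(code)                 # latch one phoneme code into the SC01

# Example: speak_character(0xB0A1, lambda c: print(f"SC01 <- {c:#04x}"))
```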
52

Cantonese text-to-speech synthesis using sub-syllable units. / 利用子音節的粤語文語轉換系統 / Li yong zi yin jie de Yue yu wen yu zhuan huan xi tong

January 2001 (has links)
Law Ka Man = 利用子音節的粤語文語轉換系統 / 羅家文. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references. / Text in English; abstracts in English and Chinese. / Law Ka Man = Li yong zi yin jie de Yue yu wen yu zhuan huan xi tong / Luo Jiawen. / Chapter 1. --- INTRODUCTION --- p.1 / Chapter 1.1 --- Text analysis --- p.2 / Chapter 1.2 --- Prosody prediction --- p.3 / Chapter 1.3 --- Speech generation --- p.3 / Chapter 1.4 --- The trend of TTS technology --- p.5 / Chapter 1.5 --- TTS systems for different languages --- p.6 / Chapter 1.6 --- Objectives of the thesis --- p.8 / Chapter 1.7 --- Thesis outline --- p.8 / References --- p.10 / Chapter 2. --- BACKGROUND --- p.11 / Chapter 2.1 --- Cantonese phonology --- p.11 / Chapter 2.2 --- Cantonese TTS - a baseline system --- p.16 / Chapter 2.3 --- Time-Domain Pitch-Synchronous Overlap-Add --- p.17 / Chapter 2.3.1 --- From speech signal to short-time analysis signals --- p.18 / Chapter 2.3.2 --- From short-time analysis signals to short-time synthesis signals --- p.19 / Chapter 2.3.3 --- From short-time synthesis signals to synthetic speech --- p.20 / Chapter 2.4 --- Time-scale and pitch-scale modifications --- p.20 / Chapter 2.4.1 --- Voiced speech --- p.20 / Chapter 2.4.2 --- Unvoiced speech --- p.21 / Chapter 2.5 --- Summary --- p.22 / References --- p.23 / Chapter 3. --- SUB-SYLLABLE BASED TTS SYSTEM --- p.24 / Chapter 3.1 --- Motivations --- p.24 / Chapter 3.2 --- Choices of synthesis units --- p.27 / Chapter 3.2.1 --- Sub-syllable unit --- p.29 / Chapter 3.2.2 --- Diphones, demi-syllables and sub-syllable units --- p.31 / Chapter 3.3 --- Proposed TTS system --- p.32 / Chapter 3.3.1 --- Text analysis module --- p.33 / Chapter 3.3.2 --- Synthesis module --- p.36 / Chapter 3.3.3 --- Prosody module --- p.37 / Chapter 3.4 --- Summary --- p.38 / References --- p.39 / Chapter 4. --- ACOUSTIC INVENTORY --- p.40 / Chapter 4.1 --- The full set of Cantonese sub-syllable units --- p.40 / Chapter 4.2 --- A reduced set of sub-syllable units --- p.42 / Chapter 4.3 --- Corpus design --- p.44 / Chapter 4.4 --- Recording --- p.46 / Chapter 4.5 --- Post-processing of speech data --- p.47 / Chapter 4.6 --- Summary --- p.51 / References --- p.51 / Chapter 5. --- CONCATENATION TECHNIQUES --- p.52 / Chapter 5.1 --- Concatenation of sub-syllable units --- p.52 / Chapter 5.1.1 --- Concatenation of plosives and affricates --- p.54 / Chapter 5.1.2 --- Concatenation of fricatives --- p.55 / Chapter 5.1.3 --- Concatenation of vowels, semi-vowels and nasals --- p.55 / Chapter 5.1.4 --- Spectral distance measure --- p.57 / Chapter 5.2 --- Waveform concatenation method --- p.58 / Chapter 5.3 --- Selected examples of waveform concatenation --- p.59 / Chapter 5.3.1 --- I-I concatenation --- p.60 / Chapter 5.3.2 --- F-F concatenation --- p.66 / Chapter 5.4 --- Summary --- p.71 / References --- p.72 / Chapter 6. --- PERFORMANCE EVALUATION --- p.73 / Chapter 6.1 --- Listening test --- p.73 / Chapter 6.2 --- Test results --- p.74 / Chapter 6.3 --- Discussions --- p.75 / References --- p.78 / Chapter 7. --- CONCLUSIONS & FUTURE WORK --- p.79 / Chapter 7.1 --- Conclusions --- p.79 / Chapter 7.2 --- Suggested future work --- p.81 / APPENDIX 1 SYLLABLE DURATION --- p.82 / APPENDIX 2 PERCEPTUAL TEST PARAGRAPHS --- p.86
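Sections 2.3.1-2.3.3 of this record decompose TD-PSOLA into three steps: speech signal to short-time analysis signals, to short-time synthesis signals, to synthetic speech. Below is a minimal sketch of the overlap-add machinery only, assuming pitch marks are already given; the grain length, window choice, and nearest-mark mapping are illustrative assumptions, not the thesis's implementation.

```python
# Minimal TD-PSOLA-style overlap-add sketch; pitch-mark detection, mark
# interpolation and energy normalization are all omitted for brevity.
import numpy as np

def psola_overlap_add(signal, analysis_marks, synthesis_marks):
    # Crude constant local period estimated from the analysis pitch marks.
    period = int(np.median(np.diff(analysis_marks)))
    window = np.hanning(2 * period)          # two-period analysis window
    out = np.zeros(int(synthesis_marks[-1]) + period + 1)
    for t_s in synthesis_marks:
        # Short-time analysis: take the grain whose pitch mark is nearest.
        t_a = int(analysis_marks[np.argmin(np.abs(analysis_marks - t_s))])
        if t_a - period < 0 or t_a + period > len(signal) or t_s - period < 0:
            continue
        grain = signal[t_a - period:t_a + period] * window
        # Short-time synthesis: overlap-add the grain at the new pitch mark.
        out[t_s - period:t_s + period] += grain
    return out[:int(synthesis_marks[-1])]
```

Moving the synthesis marks closer together or further apart than the analysis marks is what realizes the pitch-scale and time-scale modifications of section 2.4.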
53

Domain-optimized Chinese speech generation.

January 2001 (has links)
Fung Tien Ying. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 119-128). / Abstracts in English and Chinese. / Abstract --- p.1 / Acknowledgement --- p.1 / List of Figures --- p.7 / List of Tables --- p.11 / Chapter 1 --- Introduction --- p.14 / Chapter 1.1 --- General Trends in Speech Generation --- p.15 / Chapter 1.2 --- Domain-Optimized Speech Generation in Chinese --- p.16 / Chapter 1.3 --- Thesis Organization --- p.17 / Chapter 2 --- Background --- p.19 / Chapter 2.1 --- Linguistic and Phonological Properties of Chinese --- p.19 / Chapter 2.1.1 --- Articulation --- p.20 / Chapter 2.1.2 --- Tones --- p.21 / Chapter 2.2 --- Previous Development in Speech Generation --- p.22 / Chapter 2.2.1 --- Articulatory Synthesis --- p.23 / Chapter 2.2.2 --- Formant Synthesis --- p.24 / Chapter 2.2.3 --- Concatenative Synthesis --- p.25 / Chapter 2.2.4 --- Existing Systems --- p.31 / Chapter 2.3 --- Our Speech Generation Approach --- p.35 / Chapter 3 --- Corpus-based Syllable Concatenation: A Feasibility Test --- p.37 / Chapter 3.1 --- Capturing Syllable Coarticulation with Distinctive Features --- p.39 / Chapter 3.2 --- Creating a Domain-Optimized Wavebank --- p.41 / Chapter 3.2.1 --- Generate-and-Filter --- p.44 / Chapter 3.2.2 --- Waveform Segmentation --- p.47 / Chapter 3.3 --- The Use of Multi-Syllable Units --- p.49 / Chapter 3.4 --- Unit Selection for Concatenative Speech Output --- p.50 / Chapter 3.5 --- A Listening Test --- p.51 / Chapter 3.6 --- Chapter Summary --- p.52 / Chapter 4 --- Scalability and Portability to the Stocks Domain --- p.55 / Chapter 4.1 --- Complexity of the ISIS Responses --- p.56 / Chapter 4.2 --- XML for input semantic and grammar representation --- p.60 / Chapter 4.3 --- Tree-Based Filtering Algorithm --- p.63 / Chapter 4.4 --- Energy Normalization --- p.67 / Chapter 4.5 --- Chapter Summary --- p.69 / Chapter 5 --- Investigation in Tonal Contexts --- p.71 / Chapter 5.1 --- The Nature of Tones --- p.74 / Chapter 5.1.1 --- Human Perception of Tones --- p.75 / Chapter 5.2 --- Relative Importance of Left and Right Tonal Context --- p.77 / Chapter 5.2.1 --- Tonal Contexts in the Date-Time Subgrammar --- p.77 / Chapter 5.2.2 --- Tonal Contexts in the Numeric Subgrammar --- p.82 / Chapter 5.2.3 --- Conclusion regarding the Relative Importance of Left versus Right Tonal Contexts --- p.86 / Chapter 5.3 --- Selection Scheme for Tonal Variants --- p.86 / Chapter 5.3.1 --- Listening Test for our Tone Backoff Scheme --- p.90 / Chapter 5.3.2 --- Error Analysis --- p.92 / Chapter 5.4 --- Chapter Summary --- p.94 / Chapter 6 --- Summary and Future Work --- p.95 / Chapter 6.1 --- Contributions --- p.97 / Chapter 6.2 --- Future Directions --- p.98 / Chapter A --- Listening Test Questionnaire for FOREX Response Generation --- p.100 / Chapter B --- Major Response Types For ISIS --- p.102 / Chapter C --- Recording Corpus for Tone Investigation in Date-time Subgrammar --- p.105 / Chapter D --- Statistical Test for Left Tonal Context --- p.109 / Chapter E --- Statistical Test for Right Tonal Context --- p.112 / Chapter F --- Listening Test Questionnaire for Backoff Unit Selection Scheme --- p.115 / Chapter G --- Statistical Test for the Backoff Unit Selection Scheme --- p.117 / Chapter H --- Statistical Test for the Backoff Unit Selection Scheme --- p.118 / Bibliography --- p.119
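Section 5.3 of this record proposes a backoff scheme for selecting tonal variants of a synthesis unit. A minimal sketch of what such a backoff might look like is given below; the backoff order and the inventory layout are assumptions for illustration, not the scheme actually evaluated in the thesis.

```python
# Sketch of tonal-context backoff selection. The ordering (full context,
# then right tone only, then left tone only, then any) is an assumption.
def select_unit(inventory, syllable, left_tone, right_tone):
    """inventory maps (syllable, left_tone, right_tone) -> unit,
    with None standing for an unspecified tonal context."""
    for key in [(syllable, left_tone, right_tone),   # exact tonal context
                (syllable, None, right_tone),        # back off: keep right tone
                (syllable, left_tone, None),         # back off: keep left tone
                (syllable, None, None)]:             # context-independent unit
        if key in inventory:
            return inventory[key]
    raise KeyError(f"no unit for syllable {syllable!r}")
```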
54

Inverse solution of speech production based on perturbation theory and its application to articulatory speech synthesis. / CUHK electronic theses & dissertations collection

January 1998 (has links)
by Yu Zhenli. / Thesis (Ph.D.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (p. 193-202). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Mode of access: World Wide Web. / Abstracts in English and Chinese.
55

Text-to-Speech Synthesis Using Found Data for Low-Resource Languages

Cooper, Erica Lindsay January 2019 (has links)
Text-to-speech synthesis is a key component of interactive, speech-based systems. Typically, building a high-quality voice requires collecting dozens of hours of speech from a single professional speaker in an anechoic chamber with a high-quality microphone. There are about 7,000 languages spoken in the world, and most do not enjoy the speech research attention historically paid to such languages as English, Spanish, Mandarin, and Japanese. Speakers of these so-called "low-resource languages" therefore do not equally benefit from these technological advances. While it takes a great deal of time and resources to collect a traditional text-to-speech corpus for a given language, we may instead be able to make use of various sources of "found" data which may be available. In particular, sources such as radio broadcast news and ASR corpora are available for many languages. While this kind of data does not exactly match what one would collect for a more standard TTS corpus, it may nevertheless contain parts which are usable for producing natural and intelligible parametric TTS voices.

In the first part of this thesis, we examine various types of found speech data in comparison with data collected for TTS, in terms of a variety of acoustic and prosodic features. We find that radio broadcast news in particular is a good match. Audiobooks may also be a good match despite their largely more expressive style, and certain speakers in conversational and read ASR corpora also resemble TTS speakers in their manner of speaking and thus their data may be usable for training TTS voices.

In the rest of the thesis, we conduct a variety of experiments in training voices on non-traditional sources of data, such as ASR data, radio broadcast news, and audiobooks. We aim to discover which methods produce the most intelligible and natural-sounding voices, focusing on three main approaches:

1) Training data subset selection. In noisy, heterogeneous data sources, we may wish to locate subsets of the data that are well-suited for building voices, based on acoustic and prosodic features that are known to correspond with TTS-style speech, while excluding utterances that introduce noise or other artifacts. We find that choosing subsets of speakers for training data can result in voices that are more intelligible.

2) Augmenting the frontend feature set with new features. In cleaner sources of found data, we may wish to train voices on all of the data, but we may get improvements in naturalness by including acoustic and prosodic features at the frontend and synthesizing in a manner that better matches the TTS style. We find that this approach is promising for creating more natural-sounding voices, regardless of the underlying acoustic model.

3) Adaptation. Another way to make use of high-quality data while also including informative acoustic and prosodic features is to adapt to subsets, rather than to select and train only on subsets. We also experiment with training on mixed high- and low-quality data, and adapting towards the high-quality set, which produces more intelligible voices than training on either type of data by itself.

We hope that our findings may serve as guidelines for anyone wishing to build their own TTS voice using non-traditional sources of found data.
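The first approach above, selecting training utterances whose acoustic and prosodic statistics resemble TTS-style speech, can be pictured with a small filter. The feature set and thresholds below are illustrative assumptions, not the criteria used in the thesis; a real system would derive them from a reference TTS corpus.

```python
# Illustrative utterance filter for found-data TTS training; all feature
# names and cut-off values are invented for the sketch.
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str
    snr_db: float          # signal-to-noise ratio
    f0_std: float          # pitch variability (Hz)
    speaking_rate: float   # syllables per second

def select_tts_like(utterances, min_snr=25.0, max_f0_std=40.0,
                    rate_range=(3.0, 6.0)):
    """Keep utterances that look like clean, read, TTS-style speech."""
    return [u for u in utterances
            if u.snr_db >= min_snr
            and u.f0_std <= max_f0_std
            and rate_range[0] <= u.speaking_rate <= rate_range[1]]
```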
56

Unit selection and waveform concatenation strategies in Cantonese text-to-speech.

January 2005 (has links)
Oey Sai Lok. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references. / Abstracts in English and Chinese. / Chapter 1. --- Introduction --- p.1 / Chapter 1.1 --- An overview of Text-to-Speech technology --- p.2 / Chapter 1.1.1 --- Text processing --- p.2 / Chapter 1.1.2 --- Acoustic synthesis --- p.3 / Chapter 1.1.3 --- Prosody modification --- p.4 / Chapter 1.2 --- Trends in Text-to-Speech technologies --- p.5 / Chapter 1.3 --- Objectives of this thesis --- p.7 / Chapter 1.4 --- Outline of the thesis --- p.9 / References --- p.11 / Chapter 2. --- Cantonese Speech --- p.13 / Chapter 2.1 --- The Cantonese dialect --- p.13 / Chapter 2.2 --- Phonology of Cantonese --- p.14 / Chapter 2.2.1 --- Initials --- p.15 / Chapter 2.2.2 --- Finals --- p.16 / Chapter 2.2.3 --- Tones --- p.18 / Chapter 2.3 --- Acoustic-phonetic properties of Cantonese syllables --- p.19 / References --- p.24 / Chapter 3. --- Cantonese Text-to-Speech --- p.25 / Chapter 3.1 --- General overview --- p.25 / Chapter 3.1.1 --- Text processing --- p.25 / Chapter 3.1.2 --- Corpus based acoustic synthesis --- p.26 / Chapter 3.1.3 --- Prosodic control --- p.27 / Chapter 3.2 --- Syllable based Cantonese Text-to-Speech system --- p.28 / Chapter 3.3 --- Sub-syllable based Cantonese Text-to-Speech system --- p.29 / Chapter 3.3.1 --- Definition of sub-syllable units --- p.29 / Chapter 3.3.2 --- Acoustic inventory --- p.31 / Chapter 3.3.3 --- Determination of the concatenation points --- p.33 / Chapter 3.4 --- Problems --- p.34 / References --- p.36 / Chapter 4. --- Waveform Concatenation for Sub-syllable Units --- p.37 / Chapter 4.1 --- Previous work in concatenation methods --- p.37 / Chapter 4.1.1 --- Determination of concatenation point --- p.38 / Chapter 4.1.2 --- Waveform concatenation --- p.38 / Chapter 4.2 --- Problems and difficulties in concatenating sub-syllable units --- p.39 / Chapter 4.2.1 --- Mismatch of acoustic properties --- p.40 / Chapter 4.2.2 --- Allophone problem of Initials /z/, /c/ and /s/ --- p.42 / Chapter 4.3 --- General procedures in concatenation strategies --- p.44 / Chapter 4.3.1 --- Concatenation of unvoiced segments --- p.45 / Chapter 4.3.2 --- Concatenation of voiced segments --- p.45 / Chapter 4.3.3 --- Measurement of spectral distance --- p.48 / Chapter 4.4 --- Detailed procedures in concatenation points determination --- p.50 / Chapter 4.4.1 --- Unvoiced segments --- p.50 / Chapter 4.4.2 --- Voiced segments --- p.53 / Chapter 4.5 --- Selected examples in concatenation strategies --- p.58 / Chapter 4.5.1 --- Concatenation at Initial segments --- p.58 / Chapter 4.5.1.1 --- Plosives --- p.58 / Chapter 4.5.1.2 --- Fricatives --- p.59 / Chapter 4.5.2 --- Concatenation at Final segments --- p.60 / Chapter 4.5.2.1 --- V group (long vowel) --- p.60 / Chapter 4.5.2.2 --- D group (diphthong) --- p.61 / References --- p.63 / Chapter 5. --- Unit Selection for Sub-syllable Units --- p.65 / Chapter 5.1 --- Basic requirements in unit selection process --- p.65 / Chapter 5.1.1 --- Availability of multiple copies of sub-syllable units --- p.65 / Chapter 5.1.1.1 --- Levels of "identical" --- p.66 / Chapter 5.1.1.2 --- Statistics on the availability --- p.67 / Chapter 5.1.2 --- Variations in acoustic parameters --- p.70 / Chapter 5.1.2.1 --- Pitch level --- p.71 / Chapter 5.1.2.2 --- Duration --- p.74 / Chapter 5.1.2.3 --- Intensity level --- p.75 / Chapter 5.2 --- Selection process: availability check on sub-syllable units --- p.77 / Chapter 5.2.1 --- Multiple copies found --- p.79 / Chapter 5.2.2 --- Unique copy found --- p.79 / Chapter 5.2.3 --- No matched copy found --- p.80 / Chapter 5.2.4 --- Illustrative examples --- p.80 / Chapter 5.3 --- Selection process: acoustic analysis on candidate units --- p.81 / References --- p.88 / Chapter 6. --- Performance Evaluation --- p.89 / Chapter 6.1 --- General information --- p.90 / Chapter 6.1.1 --- Objective test --- p.90 / Chapter 6.1.2 --- Subjective test --- p.90 / Chapter 6.1.3 --- Test materials --- p.91 / Chapter 6.2 --- Details of the objective test --- p.92 / Chapter 6.2.1 --- Testing method --- p.92 / Chapter 6.2.2 --- Results --- p.93 / Chapter 6.2.3 --- Analysis --- p.96 / Chapter 6.3 --- Details of the subjective test --- p.98 / Chapter 6.3.1 --- Testing method --- p.98 / Chapter 6.3.2 --- Results --- p.99 / Chapter 6.3.3 --- Analysis --- p.101 / Chapter 6.4 --- Summary --- p.107 / References --- p.108 / Chapter 7. --- Conclusions and Future Work --- p.109 / Chapter 7.1 --- Conclusions --- p.109 / Chapter 7.2 --- Suggested future work --- p.111 / References --- p.113 / Appendix 1 Mean pitch level of Initials and Finals stored in the inventory --- p.114 / Appendix 2 Mean durations of Initials and Finals stored in the inventory --- p.121 / Appendix 3 Mean intensity level of Initials and Finals stored in the inventory --- p.124 / Appendix 4 Test words used in performance evaluation --- p.127 / Appendix 5 Test paragraph used in performance evaluation --- p.128 / Appendix 6 Pitch profile used in the Text-to-Speech system --- p.131 / Appendix 7 Duration model used in the Text-to-Speech system --- p.132
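Sections 4.3.3 and 4.4 of this record determine concatenation points with a spectral distance measure. A rough sketch using Euclidean distance between log-magnitude spectra follows; the frame parameters and the choice of distance are assumptions, and the thesis's actual measure may differ.

```python
# Sketch: pick a join point by minimizing spectral distance across the join.
import numpy as np

def frame_spectra(x, frame_len=512, hop=128):
    # Hann-windowed log-magnitude spectra of successive frames.
    w = np.hanning(frame_len)
    n = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * w for i in range(n)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

def best_join(unit_a, unit_b, search_frames=8):
    # Compare trailing frames of the left unit with leading frames of the
    # right unit; the frame pair with minimum distance marks the join.
    sa = frame_spectra(unit_a)[-search_frames:]
    sb = frame_spectra(unit_b)[:search_frames]
    d = np.linalg.norm(sa[:, None, :] - sb[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmin(d), d.shape)
    return i, j, d[i, j]
```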
57

Phone-based speech synthesis using neural network with articulatory control.

January 1996 (has links)
by Lo Wai Kit. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references (leaves 151-160). / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Applications of Speech Synthesis --- p.2 / Chapter 1.1.1 --- Human-Machine Interface --- p.2 / Chapter 1.1.2 --- Speech Aids --- p.3 / Chapter 1.1.3 --- Text-To-Speech (TTS) system --- p.4 / Chapter 1.1.4 --- Speech Dialogue System --- p.4 / Chapter 1.2 --- Current Status in Speech Synthesis --- p.6 / Chapter 1.2.1 --- Concatenation Based --- p.6 / Chapter 1.2.2 --- Parametric Based --- p.7 / Chapter 1.2.3 --- Articulatory Based --- p.7 / Chapter 1.2.4 --- Application of Neural Network in Speech Synthesis --- p.8 / Chapter 1.3 --- The Proposed Neural Network Speech Synthesis --- p.9 / Chapter 1.3.1 --- Motivation --- p.9 / Chapter 1.3.2 --- Objectives --- p.9 / Chapter 1.4 --- Thesis outline --- p.11 / Chapter 2 --- Linguistic Basics for Speech Synthesis --- p.12 / Chapter 2.1 --- Relations between Linguistics and Speech Synthesis --- p.12 / Chapter 2.2 --- Basic Phonology and Phonetics --- p.14 / Chapter 2.2.1 --- Phonology --- p.14 / Chapter 2.2.2 --- Phonetics --- p.15 / Chapter 2.2.3 --- Prosody --- p.16 / Chapter 2.3 --- Transcription Systems --- p.17 / Chapter 2.3.1 --- The Employed Transcription System --- p.18 / Chapter 2.4 --- Cantonese Phonology --- p.20 / Chapter 2.4.1 --- Some Properties of Cantonese --- p.20 / Chapter 2.4.2 --- Initial --- p.21 / Chapter 2.4.3 --- Final --- p.23 / Chapter 2.4.4 --- Lexical Tone --- p.25 / Chapter 2.4.5 --- Variations --- p.26 / Chapter 2.5 --- The Vowel Quadrilaterals --- p.29 / Chapter 3 --- Speech Synthesis Technology --- p.32 / Chapter 3.1 --- Human Speech Production --- p.32 / Chapter 3.2 --- Important Issues in Speech Synthesis Systems --- p.34 / Chapter 3.2.1 --- Controllability --- p.34 / Chapter 3.2.2 --- Naturalness --- p.34 / Chapter 3.2.3 --- Complexity --- p.35 / Chapter 3.2.4 --- Information Storage --- p.35 / Chapter 3.3 --- Units for Synthesis --- p.37 / Chapter 3.4 --- Type of Synthesizer --- p.40 / Chapter 3.4.1 --- Copy Concatenation --- p.40 / Chapter 3.4.2 --- Vocoder --- p.41 / Chapter 3.4.3 --- Articulatory Synthesis --- p.44 / Chapter 4 --- Neural Network Speech Synthesis with Articulatory Control --- p.47 / Chapter 4.1 --- Neural Network Approximation --- p.48 / Chapter 4.1.1 --- The Approximation Problem --- p.48 / Chapter 4.1.2 --- Network Approach for Approximation --- p.49 / Chapter 4.2 --- Artificial Neural Network for Phone-based Speech Synthesis --- p.53 / Chapter 4.2.1 --- Network Approximation for Speech Signal Synthesis --- p.53 / Chapter 4.2.2 --- Feedforward Backpropagation Neural Network --- p.56 / Chapter 4.2.3 --- Radial Basis Function Network --- p.58 / Chapter 4.2.4 --- Parallel Operating Synthesizer Networks --- p.59 / Chapter 4.3 --- Template Storage and Control for the Synthesizer Network --- p.61 / Chapter 4.3.1 --- Implicit Template Storage --- p.61 / Chapter 4.3.2 --- Articulatory Control Parameters --- p.61 / Chapter 4.4 --- Summary --- p.65 / Chapter 5 --- Prototype Implementation of the Synthesizer Network --- p.66 / Chapter 5.1 --- Implementation of the Synthesizer Network --- p.66 / Chapter 5.1.1 --- Network Architectures --- p.68 / Chapter 5.1.2 --- Spectral Templates for Training --- p.74 / Chapter 5.1.3 --- System requirement --- p.76 / Chapter 5.2 --- Subjective Listening Test --- p.79 / Chapter 5.2.1 --- Sample Selection --- p.79 / Chapter 5.2.2 --- Test Procedure --- p.81 / Chapter 5.2.3 --- Result --- p.83 / Chapter 5.2.4 --- Analysis --- p.86 / Chapter 5.3 --- Summary --- p.88 / Chapter 6 --- Simplified Articulatory Control for the Synthesizer Network --- p.89 / Chapter 6.1 --- Coarticulatory Effect in Speech Production --- p.90 / Chapter 6.1.1 --- Acoustic Effect --- p.90 / Chapter 6.1.2 --- Prosodic Effect --- p.91 / Chapter 6.2 --- Control in various Synthesis Techniques --- p.92 / Chapter 6.2.1 --- Copy Concatenation --- p.92 / Chapter 6.2.2 --- Formant Synthesis --- p.93 / Chapter 6.2.3 --- Articulatory Synthesis --- p.93 / Chapter 6.3 --- Articulatory Control Model based on Vowel Quad --- p.94 / Chapter 6.3.1 --- Modeling of Variations with the Articulatory Control Model --- p.95 / Chapter 6.4 --- Voice Correspondence --- p.97 / Chapter 6.4.1 --- For Nasal Sounds - Inter-Network Correspondence --- p.98 / Chapter 6.4.2 --- In Flat-Tongue Space - Intra-Network Correspondence --- p.101 / Chapter 6.5 --- Summary --- p.108 / Chapter 7 --- Pause Duration Properties in Cantonese Phrases --- p.109 / Chapter 7.1 --- The Prosodic Feature - Inter-Syllable Pause --- p.110 / Chapter 7.2 --- Experiment for Measuring Inter-Syllable Pause of Cantonese Phrases --- p.111 / Chapter 7.2.1 --- Speech Material Selection --- p.111 / Chapter 7.2.2 --- Experimental Procedure --- p.112 / Chapter 7.2.3 --- Result --- p.114 / Chapter 7.3 --- Characteristics of Inter-Syllable Pause in Cantonese Phrases --- p.117 / Chapter 7.3.1 --- Pause Duration Characteristics for Initials after Pause --- p.117 / Chapter 7.3.2 --- Pause Duration Characteristics for Finals before Pause --- p.119 / Chapter 7.3.3 --- General Observations --- p.119 / Chapter 7.3.4 --- Other Observations --- p.121 / Chapter 7.4 --- Application of Pause-duration Statistics to the Synthesis System --- p.124 / Chapter 7.5 --- Summary --- p.126 / Chapter 8 --- Conclusion and Further Work --- p.127 / Chapter 8.1 --- Conclusion --- p.127 / Chapter 8.2 --- Further Extension Work --- p.130 / Chapter 8.2.1 --- Regularization Network Optimized on ISD --- p.130 / Chapter 8.2.2 --- Incorporation of Non-Articulatory Parameters to Control Space --- p.130 / Chapter 8.2.3 --- Experiment on Other Prosodic Features --- p.131 / Chapter 8.2.4 --- Application of Voice Correspondence to Cantonese Coda Discrimination --- p.131 / Chapter A --- Cantonese Initials and Finals --- p.132 / Chapter A.1 --- Tables of All Cantonese Initials and Finals --- p.132 / Chapter B --- Using Distortion Measure as Error Function in Neural Network --- p.135 / Chapter B.1 --- Formulation of Itakura-Saito Distortion Measure for Neural Network Error Function --- p.135 / Chapter B.2 --- Formulation of a Modified Itakura-Saito Distortion (MISD) Measure for Neural Network Error Function --- p.137 / Chapter C --- Orthogonal Least Square Algorithm for RBFNet Training --- p.138 / Chapter C.1 --- Orthogonal Least Squares Learning Algorithm for Radial Basis Function Network Training --- p.138 / Chapter D --- Phrase Lists --- p.140 / Chapter D.1 --- Two-Syllable Phrase List for the Pause Duration Experiment --- p.140 / Chapter D.1.1 --- Two-syllable phrases (兩字詞) --- p.140 / Chapter D.2 --- Three/Four-Syllable Phrase List for the Pause Duration Experiment --- p.144 / Chapter D.2.1 --- Phrases (片語) --- p.144
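Appendix B of this record formulates an Itakura-Saito distortion measure as the network's error function. For reference, the standard Itakura-Saito distortion between a target power spectrum P(ω) and a model spectrum P̂(ω) is the textbook form below; the thesis's modified version (MISD) presumably departs from it, and the exact modification is not given in this record.

```latex
d_{\mathrm{IS}}(P,\hat{P})
  = \frac{1}{2\pi}\int_{-\pi}^{\pi}
    \left[ \frac{P(\omega)}{\hat{P}(\omega)}
         - \log\frac{P(\omega)}{\hat{P}(\omega)} - 1 \right] d\omega
```

The measure is asymmetric and penalizes spectral valleys less than peaks, which is why it has been favored for speech spectra over plain squared error.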
58

A facial animation model for expressive audio-visual speech

Somasundaram, Arunachalam. January 2006 (has links)
Thesis (Ph. D.)--Ohio State University, 2006. / Title from first page of PDF file. Includes bibliographical references (p. 131-139).
59

Audio browsing of automaton-based hypertext

Ustun, Selen 30 September 2004 (has links)
With the widespread adoption of hypermedia systems, and the World Wide Web (WWW) in particular, these systems have evolved from simple systems with only textual content to those that incorporate a large content base consisting of a wide variety of document types. As the number of users has grown, so has the need for these systems to be accessible to a wider range of users. Consequently, the growth of these systems, together with the number and variety of their users, requires new presentation and navigation mechanisms for a wider audience. One such presentation method is the audio-only presentation of hypertext content, and this research proposes a novel solution to this problem for complex and dynamic systems. The hypothesis is that the proposed Audio Browser is an efficient tool for presenting hypertext in audio format, which will prove useful for several applications, including browsers for visually impaired and remote users. The Audio Browser provides audio-only browsing of content in a Petri-net-based hypertext system called Context-Aware Trellis (caT). It uses a combination of synthesized speech and pre-recorded speech to allow its user to listen to the contents of documents, follow links, and get information about the navigation process. It also has mechanisms for navigating within documents, allowing users to move through content more quickly.
60

Acoustic Models for the Analysis and Synthesis of the Singing Voice

Lee, Matthew E. 26 April 2005 (has links)
Throughout history, the singing voice has been a fundamental tool for musical expression. While analysis and digital synthesis techniques have been developed for normal speech, few models and techniques have focused on the singing voice. The central theme of this research is the development of models aimed at the characterization and synthesis of the singing voice. First, a spectral model is presented in which asymmetric generalized Gaussian functions are used to represent the formant structure of a singing voice in a flexible manner. Efficient methods for searching the parameter space are investigated, and challenges associated with smooth parameter trajectories are discussed. Next, a model for glottal characterization is introduced by first presenting an analysis of the relationship between measurable spectral qualities of the glottal waveform and perceptually relevant time-domain parameters. A mathematical derivation of this relationship is presented and extended into a method for parameter estimation. These concepts are then used to outline a procedure for modifying glottal textures and qualities in the frequency domain. By combining these models with the Analysis-by-Synthesis/Overlap-Add sinusoidal model, the spectral and glottal models are shown to be capable of characterizing the singing voice according to traits such as level of training and registration. An application is presented in which these parameterizations are used to implement a system for singing voice enhancement. Subjective listening tests were conducted in which listeners showed an overall preference for outputs produced by the proposed enhancement system over both unmodified voices and voices enhanced with competitive methods.
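One common way to write the asymmetric generalized Gaussian mentioned in the abstract gives each formant peak independent width and shape parameters on either side of its centre frequency μ. This parameterization is an assumption for illustration; the thesis may normalize the function differently.

```latex
G(f) =
\begin{cases}
  A\,\exp\!\left(-\left|\dfrac{f-\mu}{\alpha_L}\right|^{\beta_L}\right), & f < \mu,\\[1ex]
  A\,\exp\!\left(-\left|\dfrac{f-\mu}{\alpha_R}\right|^{\beta_R}\right), & f \ge \mu,
\end{cases}
```

Here A is the peak amplitude, and (α_L, β_L) and (α_R, β_R) control the bandwidth and rolloff on the low- and high-frequency sides of the formant, which is what makes the shape asymmetric and flexible.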
