  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
151

Visualization of the multi-dimensional speech parameter space.

January 1993
by Andrew Poon Ngai Ho. / Thesis (M.S.)--Chinese University of Hong Kong, 1993. / Includes bibliographical references (leaves [97-98]). / ABSTRACT / ACKNOWLEDGMENTS / Chapter 1. --- INTRODUCTION / Chapter 2. --- REPRESENTATION OF SPEECH DATA --- p.4 / Chapter 2.1 --- SAMPLE DATA REPRESENTATION --- p.4 / Chapter 2.2 --- ANALOG LINEAR SYSTEM MODEL --- p.7 / Chapter 2.3 --- DISCRETE FOURIER TRANSFORM --- p.8 / Chapter 2.4 --- FILTER BAND REPRESENTATION --- p.8 / Chapter 2.5 --- LINEAR PREDICTIVE CODING (LPC) --- p.10 / Chapter 2.6 --- LPC CEPSTRAL COEFFICIENT --- p.13 / Chapter 3. --- MULTI-DIMENSIONAL ANALYSIS --- p.18 / Chapter 3.1 --- PURE GRAPHICAL TOOLS --- p.18 / Chapter 3.1.1 --- MULTI-HISTOGRAM --- p.18 / Chapter 3.1.2 --- STARS --- p.19 / Chapter 3.1.3 --- SPIKED SCATTERPLOT --- p.19 / Chapter 3.1.4 --- GLYPHS --- p.22 / Chapter 3.1.5 --- BOXES --- p.22 / Chapter 3.1.6 --- LIMITATIONS OF THE BASIC METHODS --- p.22 / Chapter 3.1.7 --- CHERNOFF FACES --- p.26 / Chapter 3.1.8 --- ANDREWS' CURVE --- p.27 / Chapter 3.1.9 --- LIMITATIONS OF CHERNOFF FACES AND ANDREWS' CURVE --- p.30 / Chapter 3.1.10 --- SCATTERPLOT MATRIX --- p.30 / Chapter 3.1.11 --- PARALLEL-AXIS SYSTEM --- p.32 / Chapter 3.1.12 --- COMMON BASIC PITFALL --- p.33 / Chapter 3.2 --- PURE PROJECTION METHODS --- p.36 / Chapter 3.2.1 --- PRINCIPAL COMPONENTS ANALYSIS --- p.36 / Chapter 3.2.2 --- PRINCIPAL CO-ORDINATES ANALYSIS --- p.37 / Chapter 3.2.3 --- REGRESSION ANALYSIS --- p.38 / Chapter 3.3 --- SLICED INVERSE REGRESSION (SIR) --- p.41 / Chapter 4 --- DATA ANALYSIS --- p.50 / Chapter 4.1 --- PROGRAMS AND TEST DATA --- p.50 / Chapter 4.2 --- ACTUAL SPEECH DATA RESULTS --- p.63 / Chapter 4.2.1 --- SINGLE UTTERANCE OF "4" BY SPEAKER A ONLY --- p.66 / Chapter 4.2.2 --- TWELVE UTTERANCES OF "4" BY SPEAKER A --- p.72 / Chapter 4.2.3 --- THREE UTTERANCES PER SPEAKER OF "4" BY SPEAKERS A, B AND C --- p.78 / Chapter 4.2.4 --- TWO UTTERANCES PER DIGIT OF "1" TO "9" BY SPEAKER A --- p.83 / Chapter 4.2.5 --- ONE UTTERANCE PER DIGIT PER SPEAKER OF "1" TO "9" BY SPEAKERS A, B, C --- p.86 / Chapter 5. --- CONCLUSION AND FURTHER WORK --- p.93 / Chapter 5.1 --- CONCLUSION --- p.93 / Chapter 5.2 --- FURTHER WORK --- p.94 / REFERENCES / APPENDIX 1 MATLAB PROGRAM LISTING FOR SIR / APPENDIX 2 C PROGRAM LISTING FOR ROTATIONAL VIEW / APPENDIX 3 C PROGRAM LISTING FOR LPC AND CEPSTRAL TRANSFORMS / APPENDIX 4 ALL VIEWS, EIGENVALUES AND EIGENVECTORS FOR SINGLE UTTERANCE OF "4" BY SPEAKER A / APPENDIX 5 ALL VIEWS, EIGENVALUES AND EIGENVECTORS FOR 12 UTTERANCES OF "4" BY SPEAKER A / APPENDIX 6 ALL VIEWS, EIGENVALUES AND EIGENVECTORS FOR 5 UTTERANCES PER SPEAKER OF "4" BY SPEAKERS A, B, C / APPENDIX 7 ALL VIEWS, EIGENVALUES AND EIGENVECTORS FOR 2 UTTERANCES PER DIGIT OF "1" TO "9" BY SPEAKER A / APPENDIX 8 ALL VIEWS, EIGENVALUES AND EIGENVECTORS FOR 1 UTTERANCE PER SPEAKER PER DIGIT OF "1" TO "9" BY SPEAKERS A, B, C
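The table of contents above surveys classical multi-dimensional visualization tools, including Andrews' curves (Chapters 3.1.8-3.1.9). As a hedged illustration of that one technique only (this is not the thesis's own MATLAB code, and the random frame clusters are stand-ins for real 12-dimensional LPC cepstral vectors), an Andrews' curve maps a d-dimensional point x to f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + ...:

```python
import numpy as np
import matplotlib.pyplot as plt

def andrews_curve(x, t):
    """Andrews' curve of a point x: f_x(t) = x1/sqrt(2) + x2*sin(t)
    + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ..."""
    y = np.full_like(t, x[0] / np.sqrt(2))
    k = 1
    for i in range(1, len(x)):
        if i % 2 == 1:
            y += x[i] * np.sin(k * t)
        else:
            y += x[i] * np.cos(k * t)
            k += 1
    return y

# Hypothetical stand-ins for 12-dimensional cepstral frames of two speakers.
rng = np.random.default_rng(0)
frames_a = rng.normal(0.0, 1.0, size=(20, 12))
frames_b = rng.normal(0.8, 1.0, size=(20, 12))

t = np.linspace(-np.pi, np.pi, 200)
for f in frames_a:
    plt.plot(t, andrews_curve(f, t), color="C0", alpha=0.4)
for f in frames_b:
    plt.plot(t, andrews_curve(f, t), color="C1", alpha=0.4)
plt.xlabel("t")
plt.ylabel("f_x(t)")
plt.title("Andrews' curves of two frame clusters")
plt.show()
```

Points from the same cluster trace similar curves, which is what makes this kind of plot useful for eyeballing structure in a high-dimensional speech parameter space.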
152

A low bit rate speech coder based on waveform interpolation = 基於波形預測方法的低比特率語音編碼 (Ji yu bo xing yu ce fang fa de di bi te lu yu yin bian ma).

January 1999
by Ge Gao. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. / Includes bibliographical references (leaves 101-107). / Text in English; abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Attributes of speech coders --- p.1 / Chapter 1.1.1 --- Bit rate --- p.2 / Chapter 1.1.2 --- Speech quality --- p.3 / Chapter 1.1.3 --- Complexity --- p.3 / Chapter 1.1.4 --- Delay --- p.4 / Chapter 1.1.5 --- Channel-error sensitivity --- p.4 / Chapter 1.2 --- Development of speech coding techniques --- p.5 / Chapter 1.3 --- Motivations and objectives --- p.7 / Chapter 2 --- Waveform interpolation speech model --- p.9 / Chapter 2.1 --- Overview of speech production model --- p.9 / Chapter 2.2 --- Linear prediction (LP) --- p.11 / Chapter 2.3 --- Linear-prediction based analysis-by-synthesis coding (LPAS) --- p.14 / Chapter 2.4 --- Sinusoidal model --- p.15 / Chapter 2.5 --- Mixed Excitation Linear Prediction (MELP) model --- p.16 / Chapter 2.6 --- Waveform interpolation model --- p.16 / Chapter 2.6.1 --- Principles of waveform interpolation model --- p.18 / Chapter 2.6.2 --- Outline of a WI coding system --- p.25 / Chapter 3 --- Pitch detection --- p.31 / Chapter 3.1 --- Overview of existing pitch detection methods --- p.31 / Chapter 3.2 --- Robust Algorithm for Pitch Tracking (RAPT) --- p.33 / Chapter 3.3 --- Modifications of RAPT --- p.37 / Chapter 4 --- Development of a 1.7kbps speech coder --- p.44 / Chapter 4.1 --- Architecture of the coder --- p.44 / Chapter 4.2 --- Encoding of unvoiced speech --- p.46 / Chapter 4.3 --- Encoding of voiced speech --- p.46 / Chapter 4.3.1 --- Generation of PCW --- p.48 / Chapter 4.3.2 --- Variable Dimensional Vector Quantization (VDVQ) --- p.53 / Chapter 4.3.3 --- Sparse frequency representation (SFR) of speech --- p.56 / Chapter 4.3.4 --- Sample selective linear prediction (SSLP) --- p.58 / Chapter 4.4 --- Practical implementation issues --- p.60 / Chapter 5 --- Development of a 2.0kbps speech coder --- p.67 / Chapter 5.1 --- Features of the coder --- p.67 / Chapter 5.2 --- Postfiltering --- p.75 / Chapter 5.3 --- Voice activity detection (VAD) --- p.76 / Chapter 5.4 --- Performance evaluation --- p.79 / Chapter 6 --- Conclusion --- p.85 / Chapter A --- Subroutine for pitch detection algorithm --- p.88 / Chapter B --- Subroutines for Pitch Cycle Waveform (PCW) generation --- p.96 / Chapter B.1 --- The main subroutine --- p.96 / Chapter B.2 --- Subroutine for peak picking algorithm --- p.98 / Chapter B.3 --- Subroutine for encoding the residue (using VDVQ) --- p.99 / Chapter B.4 --- Subroutine for synthesizing PCW from its residue --- p.100 / Bibliography --- p.101
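Chapter 3.2 of this thesis builds on RAPT, whose core cue is the normalized cross-correlation function (NCCF) of the speech signal. The sketch below shows only a simplified single-frame NCCF peak-picking estimate, with a guessed voicing threshold; the full RAPT algorithm adds two-pass candidate generation and dynamic-programming pitch tracking, which are omitted here.

```python
import numpy as np

def ncc_pitch(frame, fs, f0_min=60.0, f0_max=400.0):
    """Single-frame pitch estimate from the peak of the normalized
    cross-correlation function (NCCF), the core cue in RAPT-style trackers."""
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), len(frame) - 1)
    x = frame - frame.mean()
    best_lag, best_ncc = 0, 0.0
    for lag in range(lag_min, lag_max):
        a, b = x[:-lag], x[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom > 0.0:
            ncc = float(np.dot(a, b)) / denom
            if ncc > best_ncc:
                best_lag, best_ncc = lag, ncc
    if best_lag == 0 or best_ncc < 0.5:   # voicing threshold is a guess
        return 0.0                        # treat the frame as unvoiced
    return fs / best_lag

# Synthetic 120 Hz tone, 40 ms frame at 8 kHz: prints roughly 120.
fs = 8000
t = np.arange(int(0.04 * fs)) / fs
print(ncc_pitch(np.sin(2 * np.pi * 120 * t), fs))
```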
153

Cantonese text-to-speech synthesis using sub-syllable units = 利用子音節的粤語文語轉換系統 (Li yong zi yin jie de Yue yu wen yu zhuan huan xi tong).

January 2001
Law Ka Man = 羅家文 (Luo Jiawen). / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references. / Text in English; abstracts in English and Chinese. / Chapter 1. --- INTRODUCTION --- p.1 / Chapter 1.1 --- Text analysis --- p.2 / Chapter 1.2 --- Prosody prediction --- p.3 / Chapter 1.3 --- Speech generation --- p.3 / Chapter 1.4 --- The trend of TTS technology --- p.5 / Chapter 1.5 --- TTS systems for different languages --- p.6 / Chapter 1.6 --- Objectives of the thesis --- p.8 / Chapter 1.7 --- Thesis outline --- p.8 / References --- p.10 / Chapter 2. --- BACKGROUND --- p.11 / Chapter 2.1 --- Cantonese phonology --- p.11 / Chapter 2.2 --- Cantonese TTS - a baseline system --- p.16 / Chapter 2.3 --- Time-Domain Pitch-Synchronous Overlap-Add --- p.17 / Chapter 2.3.1 --- From speech signal to short-time analysis signals --- p.18 / Chapter 2.3.2 --- From short-time analysis signals to short-time synthesis signals --- p.19 / Chapter 2.3.3 --- From short-time synthesis signals to synthetic speech --- p.20 / Chapter 2.4 --- Time-scale and Pitch-scale modifications --- p.20 / Chapter 2.4.1 --- Voiced speech --- p.20 / Chapter 2.4.2 --- Unvoiced speech --- p.21 / Chapter 2.5 --- Summary --- p.22 / References --- p.23 / Chapter 3. --- SUB-SYLLABLE BASED TTS SYSTEM --- p.24 / Chapter 3.1 --- Motivations --- p.24 / Chapter 3.2 --- Choices of synthesis units --- p.27 / Chapter 3.2.1 --- Sub-syllable unit --- p.29 / Chapter 3.2.2 --- Diphones, demi-syllables and sub-syllable units --- p.31 / Chapter 3.3 --- Proposed TTS system --- p.32 / Chapter 3.3.1 --- Text analysis module --- p.33 / Chapter 3.3.2 --- Synthesis module --- p.36 / Chapter 3.3.3 --- Prosody module --- p.37 / Chapter 3.4 --- Summary --- p.38 / References --- p.39 / Chapter 4. --- ACOUSTIC INVENTORY --- p.40 / Chapter 4.1 --- The full set of Cantonese sub-syllable units --- p.40 / Chapter 4.2 --- A reduced set of sub-syllable units --- p.42 / Chapter 4.3 --- Corpus design --- p.44 / Chapter 4.4 --- Recording --- p.46 / Chapter 4.5 --- Post-processing of speech data --- p.47 / Chapter 4.6 --- Summary --- p.51 / References --- p.51 / Chapter 5. --- CONCATENATION TECHNIQUES --- p.52 / Chapter 5.1 --- Concatenation of sub-syllable units --- p.52 / Chapter 5.1.1 --- Concatenation of plosives and affricates --- p.54 / Chapter 5.1.2 --- Concatenation of fricatives --- p.55 / Chapter 5.1.3 --- Concatenation of vowels, semi-vowels and nasals --- p.55 / Chapter 5.1.4 --- Spectral distance measure --- p.57 / Chapter 5.2 --- Waveform concatenation method --- p.58 / Chapter 5.3 --- Selected examples of waveform concatenation --- p.59 / Chapter 5.3.1 --- I-I concatenation --- p.60 / Chapter 5.3.2 --- F-F concatenation --- p.66 / Chapter 5.4 --- Summary --- p.71 / References --- p.72 / Chapter 6. --- PERFORMANCE EVALUATION --- p.73 / Chapter 6.1 --- Listening test --- p.73 / Chapter 6.2 --- Test results --- p.74 / Chapter 6.3 --- Discussions --- p.75 / References --- p.78 / Chapter 7. --- CONCLUSIONS & FUTURE WORK --- p.79 / Chapter 7.1 --- Conclusions --- p.79 / Chapter 7.2 --- Suggested future work --- p.81 / APPENDIX 1 SYLLABLE DURATION --- p.82 / APPENDIX 2 PERCEPTUAL TEST PARAGRAPHS --- p.86
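Chapter 2.3 outlines the TD-PSOLA pipeline: pitch-synchronous short-time analysis signals are extracted, mapped to synthesis positions, and overlap-added into the output waveform. The following is a heavily reduced time-scale-modification sketch under strong assumptions (pitch marks are given and strictly increasing, voiced speech only); a real TD-PSOLA implementation of the kind the thesis uses also handles unvoiced segments and pitch-scale modification.

```python
import numpy as np

def psola_time_scale(x, pitch_marks, rate):
    """Reduced TD-PSOLA-style time-scale modification: two-period
    Hann-windowed short-time signals taken at analysis pitch marks are
    overlap-added at synthesis marks advanced by one local period.
    rate > 1 shortens the output, rate < 1 lengthens it."""
    marks = np.asarray(pitch_marks)
    out_len = int(len(x) / rate)
    y = np.zeros(out_len + len(x))           # slack to avoid edge checks
    t_syn = int(marks[1])                    # first synthesis mark
    while t_syn < out_len:
        # analysis mark closest to the time-warped synthesis position
        i = int(np.argmin(np.abs(marks - t_syn * rate)))
        i = min(max(i, 1), len(marks) - 2)
        period = int(marks[i + 1] - marks[i])
        seg = x[marks[i] - period : marks[i] + period]
        if t_syn - period >= 0 and len(seg) == 2 * period:
            y[t_syn - period : t_syn + period] += seg * np.hanning(len(seg))
        t_syn += period                      # keep the local pitch period
    return y[:out_len]

# Synthetic voiced-like signal with a 100-sample period and known marks.
x = np.tile(np.hanning(100), 40)
marks = np.arange(0, len(x), 100)
y = psola_time_scale(x, marks, rate=0.5)
print(len(x), len(y))                        # 4000 8000
```

With rate = 0.5 the output is roughly twice as long but keeps the original pitch, since whole pitch periods are copied rather than resampled.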
154

Domain-optimized Chinese speech generation.

January 2001
Fung Tien Ying. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 119-128). / Abstracts in English and Chinese. / Abstract --- p.1 / Acknowledgement --- p.1 / List of Figures --- p.7 / List of Tables --- p.11 / Chapter 1 --- Introduction --- p.14 / Chapter 1.1 --- General Trends on Speech Generation --- p.15 / Chapter 1.2 --- Domain-Optimized Speech Generation in Chinese --- p.16 / Chapter 1.3 --- Thesis Organization --- p.17 / Chapter 2 --- Background --- p.19 / Chapter 2.1 --- Linguistic and Phonological Properties of Chinese --- p.19 / Chapter 2.1.1 --- Articulation --- p.20 / Chapter 2.1.2 --- Tones --- p.21 / Chapter 2.2 --- Previous Development in Speech Generation --- p.22 / Chapter 2.2.1 --- Articulatory Synthesis --- p.23 / Chapter 2.2.2 --- Formant Synthesis --- p.24 / Chapter 2.2.3 --- Concatenative Synthesis --- p.25 / Chapter 2.2.4 --- Existing Systems --- p.31 / Chapter 2.3 --- Our Speech Generation Approach --- p.35 / Chapter 3 --- Corpus-based Syllable Concatenation: A Feasibility Test --- p.37 / Chapter 3.1 --- Capturing Syllable Coarticulation with Distinctive Features --- p.39 / Chapter 3.2 --- Creating a Domain-Optimized Wavebank --- p.41 / Chapter 3.2.1 --- Generate-and-Filter --- p.44 / Chapter 3.2.2 --- Waveform Segmentation --- p.47 / Chapter 3.3 --- The Use of Multi-Syllable Units --- p.49 / Chapter 3.4 --- Unit Selection for Concatenative Speech Output --- p.50 / Chapter 3.5 --- A Listening Test --- p.51 / Chapter 3.6 --- Chapter Summary --- p.52 / Chapter 4 --- Scalability and Portability to the Stocks Domain --- p.55 / Chapter 4.1 --- Complexity of the ISIS Responses --- p.56 / Chapter 4.2 --- XML for Input Semantic and Grammar Representation --- p.60 / Chapter 4.3 --- Tree-Based Filtering Algorithm --- p.63 / Chapter 4.4 --- Energy Normalization --- p.67 / Chapter 4.5 --- Chapter Summary --- p.69 / Chapter 5 --- Investigation in Tonal Contexts --- p.71 / Chapter 5.1 --- The Nature of Tones --- p.74 / Chapter 5.1.1 --- Human Perception of Tones --- p.75 / Chapter 5.2 --- Relative Importance of Left and Right Tonal Context --- p.77 / Chapter 5.2.1 --- Tonal Contexts in the Date-Time Subgrammar --- p.77 / Chapter 5.2.2 --- Tonal Contexts in the Numeric Subgrammar --- p.82 / Chapter 5.2.3 --- Conclusion regarding the Relative Importance of Left versus Right Tonal Contexts --- p.86 / Chapter 5.3 --- Selection Scheme for Tonal Variants --- p.86 / Chapter 5.3.1 --- Listening Test for our Tone Backoff Scheme --- p.90 / Chapter 5.3.2 --- Error Analysis --- p.92 / Chapter 5.4 --- Chapter Summary --- p.94 / Chapter 6 --- Summary and Future Work --- p.95 / Chapter 6.1 --- Contributions --- p.97 / Chapter 6.2 --- Future Directions --- p.98 / Chapter A --- Listening Test Questionnaire for FOREX Response Generation --- p.100 / Chapter B --- Major Response Types For ISIS --- p.102 / Chapter C --- Recording Corpus for Tone Investigation in Date-Time Subgrammar --- p.105 / Chapter D --- Statistical Test for Left Tonal Context --- p.109 / Chapter E --- Statistical Test for Right Tonal Context --- p.112 / Chapter F --- Listening Test Questionnaire for Backoff Unit Selection Scheme --- p.115 / Chapter G --- Statistical Test for the Backoff Unit Selection Scheme --- p.117 / Chapter H --- Statistical Test for the Backoff Unit Selection Scheme --- p.118 / Bibliography --- p.119
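Chapter 4.4 of this thesis applies energy normalization when concatenating units recorded at different levels. A minimal, hedged sketch of the general idea follows; the target level and the use of plain RMS matching are assumptions, and the thesis's exact scheme may differ.

```python
import numpy as np

def rms(x):
    return np.sqrt(np.mean(x ** 2))

def normalize_energy(segment, target_rms=0.1, eps=1e-8):
    """Scale a concatenation unit so its RMS energy matches a common
    target level, reducing audible loudness jumps at the joins."""
    return segment * (target_rms / (rms(segment) + eps))

# Hypothetical units recorded at very different gains.
units = [np.random.randn(1600) * g for g in (0.02, 0.5, 0.1)]
out = np.concatenate([normalize_energy(u) for u in units])
print([round(rms(u), 3) for u in units], "->", round(rms(out), 3))
```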
155

Unsupervised model adaptation for continuous speech recognition using model-level confidence measures.

January 2002
Kwan Ka Yan. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references. / Abstracts in English and Chinese. / Chapter 1. --- Introduction --- p.1 / Chapter 1.1. --- Automatic Speech Recognition --- p.1 / Chapter 1.2. --- Robustness of ASR Systems --- p.3 / Chapter 1.3. --- Model Adaptation for Robust ASR --- p.4 / Chapter 1.4. --- Thesis outline --- p.6 / References --- p.8 / Chapter 2. --- Fundamentals of Continuous Speech Recognition --- p.10 / Chapter 2.1. --- Acoustic Front-End --- p.10 / Chapter 2.2. --- Recognition Module --- p.11 / Chapter 2.2.1. --- Acoustic Modeling with HMM --- p.12 / Chapter 2.2.2. --- Basic Phonology of Cantonese --- p.14 / Chapter 2.2.3. --- Acoustic Modeling for Cantonese --- p.15 / Chapter 2.2.4. --- Language Modeling --- p.16 / References --- p.17 / Chapter 3. --- Unsupervised Model Adaptation --- p.18 / Chapter 3.1. --- A General Review of Model Adaptation --- p.18 / Chapter 3.1.1. --- Supervised and Unsupervised Adaptation --- p.20 / Chapter 3.1.2. --- N-Best Adaptation --- p.22 / Chapter 3.2. --- MAP --- p.23 / Chapter 3.3. --- MLLR --- p.25 / Chapter 3.3.1. --- Adaptation Approach --- p.26 / Chapter 3.3.2. --- Estimation of MLLR regression matrices --- p.27 / Chapter 3.3.3. --- Least Mean Squares Regression --- p.29 / Chapter 3.3.4. --- Number of Transformations --- p.30 / Chapter 3.4. --- Experiment Results --- p.32 / Chapter 3.4.1. --- Standard MLLR versus LMS MLLR --- p.36 / Chapter 3.4.2. --- Effect of the Number of Transformations --- p.43 / Chapter 3.4.3. --- MAP vs. MLLR --- p.46 / Chapter 3.5. --- Conclusions --- p.48 / References --- p.49 / Chapter 4. --- Use of Confidence Measure for MLLR based Adaptation --- p.50 / Chapter 4.1. --- Introduction to Confidence Measure --- p.50 / Chapter 4.2. --- Confidence Measure Based on Word Density --- p.51 / Chapter 4.3. --- Model-level confidence measure --- p.53 / Chapter 4.4. --- Integrating Confusion Information into Confidence Measure --- p.55 / Chapter 4.5. --- Adaptation Data Distributions in Different Confidence Measures --- p.57 / References --- p.65 / Chapter 5. --- Experimental Results and Analysis --- p.66 / Chapter 5.1. --- Supervised Adaptation --- p.67 / Chapter 5.2. --- Cheated Confidence Measure --- p.69 / Chapter 5.3. --- Confidence Measures of Different Levels --- p.71 / Chapter 5.4. --- Incorporation of Confusion Matrix --- p.81 / Chapter 5.5. --- Conclusions --- p.83 / Chapter 6. --- Conclusions --- p.85 / Chapter 6.1. --- Future Work --- p.88
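Chapters 3.3-3.3.3 concern MLLR, which adapts HMM Gaussian means with a shared affine transform, mu_hat = A*mu + b, estimated from adaptation data. The sketch below shows only a simplified least-squares flavour, in the spirit of the least-mean-squares variant the table of contents mentions; full MLLR maximizes likelihood using occupancy- and covariance-weighted statistics, which this toy version ignores.

```python
import numpy as np

def estimate_mllr_transform(means, targets):
    """Least-squares estimate of a global MLLR mean transform W = [b A],
    so that target ~= W @ [1, mean]. `means` holds the original Gaussian
    means (one per row); `targets` holds the aligned observation averages."""
    ext = np.hstack([np.ones((len(means), 1)), means])   # extended means [1, mu]
    W, *_ = np.linalg.lstsq(ext, targets, rcond=None)
    return W.T                                           # shape (d, d+1)

def adapt_mean(W, mu):
    """Apply mu_hat = A @ mu + b via the extended-vector form."""
    return W @ np.concatenate(([1.0], mu))

# Hypothetical check: recover a known shift-and-scale on 3-D means.
rng = np.random.default_rng(1)
mus = rng.normal(size=(50, 3))
A_true, b_true = np.diag([1.1, 0.9, 1.0]), np.array([0.3, -0.2, 0.1])
obs = mus @ A_true.T + b_true
W = estimate_mllr_transform(mus, obs)
print(np.allclose(adapt_mean(W, mus[0]), obs[0], atol=1e-6))  # True
```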
156

Unsupervised neural and Bayesian models for zero-resource speech processing

Kamper, Herman. January 2017
Zero-resource speech processing is a growing research area which aims to develop methods that can discover linguistic structure and representations directly from unlabelled speech audio. Such unsupervised methods would allow speech technology to be developed in settings where transcriptions, pronunciation dictionaries, and text for language modelling are not available. Similar methods are required for cognitive models of language acquisition in human infants, and for developing robotic applications that are able to automatically learn language in a novel linguistic environment. There are two central problems in zero-resource speech processing: (i) finding frame-level feature representations which make it easier to discriminate between linguistic units (phones or words), and (ii) segmenting and clustering unlabelled speech into meaningful units. The claim of this thesis is that both top-down modelling (using knowledge of higher-level units to learn, discover and gain insight into their lower-level constituents) as well as bottom-up modelling (piecing together lower-level features to give rise to more complex higher-level structures) are advantageous in tackling these two problems. The thesis is divided into three parts. The first part introduces a new autoencoder-like deep neural network for unsupervised frame-level representation learning. This correspondence autoencoder (cAE) uses weak top-down supervision from an unsupervised term discovery system that identifies noisy word-like terms in unlabelled speech data. In an intrinsic evaluation of frame-level representations, the cAE outperforms several state-of-the-art bottom-up and top-down approaches, achieving a relative improvement of more than 60% over the previous best system. This shows that the cAE is particularly effective in using top-down knowledge of longer-spanning patterns in the data; at the same time, we find that the cAE is only able to learn useful representations when it is initialized using bottom-up pretraining on a large set of unlabelled speech. The second part of the thesis presents a novel unsupervised segmental Bayesian model that segments unlabelled speech data and clusters the segments into hypothesized word groupings. The result is a complete unsupervised tokenization of the input speech in terms of discovered word types; the system essentially performs unsupervised speech recognition. In this approach, a potential word segment (of arbitrary length) is embedded in a fixed-dimensional vector space. The model, implemented as a Gibbs sampler, then builds a whole-word acoustic model in this embedding space while jointly performing segmentation. We first evaluate the approach in a small-vocabulary multi-speaker connected digit recognition task, where we report unsupervised word error rates (WER) by mapping the unsupervised decoded output to ground truth transcriptions. The model achieves around 20% WER, outperforming a previous HMM-based system by about 10% absolute. To achieve this performance, the acoustic word embedding function (which maps variable-duration segments to single vectors) is refined in a top-down manner by using terms discovered by the model in an outer loop of segmentation. The third and final part of the study extends the small-vocabulary system in order to handle larger vocabularies in conversational speech data. To our knowledge, this is the first full-coverage segmentation and clustering system that is applied to large-vocabulary multi-speaker data.
To improve efficiency, the system incorporates a bottom-up syllable boundary detection method to eliminate unlikely word boundaries. We compare the system on English and Xitsonga datasets to several state-of-the-art baselines. We show that by imposing a consistent top-down segmentation while also using bottom-up knowledge from detected syllable boundaries, both single-speaker and multi-speaker versions of our system outperform a purely bottom-up single-speaker syllable-based approach. We also show that the discovered clusters can be made less speaker- and gender-specific by using features from the cAE (which incorporates both top-down and bottom-up learning). The system's discovered clusters are still less pure than those of two multi-speaker unsupervised term discovery systems, but provide far greater coverage. In summary, the different models and systems presented in this thesis show that both top-down and bottom-up modelling can improve representation learning, segmentation and clustering of unlabelled speech data.
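The abstract describes embedding variable-length word segments into a fixed-dimensional vector space so a whole-word acoustic model can operate on them. One common, simple embedding of that kind is uniform downsampling of the segment's frame features followed by flattening; the sketch below is hedged as an assumption (the thesis refines its own embedding function in an outer top-down loop, and may use a different mapping).

```python
import numpy as np

def downsample_embedding(frames, n_keep=10):
    """Map a variable-length sequence of frame features (T x d) to a fixed
    n_keep*d vector by sampling n_keep frames at uniform positions."""
    T = len(frames)
    idx = np.linspace(0, T - 1, n_keep).round().astype(int)
    return frames[idx].reshape(-1)

# Hypothetical segments of different durations, 13-D MFCC-like features.
seg_a = np.random.randn(42, 13)
seg_b = np.random.randn(97, 13)
emb_a, emb_b = downsample_embedding(seg_a), downsample_embedding(seg_b)
print(emb_a.shape, emb_b.shape)   # (130,) (130,) -- directly comparable
```

Once every candidate segment lives in the same fixed-dimensional space, clustering and scoring segments of arbitrary duration reduces to ordinary vector operations.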
157

An electronic device to reduce the dynamic range of speech

Hildebrant, Eric Michael. January 1982
Thesis (B.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1982. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING / Bibliography: leaves 90-92. / by Eric Michael Hildebrant. / B.S.
158

An investigation of vowel formant tracks for purposes of speaker identification.

Goldstein, Ursula Gisela. January 1975
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1975. / Bibliography: leaves 221-224. / M.S.
159

Unit selection and waveform concatenation strategies in Cantonese text-to-speech.

January 2005
Oey Sai Lok. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references. / Abstracts in English and Chinese. / Chapter 1. --- Introduction --- p.1 / Chapter 1.1 --- An overview of Text-to-Speech technology --- p.2 / Chapter 1.1.1 --- Text processing --- p.2 / Chapter 1.1.2 --- Acoustic synthesis --- p.3 / Chapter 1.1.3 --- Prosody modification --- p.4 / Chapter 1.2 --- Trends in Text-to-Speech technologies --- p.5 / Chapter 1.3 --- Objectives of this thesis --- p.7 / Chapter 1.4 --- Outline of the thesis --- p.9 / References --- p.11 / Chapter 2. --- Cantonese Speech --- p.13 / Chapter 2.1 --- The Cantonese dialect --- p.13 / Chapter 2.2 --- Phonology of Cantonese --- p.14 / Chapter 2.2.1 --- Initials --- p.15 / Chapter 2.2.2 --- Finals --- p.16 / Chapter 2.2.3 --- Tones --- p.18 / Chapter 2.3 --- Acoustic-phonetic properties of Cantonese syllables --- p.19 / References --- p.24 / Chapter 3. --- Cantonese Text-to-Speech --- p.25 / Chapter 3.1 --- General overview --- p.25 / Chapter 3.1.1 --- Text processing --- p.25 / Chapter 3.1.2 --- Corpus based acoustic synthesis --- p.26 / Chapter 3.1.3 --- Prosodic control --- p.27 / Chapter 3.2 --- Syllable based Cantonese Text-to-Speech system --- p.28 / Chapter 3.3 --- Sub-syllable based Cantonese Text-to-Speech system --- p.29 / Chapter 3.3.1 --- Definition of sub-syllable units --- p.29 / Chapter 3.3.2 --- Acoustic inventory --- p.31 / Chapter 3.3.3 --- Determination of the concatenation points --- p.33 / Chapter 3.4 --- Problems --- p.34 / References --- p.36 / Chapter 4. --- Waveform Concatenation for Sub-syllable Units --- p.37 / Chapter 4.1 --- Previous work in concatenation methods --- p.37 / Chapter 4.1.1 --- Determination of concatenation point --- p.38 / Chapter 4.1.2 --- Waveform concatenation --- p.38 / Chapter 4.2 --- Problems and difficulties in concatenating sub-syllable units --- p.39 / Chapter 4.2.1 --- Mismatch of acoustic properties --- p.40 / Chapter 4.2.2 --- Allophone problem of Initials /z/, /c/ and /s/ --- p.42 / Chapter 4.3 --- General procedures in concatenation strategies --- p.44 / Chapter 4.3.1 --- Concatenation of unvoiced segments --- p.45 / Chapter 4.3.2 --- Concatenation of voiced segments --- p.45 / Chapter 4.3.3 --- Measurement of spectral distance --- p.48 / Chapter 4.4 --- Detailed procedures in concatenation points determination --- p.50 / Chapter 4.4.1 --- Unvoiced segments --- p.50 / Chapter 4.4.2 --- Voiced segments --- p.53 / Chapter 4.5 --- Selected examples in concatenation strategies --- p.58 / Chapter 4.5.1 --- Concatenation at Initial segments --- p.58 / Chapter 4.5.1.1 --- Plosives --- p.58 / Chapter 4.5.1.2 --- Fricatives --- p.59 / Chapter 4.5.2 --- Concatenation at Final segments --- p.60 / Chapter 4.5.2.1 --- V group (long vowel) --- p.60 / Chapter 4.5.2.2 --- D group (diphthong) --- p.61 / References --- p.63 / Chapter 5. --- Unit Selection for Sub-syllable Units --- p.65 / Chapter 5.1 --- Basic requirements in unit selection process --- p.65 / Chapter 5.1.1 --- Availability of multiple copies of sub-syllable units --- p.65 / Chapter 5.1.1.1 --- Levels of "identical" --- p.66 / Chapter 5.1.1.2 --- Statistics on the availability --- p.67 / Chapter 5.1.2 --- Variations in acoustic parameters --- p.70 / Chapter 5.1.2.1 --- Pitch level --- p.71 / Chapter 5.1.2.2 --- Duration --- p.74 / Chapter 5.1.2.3 --- Intensity level --- p.75 / Chapter 5.2 --- Selection process: availability check on sub-syllable units --- p.77 / Chapter 5.2.1 --- Multiple copies found --- p.79 / Chapter 5.2.2 --- Unique copy found --- p.79 / Chapter 5.2.3 --- No matched copy found --- p.80 / Chapter 5.2.4 --- Illustrative examples --- p.80 / Chapter 5.3 --- Selection process: acoustic analysis on candidate units --- p.81 / References --- p.88 / Chapter 6. --- Performance Evaluation --- p.89 / Chapter 6.1 --- General information --- p.90 / Chapter 6.1.1 --- Objective test --- p.90 / Chapter 6.1.2 --- Subjective test --- p.90 / Chapter 6.1.3 --- Test materials --- p.91 / Chapter 6.2 --- Details of the objective test --- p.92 / Chapter 6.2.1 --- Testing method --- p.92 / Chapter 6.2.2 --- Results --- p.93 / Chapter 6.2.3 --- Analysis --- p.96 / Chapter 6.3 --- Details of the subjective test --- p.98 / Chapter 6.3.1 --- Testing method --- p.98 / Chapter 6.3.2 --- Results --- p.99 / Chapter 6.3.3 --- Analysis --- p.101 / Chapter 6.4 --- Summary --- p.107 / References --- p.108 / Chapter 7. --- Conclusions and Future Work --- p.109 / Chapter 7.1 --- Conclusions --- p.109 / Chapter 7.2 --- Suggested future work --- p.111 / References --- p.113 / Appendix 1 Mean pitch level of Initials and Finals stored in the inventory --- p.114 / Appendix 2 Mean durations of Initials and Finals stored in the inventory --- p.121 / Appendix 3 Mean intensity level of Initials and Finals stored in the inventory --- p.124 / Appendix 4 Test words used in performance evaluation --- p.127 / Appendix 5 Test paragraph used in performance evaluation --- p.128 / Appendix 6 Pitch profile used in the Text-to-Speech system --- p.131 / Appendix 7 Duration model used in Text-to-Speech system --- p.132
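Chapter 4.3.3 of this thesis measures spectral distance to locate good concatenation points between units. A hedged sketch of the general idea follows: the feature choice (13-D MFCC-like vectors) and plain Euclidean distance are assumptions, and the fixed search window is a simplification of the detailed procedures in Chapter 4.4.

```python
import numpy as np

def spectral_distance(feat_a, feat_b):
    """Euclidean distance between two spectral feature vectors
    (e.g. MFCC or LPC cepstra) at a candidate join point."""
    return np.linalg.norm(feat_a - feat_b)

def best_join(unit_a_feats, unit_b_feats, search=5):
    """Pick the pair of frames (end of unit A, start of unit B) within a
    small search window that minimizes spectral mismatch at the join."""
    best = (0, 0, np.inf)
    for i in range(1, search + 1):          # i frames back from the end of A
        for j in range(search):             # j frames into B
            d = spectral_distance(unit_a_feats[-i], unit_b_feats[j])
            if d < best[2]:
                best = (len(unit_a_feats) - i, j, d)
    return best                             # (frame in A, frame in B, distance)

# Hypothetical units represented by 13-D MFCC-like frames.
a = np.random.randn(30, 13)
b = np.random.randn(30, 13)
print(best_join(a, b))
```

Cutting both units at the returned frames keeps the spectra on either side of the join as similar as possible, which is what suppresses audible discontinuities.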
160

Speech synthesis from surface electromyogram signals. / CUHK electronic theses & dissertations collection

January 2006
A method for synthesizing speech from surface electromyogram (SEMG) signals on a frame-by-frame basis is presented. The input SEMG signals of spoken words are blocked into frames from which SEMG features are extracted and classified into a number of phonetic classes by a neural network. A sequence of phonetic class labels is thus produced, which is subsequently smoothed by applying an error correction technique. The speech waveform of a word is then constructed by concatenating the pre-recorded speech segments corresponding to the phonetic class labels. Experimental results show that the neural network can classify the SEMG features with 86.3% accuracy; this can be further improved to 96.4% by smoothing the phonetic class labels. Experimental evaluations based on the synthesis of eight words show that on average 92.9% of the words can be synthesized correctly. It is also demonstrated that the proposed frame-based feature extraction and conversion methodology can be applied to SEMG-based speech synthesis. / Although speech is the most natural means for communication among humans, there are situations in which speech is impossible or inappropriate. Examples include people with vocal cord damage, underwater communication, or communication in noisy environments. To address some of the limitations of speech communication, non-acoustic communication systems using surface electromyogram signals have been proposed. However, most of the proposed techniques focus on recognizing or classifying the SEMG signals into a limited set of words. This approach shares similarities with isolated word recognition systems in that periods of silence between words are mandatory, and it has difficulty recognizing untrained words and continuous speech. / Lam Yuet Ming. / "December 2006." / Adviser: Leong Heng Philip Wai. / Source: Dissertation Abstracts International, Volume: 68-08, Section: B, page: 5392. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2006. / Includes bibliographical references (p. 104-111). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307.
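The abstract describes a pipeline of frame blocking, per-frame phonetic classification, and label smoothing before segment concatenation. The sketch below illustrates two of those stages under stated assumptions: the classifier itself is abstracted away, and majority-vote smoothing is a simple stand-in for the abstract's unspecified error correction technique.

```python
import numpy as np

def block_frames(signal, frame_len, hop):
    """Block a 1-D SEMG signal into overlapping frames (one per row)."""
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])

def smooth_labels(labels, win=5):
    """Majority-vote smoothing of a per-frame phonetic label sequence:
    each label is replaced by the most frequent label in a small window,
    which removes isolated misclassifications."""
    labels = np.asarray(labels)
    half = win // 2
    out = labels.copy()
    for i in range(len(labels)):
        window = labels[max(0, i - half) : i + half + 1]
        vals, counts = np.unique(window, return_counts=True)
        out[i] = vals[np.argmax(counts)]
    return out

# Hypothetical usage: frame a signal, then clean a noisy label stream.
print(block_frames(np.arange(10.0), frame_len=4, hop=2).shape)  # (4, 4)
noisy = [0, 0, 0, 2, 0, 0, 1, 1, 0, 1, 1, 1]
print(smooth_labels(noisy).tolist())  # the isolated 2 and 0 are voted away
```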
