81

An electronic device to reduce the dynamic range of speech

Hildebrant, Eric Michael January 1982 (has links)
Thesis (B.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1982. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING / Bibliography: leaves 90-92. / by Eric Michael Hildebrant. / B.S.
82

An investigation of vowel formant tracks for purposes of speaker identification.

Goldstein, Ursula Gisela January 1975 (has links)
Thesis. 1975. M.S.--Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. / Bibliography: leaves 221-224. / M.S.
83

Unit selection and waveform concatenation strategies in Cantonese text-to-speech.

January 2005 (has links)
Oey Sai Lok. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references. / Abstracts in English and Chinese. / Chapter 1. --- Introduction --- p.1 / Chapter 1.1 --- An overview of Text-to-Speech technology --- p.2 / Chapter 1.1.1 --- Text processing --- p.2 / Chapter 1.1.2 --- Acoustic synthesis --- p.3 / Chapter 1.1.3 --- Prosody modification --- p.4 / Chapter 1.2 --- Trends in Text-to-Speech technologies --- p.5 / Chapter 1.3 --- Objectives of this thesis --- p.7 / Chapter 1.4 --- Outline of the thesis --- p.9 / References --- p.11 / Chapter 2. --- Cantonese Speech --- p.13 / Chapter 2.1 --- The Cantonese dialect --- p.13 / Chapter 2.2 --- Phonology of Cantonese --- p.14 / Chapter 2.2.1 --- Initials --- p.15 / Chapter 2.2.2 --- Finals --- p.16 / Chapter 2.2.3 --- Tones --- p.18 / Chapter 2.3 --- Acoustic-phonetic properties of Cantonese syllables --- p.19 / References --- p.24 / Chapter 3. --- Cantonese Text-to-Speech --- p.25 / Chapter 3.1 --- General overview --- p.25 / Chapter 3.1.1 --- Text processing --- p.25 / Chapter 3.1.2 --- Corpus based acoustic synthesis --- p.26 / Chapter 3.1.3 --- Prosodic control --- p.27 / Chapter 3.2 --- Syllable based Cantonese Text-to-Speech system --- p.28 / Chapter 3.3 --- Sub-syllable based Cantonese Text-to-Speech system --- p.29 / Chapter 3.3.1 --- Definition of sub-syllable units --- p.29 / Chapter 3.3.2 --- Acoustic inventory --- p.31 / Chapter 3.3.3 --- Determination of the concatenation points --- p.33 / Chapter 3.4 --- Problems --- p.34 / References --- p.36 / Chapter 4. --- Waveform Concatenation for Sub-syllable Units --- p.37 / Chapter 4.1 --- Previous work in concatenation methods --- p.37 / Chapter 4.1.1 --- Determination of concatenation point --- p.38 / Chapter 4.1.2 --- Waveform concatenation --- p.38 / Chapter 4.2 --- Problems and difficulties in concatenating sub-syllable units --- p.39 / Chapter 4.2.1 --- Mismatch of acoustic properties --- p.40 / Chapter 4.2.2 --- Allophone problem of Initials /z/, /c/ and /s/ --- p.42 / Chapter 4.3 --- General procedures in concatenation strategies --- p.44 / Chapter 4.3.1 --- Concatenation of unvoiced segments --- p.45 / Chapter 4.3.2 --- Concatenation of voiced segments --- p.45 / Chapter 4.3.3 --- Measurement of spectral distance --- p.48 / Chapter 4.4 --- Detailed procedures in concatenation points determination --- p.50 / Chapter 4.4.1 --- Unvoiced segments --- p.50 / Chapter 4.4.2 --- Voiced segments --- p.53 / Chapter 4.5 --- Selected examples in concatenation strategies --- p.58 / Chapter 4.5.1 --- Concatenation at Initial segments --- p.58 / Chapter 4.5.1.1 --- Plosives --- p.58 / Chapter 4.5.1.2 --- Fricatives --- p.59 / Chapter 4.5.2 --- Concatenation at Final segments --- p.60 / Chapter 4.5.2.1 --- V group (long vowel) --- p.60 / Chapter 4.5.2.2 --- D group (diphthong) --- p.61 / References --- p.63 / Chapter 5. --- Unit Selection for Sub-syllable Units --- p.65 / Chapter 5.1 --- Basic requirements in unit selection process --- p.65 / Chapter 5.1.1 --- Availability of multiple copies of sub-syllable units --- p.65 / Chapter 5.1.1.1 --- Levels of "identical" --- p.66 / Chapter 5.1.1.2 --- Statistics on the availability --- p.67 / Chapter 5.1.2 --- Variations in acoustic parameters --- p.70 / Chapter 5.1.2.1 --- Pitch level --- p.71 / Chapter 5.1.2.2 --- Duration --- p.74 / Chapter 5.1.2.3 --- Intensity level --- p.75 / Chapter 5.2 --- Selection process: availability check on sub-syllable units --- p.77 / Chapter 5.2.1 --- Multiple copies found --- p.79 / Chapter 5.2.2 --- Unique copy found --- p.79 / Chapter 5.2.3 --- No matched copy found --- p.80 / Chapter 5.2.4 --- Illustrative examples --- p.80 / Chapter 5.3 --- Selection process: acoustic analysis on candidate units --- p.81 / References --- p.88 / Chapter 6. --- Performance Evaluation --- p.89 / Chapter 6.1 --- General information --- p.90 / Chapter 6.1.1 --- Objective test --- p.90 / Chapter 6.1.2 --- Subjective test --- p.90 / Chapter 6.1.3 --- Test materials --- p.91 / Chapter 6.2 --- Details of the objective test --- p.92 / Chapter 6.2.1 --- Testing method --- p.92 / Chapter 6.2.2 --- Results --- p.93 / Chapter 6.2.3 --- Analysis --- p.96 / Chapter 6.3 --- Details of the subjective test --- p.98 / Chapter 6.3.1 --- Testing method --- p.98 / Chapter 6.3.2 --- Results --- p.99 / Chapter 6.3.3 --- Analysis --- p.101 / Chapter 6.4 --- Summary --- p.107 / References --- p.108 / Chapter 7. --- Conclusions and Future Works --- p.109 / Chapter 7.1 --- Conclusions --- p.109 / Chapter 7.2 --- Suggested future works --- p.111 / References --- p.113 / Appendix 1 Mean pitch level of Initials and Finals stored in the inventory --- p.114 / Appendix 2 Mean durations of Initials and Finals stored in the inventory --- p.121 / Appendix 3 Mean intensity level of Initials and Finals stored in the inventory --- p.124 / Appendix 4 Test word used in performance evaluation --- p.127 / Appendix 5 Test paragraph used in performance evaluation --- p.128 / Appendix 6 Pitch profile used in the Text-to-Speech system --- p.131 / Appendix 7 Duration model used in Text-to-Speech system --- p.132
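
As an illustration of the concatenation-point determination and spectral-distance measurement this thesis covers (Chapters 4.3.3 and 4.4), here is a minimal sketch. The Euclidean distance over cepstral frames, the fixed search window, and all function names are illustrative assumptions, not the thesis's exact procedure:

```python
import numpy as np

def spectral_distance(a, b):
    """Euclidean distance between two cepstral feature vectors."""
    return np.linalg.norm(a - b)

def best_concatenation_point(left, right, search_width=5):
    """Search the last/first few frames of two units for the frame pair
    with minimum spectral distance; the waveform join is placed there.
    `left` and `right` are (n_frames, n_coeffs) feature arrays."""
    best_i, best_j, best_d = len(left) - 1, 0, np.inf
    for i in range(max(0, len(left) - search_width), len(left)):
        for j in range(min(search_width, len(right))):
            d = spectral_distance(left[i], right[j])
            if d < best_d:
                best_i, best_j, best_d = i, j, d
    return best_i, best_j, best_d
```

The same distance could equally serve as a join cost when ranking candidate units in the selection stage of Chapter 5.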
84

Speech synthesis from surface electromyogram signals. / CUHK electronic theses & dissertations collection

January 2006 (has links)
A method for synthesizing speech from surface electromyogram (SEMG) signals on a frame-by-frame basis is presented. The input SEMG signals of spoken words are blocked into frames, from which SEMG features are extracted and classified into a number of phonetic classes by a neural network. A sequence of phonetic class labels is thus produced, which is subsequently smoothed by applying an error-correction technique. The speech waveform of a word is then constructed by concatenating the pre-recorded speech segments corresponding to the phonetic class labels. Experimental results show that the neural network can classify the SEMG features with 86.3% accuracy; this can be further improved to 96.4% by smoothing the phonetic class labels. Experimental evaluations based on the synthesis of eight words show that, on average, 92.9% of the words can be synthesized correctly. It is also demonstrated that the proposed frame-based feature extraction and conversion methodology can be applied to SEMG-based speech synthesis. / Although speech is the most natural means of communication among humans, there are situations in which speech is impossible or inappropriate. Examples include people with vocal cord damage, underwater communication, and noisy environments. To address some of the limitations of speech communication, non-acoustic communication systems using surface electromyogram signals have been proposed. However, most of the proposed techniques focus on recognizing or classifying the SEMG signals into a limited set of words. This approach shares similarities with isolated word recognition systems in that periods of silence between words are mandatory, and such systems have difficulty recognizing untrained words and continuous speech. / Lam Yuet Ming. / "December 2006." / Adviser: Leong Heng Philip Wai. / Source: Dissertation Abstracts International, Volume: 68-08, Section: B, page: 5392. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2006. / Includes bibliographical references (p. 104-111). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307.
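
The frame-by-frame pipeline described in this abstract (block the SEMG signal into frames, classify each frame, smooth the label sequence, concatenate pre-recorded segments) can be sketched as follows. The majority-vote smoothing is a stand-in assumption for the unspecified error-correction technique, and all names are hypothetical:

```python
import numpy as np

def block_into_frames(signal, frame_len, hop):
    """Split a 1-D SEMG signal into overlapping analysis frames."""
    n = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop: i * hop + frame_len] for i in range(n)])

def smooth_labels(labels, window=5):
    """Majority-vote smoothing of per-frame phonetic class labels."""
    labels = np.asarray(labels)
    out = labels.copy()
    half = window // 2
    for t in range(len(labels)):
        lo, hi = max(0, t - half), min(len(labels), t + half + 1)
        vals, counts = np.unique(labels[lo:hi], return_counts=True)
        out[t] = vals[np.argmax(counts)]
    return out

def synthesize(labels, segment_bank):
    """Concatenate one pre-recorded segment per run of identical labels."""
    runs = [labels[0]] + [l for p, l in zip(labels, labels[1:]) if l != p]
    return np.concatenate([segment_bank[l] for l in runs])
```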
85

Multi-transputer based isolated word speech recognition system.

January 1996 (has links)
by Francis Cho-yiu Chik. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references (leaves 129-135). / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Automatic speech recognition and its applications --- p.1 / Chapter 1.1.1 --- Artificial Neural Network (ANN) approach --- p.3 / Chapter 1.2 --- Motivation --- p.5 / Chapter 1.3 --- Background --- p.6 / Chapter 1.3.1 --- Speech recognition --- p.6 / Chapter 1.3.2 --- Parallel processing --- p.7 / Chapter 1.3.3 --- Parallel architectures --- p.10 / Chapter 1.3.4 --- Transputer --- p.12 / Chapter 1.4 --- Thesis outline --- p.13 / Chapter 2 --- Speech Signal Pre-processing --- p.14 / Chapter 2.1 --- Determine useful signal --- p.14 / Chapter 2.1.1 --- End point detection using energy --- p.15 / Chapter 2.1.2 --- End point detection enhancement using zero crossing rate --- p.18 / Chapter 2.2 --- Pre-emphasis filter --- p.19 / Chapter 2.3 --- Feature extraction --- p.20 / Chapter 2.3.1 --- Filter-bank spectrum analysis model --- p.22 / Chapter 2.3.2 --- Linear Predictive Coding (LPC) coefficients --- p.25 / Chapter 2.3.3 --- Cepstral coefficients --- p.27 / Chapter 2.3.4 --- Zero crossing rate and energy --- p.27 / Chapter 2.3.5 --- Pitch (fundamental frequency) detection --- p.28 / Chapter 2.4 --- Discussions --- p.30 / Chapter 3 --- Speech Recognition Methods --- p.32 / Chapter 3.1 --- Template matching using Dynamic Time Warping (DTW) --- p.32 / Chapter 3.2 --- Hidden Markov Model (HMM) --- p.37 / Chapter 3.2.1 --- Vector Quantization (VQ) --- p.38 / Chapter 3.2.2 --- Description of a discrete HMM --- p.41 / Chapter 3.2.3 --- Probability evaluation --- p.42 / Chapter 3.2.4 --- Estimation technique for model parameters --- p.46 / Chapter 3.2.5 --- State sequence for the observation sequence --- p.48 / Chapter 3.3 --- 2-dimensional Hidden Markov Model (2dHMM) --- p.49 / Chapter 3.3.1 --- Calculation for a 2dHMM --- p.50 / Chapter 3.4 --- Discussions --- p.56 / Chapter 4 --- Implementation --- p.59 / Chapter 4.1 --- Transputer based multiprocessor system --- p.59 / Chapter 4.1.1 --- Transputer Development System (TDS) --- p.60 / Chapter 4.1.2 --- System architecture --- p.61 / Chapter 4.1.3 --- Transtech TMB16 mother board --- p.62 / Chapter 4.1.4 --- Farming technique --- p.64 / Chapter 4.2 --- Farming technique on extracting spectral amplitude feature --- p.68 / Chapter 4.3 --- Feature extraction for LPC --- p.73 / Chapter 4.4 --- DTW based recognition --- p.77 / Chapter 4.4.1 --- Feature extraction --- p.77 / Chapter 4.4.2 --- Training and matching --- p.78 / Chapter 4.5 --- HMM based recognition --- p.80 / Chapter 4.5.1 --- Feature extraction --- p.80 / Chapter 4.5.2 --- Model training and matching --- p.81 / Chapter 4.6 --- 2dHMM based recognition --- p.83 / Chapter 4.6.1 --- Feature extraction --- p.83 / Chapter 4.6.2 --- Training --- p.83 / Chapter 4.6.3 --- Recognition --- p.87 / Chapter 4.7 --- Training convergence in HMM and 2dHMM --- p.88 / Chapter 4.8 --- Discussions --- p.91 / Chapter 5 --- Experimental Results --- p.92 / Chapter 5.1 --- Comparison of DTW, HMM and 2dHMM --- p.93 / Chapter 5.2 --- Comparison between HMM and 2dHMM --- p.98 / Chapter 5.2.1 --- Recognition test on 20 English words --- p.98 / Chapter 5.2.2 --- Recognition test on 10 Cantonese syllables --- p.102 / Chapter 5.3 --- Recognition test on 80 Cantonese syllables --- p.113 / Chapter 5.4 --- Speed matching --- p.118 / Chapter 5.5 --- Computational performance --- p.119 / Chapter 5.5.1 --- Training performance --- p.119 / Chapter 5.5.2 --- Recognition performance --- p.120 / Chapter 6 --- Discussions and Conclusions --- p.126 / Bibliography --- p.129 / Chapter A --- An ANN Model for Speech Recognition --- p.136 / Chapter B --- A Speech Signal Represented in Frequency Domain (Spectrogram) --- p.138 / Chapter C --- Dynamic Programming --- p.144 / Chapter D --- Markov Process --- p.145 / Chapter E --- Maximum Likelihood (ML) --- p.146 / Chapter F --- Multiple Training --- p.149 / Chapter F.1 --- HMM --- p.150 / Chapter F.2 --- 2dHMM --- p.150 / Chapter G --- IMS T800 Transputer --- p.152 / Chapter G.1 --- IMS T800 architecture --- p.152 / Chapter G.2 --- Instruction encoding --- p.153 / Chapter G.3 --- Floating point instructions --- p.155 / Chapter G.4 --- Optimizing use of the stack --- p.157 / Chapter G.5 --- Concurrent operation of FPU and CPU --- p.158
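
Chapter 3.1 of this thesis covers template matching with Dynamic Time Warping (DTW). A minimal sketch of that classic algorithm, assuming a Euclidean local cost between feature frames (the thesis's exact local distance may differ):

```python
import numpy as np

def dtw_distance(ref, test):
    """Accumulated DTW distance between two feature sequences,
    each an (n_frames, n_coeffs) array."""
    n, m = len(ref), len(test)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(ref[i - 1] - test[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(test, templates):
    """Return the word whose stored template warps to the test
    utterance with the smallest accumulated distance."""
    return min(templates, key=lambda w: dtw_distance(templates[w], test))
```

Because each template-test comparison is independent, this matching step parallelises naturally, which is what the transputer farming technique of Chapter 4 exploits.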
86

Phone-based speech synthesis using neural network with articulatory control.

January 1996 (has links)
by Lo Wai Kit. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references (leaves 151-160). / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Applications of Speech Synthesis --- p.2 / Chapter 1.1.1 --- Human Machine Interface --- p.2 / Chapter 1.1.2 --- Speech Aids --- p.3 / Chapter 1.1.3 --- Text-To-Speech (TTS) system --- p.4 / Chapter 1.1.4 --- Speech Dialogue System --- p.4 / Chapter 1.2 --- Current Status in Speech Synthesis --- p.6 / Chapter 1.2.1 --- Concatenation Based --- p.6 / Chapter 1.2.2 --- Parametric Based --- p.7 / Chapter 1.2.3 --- Articulatory Based --- p.7 / Chapter 1.2.4 --- Application of Neural Network in Speech Synthesis --- p.8 / Chapter 1.3 --- The Proposed Neural Network Speech Synthesis --- p.9 / Chapter 1.3.1 --- Motivation --- p.9 / Chapter 1.3.2 --- Objectives --- p.9 / Chapter 1.4 --- Thesis outline --- p.11 / Chapter 2 --- Linguistic Basics for Speech Synthesis --- p.12 / Chapter 2.1 --- Relations between Linguistic and Speech Synthesis --- p.12 / Chapter 2.2 --- Basic Phonology and Phonetics --- p.14 / Chapter 2.2.1 --- Phonology --- p.14 / Chapter 2.2.2 --- Phonetics --- p.15 / Chapter 2.2.3 --- Prosody --- p.16 / Chapter 2.3 --- Transcription Systems --- p.17 / Chapter 2.3.1 --- The Employed Transcription System --- p.18 / Chapter 2.4 --- Cantonese Phonology --- p.20 / Chapter 2.4.1 --- Some Properties of Cantonese --- p.20 / Chapter 2.4.2 --- Initial --- p.21 / Chapter 2.4.3 --- Final --- p.23 / Chapter 2.4.4 --- Lexical Tone --- p.25 / Chapter 2.4.5 --- Variations --- p.26 / Chapter 2.5 --- The Vowel Quadrilaterals --- p.29 / Chapter 3 --- Speech Synthesis Technology --- p.32 / Chapter 3.1 --- The Human Speech Production --- p.32 / Chapter 3.2 --- Important Issues in Speech Synthesis System --- p.34 / Chapter 3.2.1 --- Controllability --- p.34 / Chapter 3.2.2 --- Naturalness --- p.34 / Chapter 3.2.3 --- Complexity --- p.35 / Chapter 3.2.4 --- Information Storage --- p.35 / Chapter 3.3 --- Units for Synthesis --- p.37 / Chapter 3.4 --- Type of Synthesizer --- p.40 / Chapter 3.4.1 --- Copy Concatenation --- p.40 / Chapter 3.4.2 --- Vocoder --- p.41 / Chapter 3.4.3 --- Articulatory Synthesis --- p.44 / Chapter 4 --- Neural Network Speech Synthesis with Articulatory Control --- p.47 / Chapter 4.1 --- Neural Network Approximation --- p.48 / Chapter 4.1.1 --- The Approximation Problem --- p.48 / Chapter 4.1.2 --- Network Approach for Approximation --- p.49 / Chapter 4.2 --- Artificial Neural Network for Phone-based Speech Synthesis --- p.53 / Chapter 4.2.1 --- Network Approximation for Speech Signal Synthesis --- p.53 / Chapter 4.2.2 --- Feed forward Backpropagation Neural Network --- p.56 / Chapter 4.2.3 --- Radial Basis Function Network --- p.58 / Chapter 4.2.4 --- Parallel Operating Synthesizer Networks --- p.59 / Chapter 4.3 --- Template Storage and Control for the Synthesizer Network --- p.61 / Chapter 4.3.1 --- Implicit Template Storage --- p.61 / Chapter 4.3.2 --- Articulatory Control Parameters --- p.61 / Chapter 4.4 --- Summary --- p.65 / Chapter 5 --- Prototype Implementation of the Synthesizer Network --- p.66 / Chapter 5.1 --- Implementation of the Synthesizer Network --- p.66 / Chapter 5.1.1 --- Network Architectures --- p.68 / Chapter 5.1.2 --- Spectral Templates for Training --- p.74 / Chapter 5.1.3 --- System requirement --- p.76 / Chapter 5.2 --- Subjective Listening Test --- p.79 / Chapter 5.2.1 --- Sample Selection --- p.79 / Chapter 5.2.2 --- Test Procedure --- p.81 / Chapter 5.2.3 --- Result --- p.83 / Chapter 5.2.4 --- Analysis --- p.86 / Chapter 5.3 --- Summary --- p.88 / Chapter 6 --- Simplified Articulatory Control for the Synthesizer Network --- p.89 / Chapter 6.1 --- Coarticulatory Effect in Speech Production --- p.90 / Chapter 6.1.1 --- Acoustic Effect --- p.90 / Chapter 6.1.2 --- Prosodic Effect --- p.91 / Chapter 6.2 --- Control in various Synthesis Techniques --- p.92 / Chapter 6.2.1 --- Copy Concatenation --- p.92 / Chapter 6.2.2 --- Formant Synthesis --- p.93 / Chapter 6.2.3 --- Articulatory synthesis --- p.93 / Chapter 6.3 --- Articulatory Control Model based on Vowel Quad --- p.94 / Chapter 6.3.1 --- Modeling of Variations with the Articulatory Control Model --- p.95 / Chapter 6.4 --- Voice Correspondence --- p.97 / Chapter 6.4.1 --- For Nasal Sounds - Inter-Network Correspondence --- p.98 / Chapter 6.4.2 --- In Flat-Tongue Space - Intra-Network Correspondence --- p.101 / Chapter 6.5 --- Summary --- p.108 / Chapter 7 --- Pause Duration Properties in Cantonese Phrases --- p.109 / Chapter 7.1 --- The Prosodic Feature - Inter-Syllable Pause --- p.110 / Chapter 7.2 --- Experiment for Measuring Inter-Syllable Pause of Cantonese Phrases --- p.111 / Chapter 7.2.1 --- Speech Material Selection --- p.111 / Chapter 7.2.2 --- Experimental Procedure --- p.112 / Chapter 7.2.3 --- Result --- p.114 / Chapter 7.3 --- Characteristics of Inter-Syllable Pause in Cantonese Phrases --- p.117 / Chapter 7.3.1 --- Pause Duration Characteristics for Initials after Pause --- p.117 / Chapter 7.3.2 --- Pause Duration Characteristic for Finals before Pause --- p.119 / Chapter 7.3.3 --- General Observations --- p.119 / Chapter 7.3.4 --- Other Observations --- p.121 / Chapter 7.4 --- Application of Pause-duration Statistics to the Synthesis System --- p.124 / Chapter 7.5 --- Summary --- p.126 / Chapter 8 --- Conclusion and Further Work --- p.127 / Chapter 8.1 --- Conclusion --- p.127 / Chapter 8.2 --- Further Extension Work --- p.130 / Chapter 8.2.1 --- Regularization Network Optimized on ISD --- p.130 / Chapter 8.2.2 --- Incorporation of Non-Articulatory Parameters to Control Space --- p.130 / Chapter 8.2.3 --- Experiment on Other Prosodic Features --- p.131 / Chapter 8.2.4 --- Application of Voice Correspondence to Cantonese Coda Discrimination --- p.131 / Chapter A --- Cantonese Initials and Finals --- p.132 / Chapter A.1 --- Tables of All Cantonese Initials and Finals --- p.132 / Chapter B --- Using Distortion Measure as Error Function in Neural Network --- p.135 / Chapter B.1 --- Formulation of Itakura-Saito Distortion Measure for Neural Network Error Function --- p.135 / Chapter B.2 --- Formulation of a Modified Itakura-Saito Distortion (MISD) Measure for Neural Network Error Function --- p.137 / Chapter C --- Orthogonal Least Square Algorithm for RBFNet Training --- p.138 / Chapter C.1 --- Orthogonal Least Squares Learning Algorithm for Radial Basis Function Network Training --- p.138 / Chapter D --- Phrase Lists --- p.140 / Chapter D.1 --- Two-Syllable Phrase List for the Pause Duration Experiment --- p.140 / Chapter D.1.1 --- Two-Syllable Words (兩字詞) --- p.140 / Chapter D.2 --- Three/Four-Syllable Phrase List for the Pause Duration Experiment --- p.144 / Chapter D.2.1 --- Phrases (片語) --- p.144
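
Chapter 4.2.3 uses a Radial Basis Function network as the synthesizer that maps articulatory control parameters to spectral templates. A minimal sketch of a Gaussian RBF network follows; it fits output weights by plain least squares rather than the orthogonal least squares algorithm of Appendix C, and all names are illustrative:

```python
import numpy as np

def rbf_design(X, centers, width):
    """Gaussian radial basis activations for inputs X (n, d),
    given (m, d) centers and a shared kernel width."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def train_rbf(X, Y, centers, width):
    """Least-squares fit of output weights so the network maps
    articulatory control parameters X to spectral templates Y."""
    H = rbf_design(X, centers, width)
    W, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return W

def rbf_predict(X, centers, width, W):
    """Evaluate the trained network on new control parameters."""
    return rbf_design(X, centers, width) @ W
```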
87

A frequency-based BSS technique for speech source separation.

January 2003 (has links)
Ngan Lai Yin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 95-100). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Blind Signal Separation (BSS) Methods --- p.4 / Chapter 1.2 --- Objectives of the Thesis --- p.6 / Chapter 1.3 --- Thesis Outline --- p.8 / Chapter 2 --- Blind Adaptive Frequency-Shift (BA-FRESH) Filter --- p.9 / Chapter 2.1 --- Cyclostationarity Properties --- p.10 / Chapter 2.2 --- Frequency-Shift (FRESH) Filter --- p.11 / Chapter 2.3 --- Blind Adaptive FRESH Filter --- p.12 / Chapter 2.4 --- Reduced-Rank BA-FRESH Filter --- p.14 / Chapter 2.4.1 --- CSP Method --- p.14 / Chapter 2.4.2 --- PCA Method --- p.14 / Chapter 2.4.3 --- Appropriate Choice of Rank --- p.14 / Chapter 2.5 --- Signal Extraction of Spectrally Overlapped Signals --- p.16 / Chapter 2.5.1 --- Simulation 1: A Fixed Rank --- p.17 / Chapter 2.5.2 --- Simulation 2: A Variable Rank --- p.18 / Chapter 2.6 --- Signal Separation of Speech Signals --- p.20 / Chapter 2.7 --- Chapter Summary --- p.22 / Chapter 3 --- Reverberant Environment --- p.23 / Chapter 3.1 --- Small Room Acoustics Model --- p.23 / Chapter 3.2 --- Effects of Reverberation to Speech Recognition --- p.27 / Chapter 3.2.1 --- Short Impulse Response --- p.27 / Chapter 3.2.2 --- Small Room Impulse Response Modelled by Image Method --- p.32 / Chapter 3.3 --- Chapter Summary --- p.34 / Chapter 4 --- Information Theoretic Approach for Signal Separation --- p.35 / Chapter 4.1 --- Independent Component Analysis (ICA) --- p.35 / Chapter 4.1.1 --- Kullback-Leibler (K-L) Divergence --- p.37 / Chapter 4.2 --- Information Maximization (Infomax) --- p.39 / Chapter 4.2.1 --- Stochastic Gradient Descent and Stability Problem --- p.41 / Chapter 4.2.2 --- Infomax and ICA --- p.41 / Chapter 4.2.3 --- Infomax and Maximum Likelihood --- p.42 / Chapter 4.3 --- Signal Separation by Infomax --- p.43 / Chapter 4.4 --- Chapter Summary --- p.45 / Chapter 5 --- Blind Signal Separation (BSS) in Frequency Domain --- p.47 / Chapter 5.1 --- Convolutive Mixing System --- p.48 / Chapter 5.2 --- Infomax in Frequency Domain --- p.52 / Chapter 5.3 --- Adaptation Algorithms --- p.54 / Chapter 5.3.1 --- Standard Gradient Method --- p.54 / Chapter 5.3.2 --- Natural Gradient Method --- p.55 / Chapter 5.3.3 --- Convergence Performance --- p.56 / Chapter 5.4 --- Subband Adaptation --- p.57 / Chapter 5.5 --- Energy Weighting --- p.59 / Chapter 5.6 --- The Permutation Problem --- p.61 / Chapter 5.7 --- Performance Evaluation --- p.63 / Chapter 5.7.1 --- De-reverberation Performance Factor --- p.63 / Chapter 5.7.2 --- De-Noise Performance Factor --- p.63 / Chapter 5.7.3 --- Spectral Signal-to-noise Ratio (SNR) --- p.65 / Chapter 5.8 --- Chapter Summary --- p.65 / Chapter 6 --- Simulation Results and Performance Analysis --- p.67 / Chapter 6.1 --- Small Room Acoustics Modelled by Image Method --- p.67 / Chapter 6.2 --- Signal Sources --- p.68 / Chapter 6.2.1 --- Cantonese Speech --- p.69 / Chapter 6.2.2 --- Noise --- p.69 / Chapter 6.3 --- De-Noise and De-Reverberation Performance Analysis --- p.69 / Chapter 6.3.1 --- Speech and White Noise --- p.73 / Chapter 6.3.2 --- Speech and Voice Babble Noise --- p.76 / Chapter 6.3.3 --- Two Female Speeches --- p.79 / Chapter 6.4 --- Recognition Accuracy Performance Analysis --- p.83 / Chapter 6.4.1 --- Speech and White Noise --- p.83 / Chapter 6.4.2 --- Speech and Voice Babble Noise --- p.84 / Chapter 6.4.3 --- Two Cantonese Speeches --- p.85 / Chapter 6.5 --- Chapter Summary --- p.87 / Chapter 7 --- Conclusions and Suggestions for Future Research --- p.88 / Chapter 7.1 --- Conclusions --- p.88 / Chapter 7.2 --- Suggestions for Future Research --- p.91 / Appendices --- p.92 / A The Proof of Stability Conditions for Stochastic Gradient Descent Algorithm (Ref. (4.15)) --- p.92 / Bibliography --- p.95
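
Chapters 4-5 of this thesis apply Infomax with a natural-gradient update independently in each frequency bin of the convolutive mixture. Below is a sketch of one such update; the complex tanh nonlinearity is a common choice in the frequency-domain ICA literature, assumed here rather than taken from the thesis:

```python
import numpy as np

def infomax_step(W, Y_frames, mu=0.01):
    """One natural-gradient Infomax update for a single frequency bin.
    W: (n_src, n_mics) complex unmixing matrix for this bin.
    Y_frames: (n_frames, n_mics) complex STFT observations in this bin."""
    S = Y_frames @ W.T                       # current source estimates
    phi = np.tanh(S.real) + 1j * np.tanh(S.imag)   # assumed nonlinearity
    n = len(Y_frames)
    # Natural gradient: (I - E[phi(s) s^H]) W
    grad = (np.eye(W.shape[0]) - (phi.T @ S.conj()) / n) @ W
    return W + mu * grad

# One unmixing matrix is adapted per frequency bin; the permutation
# ambiguity across bins (Chapter 5.6) must then be resolved separately.
```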
88

Text-independent bilingual speaker verification system.

January 2003 (has links)
Ma Bin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 96-102). / Abstracts in English and Chinese. / Abstract --- p.i / Acknowledgement --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Biometrics --- p.2 / Chapter 1.2 --- Speaker Verification --- p.3 / Chapter 1.3 --- Overview of Speaker Verification Systems --- p.4 / Chapter 1.4 --- Text Dependency --- p.4 / Chapter 1.4.1 --- Text-Dependent Speaker Verification --- p.5 / Chapter 1.4.2 --- GMM-based Speaker Verification --- p.6 / Chapter 1.5 --- Language Dependency --- p.6 / Chapter 1.6 --- Normalization Techniques --- p.7 / Chapter 1.7 --- Objectives of the Thesis --- p.8 / Chapter 1.8 --- Thesis Organization --- p.8 / Chapter 2 --- Background --- p.10 / Chapter 2.1 --- Background Information --- p.11 / Chapter 2.1.1 --- Speech Signal Acquisition --- p.11 / Chapter 2.1.2 --- Speech Processing --- p.11 / Chapter 2.1.3 --- Engineering Model of Speech Signal --- p.13 / Chapter 2.1.4 --- Speaker Information in the Speech Signal --- p.14 / Chapter 2.1.5 --- Feature Parameters --- p.15 / Chapter 2.1.5.1 --- Mel-Frequency Cepstral Coefficients --- p.16 / Chapter 2.1.5.2 --- Linear Predictive Coding Derived Cepstral Coefficients --- p.18 / Chapter 2.1.5.3 --- Energy Measures --- p.20 / Chapter 2.1.5.4 --- Derivatives of Cepstral Coefficients --- p.21 / Chapter 2.1.6 --- Evaluating Speaker Verification Systems --- p.22 / Chapter 2.2 --- Common Techniques --- p.24 / Chapter 2.2.1 --- Template Model Matching Methods --- p.25 / Chapter 2.2.2 --- Statistical Model Methods --- p.26 / Chapter 2.2.2.1 --- HMM Modeling Technique --- p.27 / Chapter 2.2.2.2 --- GMM Modeling Techniques --- p.30 / Chapter 2.2.2.3 --- Gaussian Mixture Model --- p.31 / Chapter 2.2.2.4 --- The Advantages of GMM --- p.32 / Chapter 2.2.3 --- Likelihood Scoring --- p.32 / Chapter 2.2.4 --- General Approach to Decision Making --- p.35 / Chapter 2.2.5 --- Cohort Normalization --- p.35 / Chapter 2.2.5.1 --- Probability Score Normalization --- p.36 / Chapter 2.2.5.2 --- Cohort Selection --- p.37 / Chapter 2.3 --- Chapter Summary --- p.38 / Chapter 3 --- Experimental Corpora --- p.39 / Chapter 3.1 --- The YOHO Corpus --- p.39 / Chapter 3.1.1 --- Design of the YOHO Corpus --- p.39 / Chapter 3.1.2 --- Data Collection Process of the YOHO Corpus --- p.40 / Chapter 3.1.3 --- Experimentation with the YOHO Corpus --- p.41 / Chapter 3.2 --- CUHK Bilingual Speaker Verification Corpus --- p.42 / Chapter 3.2.1 --- Design of the CUBS Corpus --- p.42 / Chapter 3.2.2 --- Data Collection Process for the CUBS Corpus --- p.44 / Chapter 3.3 --- Chapter Summary --- p.46 / Chapter 4 --- Text-Dependent Speaker Verification --- p.47 / Chapter 4.1 --- Front-End Processing on the YOHO Corpus --- p.48 / Chapter 4.2 --- Cohort Normalization Setup --- p.50 / Chapter 4.3 --- HMM-based Speaker Verification Experiments --- p.53 / Chapter 4.3.1 --- Subword HMM Models --- p.53 / Chapter 4.3.2 --- Experimental Results --- p.55 / Chapter 4.3.2.1 --- Comparison of Feature Representations --- p.55 / Chapter 4.3.2.2 --- Effect of Cohort Normalization --- p.58 / Chapter 4.4 --- Experiments on GMM-based Speaker Verification --- p.61 / Chapter 4.4.1 --- Experimental Setup --- p.61 / Chapter 4.4.2 --- The number of Gaussian Mixture Components --- p.62 / Chapter 4.4.3 --- The Effect of Cohort Normalization --- p.64 / Chapter 4.4.4 --- Comparison of HMM and GMM --- p.65 / Chapter 4.5 --- Comparison with Previous Systems --- p.67 / Chapter 4.6 --- Chapter Summary --- p.70 / Chapter 5 --- Language- and Text-Independent Speaker Verification --- p.71 / Chapter 5.1 --- Front-End Processing of the CUBS --- p.72 / Chapter 5.2 --- Language- and Text-Independent Speaker Modeling --- p.73 / Chapter 5.3 --- Cohort Normalization --- p.74 / Chapter 5.4 --- Experimental Results and Analysis --- p.75 / Chapter 5.4.1 --- Number of Gaussian Mixture Components --- p.78 / Chapter 5.4.2 --- The Cohort Normalization Effect --- p.79 / Chapter 5.4.3 --- Language Dependency --- p.80 / Chapter 5.4.4 --- Language-Independency --- p.83 / Chapter 5.5 --- Chapter Summary --- p.88 / Chapter 6 --- Conclusions and Future Work --- p.90 / Chapter 6.1 --- Summary --- p.90 / Chapter 6.1.1 --- Feature Comparison --- p.91 / Chapter 6.1.2 --- HMM Modeling --- p.91 / Chapter 6.1.3 --- GMM Modeling --- p.91 / Chapter 6.1.4 --- Cohort Normalization --- p.92 / Chapter 6.1.5 --- Language Dependency --- p.92 / Chapter 6.2 --- Future Work --- p.93 / Chapter 6.2.1 --- Feature Parameters --- p.93 / Chapter 6.2.2 --- Model Quality --- p.93 / Chapter 6.2.2.1 --- Variance Flooring --- p.93 / Chapter 6.2.2.2 --- Silence Detection --- p.94 / Chapter 6.2.3 --- Conversational Speaker Verification --- p.95 / Bibliography --- p.102
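
Chapter 2.2 of this thesis covers GMM modeling, likelihood scoring, and cohort normalization. A minimal sketch of that verification recipe using scikit-learn's GaussianMixture; the component count, the cohort averaging in the log domain, and the threshold are assumptions, not the thesis's tuned configuration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_gmm(features, n_components=32):
    """Fit a diagonal-covariance GMM to a speaker's training features,
    an (n_frames, n_coeffs) array of e.g. MFCCs."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    return gmm.fit(features)

def verify(features, claimant_gmm, cohort_gmms, threshold=0.0):
    """Cohort-normalised decision: accept the identity claim if the
    claimant model out-scores the average cohort score by `threshold`.
    `score` returns the mean per-frame log-likelihood."""
    target = claimant_gmm.score(features)
    cohort = np.mean([g.score(features) for g in cohort_gmms])
    return (target - cohort) > threshold
```

Subtracting the cohort score makes the decision statistic far less sensitive to channel and session variability than the raw target likelihood, which is the motivation for the normalization experiments in Chapters 4 and 5.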
89

Audio compression and speech enhancement using temporal masking models

Gunawan, Teddy Surya, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW January 2007 (has links)
Of the few existing models of temporal masking applicable to problems such as compression and enhancement, none are based on empirical data from the psychoacoustic literature, presumably because the multidimensional nature of the data makes the derivation of tractable functional models difficult. This thesis presents two new functional models of the temporal masking effect of the human auditory system, and their exploitation in audio compression and speech enhancement applications. Traditional audio compression algorithms do not fully utilise the temporal masking properties of the human auditory system, relying solely on simultaneous masking models. A perceptual wavelet packet-based audio coder has been devised that incorporates the first temporal masking model, combined with simultaneous masking models in a novel manner. An evaluation of the coder using both objective (PEAQ, ITU-R BS.1387) and extensive subjective tests (ITU-R BS.1116) revealed a bitrate reduction of more than 17% compared with existing simultaneous-masking-based audio coders, while preserving transparent quality. In addition, the oversampled wavelet packet transform (ODWT) has been newly applied to obtain alias-free coefficients for more accurate masking threshold calculation. Finally, a low-complexity scalable audio coding algorithm using the ODWT-based thresholds and temporal masking has been investigated. Currently, there is a strong need for innovative speech enhancement algorithms that exploit the masking effects of the human auditory system and perform well at very low signal-to-noise ratios. Existing competitive noise suppression algorithms, and those that incorporate simultaneous masking, were examined and evaluated for their suitability as baseline algorithms. Objective measures using PESQ (ITU-T P.862) and subjective measures (ITU-T P.835) demonstrate that the proposed enhancement scheme, based on a second new masking model, outperformed the seven baseline speech enhancement methods by 6-20% depending on the SNR. Hence, the proposed speech enhancement scheme exploiting temporal masking effects has good potential across many types and intensities of environmental noise. Keywords: human auditory system; temporal masking; simultaneous masking; audio compression; speech enhancement; subjective test; objective test.
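
For intuition on how a functional temporal-masking model is used in a coder, here is a toy forward-masking curve. The log-time decay shape and constants are illustrative assumptions only and do not reproduce the empirically fitted models developed in this thesis:

```python
import numpy as np

def post_masking_threshold(masker_level_db, t_ms, tau=60.0, floor_db=0.0):
    """Toy forward (post-)masking model: the threshold starts near the
    masker level and decays toward the hearing floor over ~100-200 ms
    after masker offset. Shape and `tau` are illustrative assumptions."""
    t_ms = np.asarray(t_ms, dtype=float)
    decay = np.clip(1.0 - np.log1p(t_ms / tau), 0.0, 1.0)
    return floor_db + (masker_level_db - floor_db) * decay

# Combined with a simultaneous-masking threshold, the effective per-band
# threshold is typically the maximum of the two; the coder can then spend
# fewer bits wherever quantisation noise stays below that threshold.
t = np.linspace(0, 200, 9)          # ms after masker offset
print(post_masking_threshold(70.0, t))
```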
90

Robust speech features for speech recognition in hostile environments

Toh, Aik January 1900 (has links)
Speech recognition systems have improved in robustness in recent years with respect to both speaker and acoustical variability. Nevertheless, it remains a challenge to deploy speech recognition systems in real-world applications that are exposed to diverse and significant levels of noise. Robustness and recognition accuracy are the essential criteria in determining the extent to which a speech recognition system can be deployed in real-world applications. This work develops techniques and extensions for extracting robust features from speech in order to achieve substantial performance in speech recognition. The robustness issue is approached through front-end processing, in particular robust feature extraction. The author proposes a unified framework for robust features and presents a comprehensive evaluation of robustness in speech features. The framework addresses three distinct approaches: robust feature extraction, inclusion of temporal information, and normalization strategies. The author discusses robust feature selection primarily in the spectral and cepstral context. Several enhancements and extensions are explored for the purpose of robustness, including a computationally efficient approach to moment normalization. In addition, a simple back-end approach is incorporated to improve recognition performance in reverberant environments. The speech features in this work are evaluated in three distinct environments that occur in real-world scenarios, and the thesis also discusses the effect of noise on speech features and their parameters. The author establishes that statistical properties play an important role in mismatches. The significance of the research is strengthened by the evaluation of the robust approaches in more than one scenario and by comparison with the performance of state-of-the-art features. The contributions and limitations of each robust feature in all three environments are highlighted. The novelty of the work lies in the diverse hostile environments in which the speech features are evaluated for robustness. The author obtains recognition accuracy of more than 98.5% under channel distortion, and accuracy greater than 90.0% is maintained for a reverberation time of 0.4 s and for additive babble noise at 10 dB SNR. The thesis delivers comprehensive research on robust speech features for speech recognition in hostile environments, supported by significant experimental results. Several observations, recommendations, and issues relevant to robust speech features are presented.
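
The normalization strategies evaluated in this thesis include moment normalization. A minimal sketch of the standard first- and second-moment version (cepstral mean and variance normalization) follows; it illustrates the general idea, not the author's computationally efficient variant:

```python
import numpy as np

def moment_normalize(features, order=2):
    """Per-utterance moment normalisation of an (n_frames, n_coeffs)
    feature matrix. Order 1 removes the mean (CMN, which cancels
    stationary channel effects in the cepstral domain); order 2 also
    equalises the variance (CMVN)."""
    out = features - features.mean(axis=0)
    if order >= 2:
        out = out / (out.std(axis=0) + 1e-8)   # guard against zero variance
    return out
```

Because a stationary channel appears as an additive constant in the cepstrum, mean subtraction alone already addresses much of the channel-distortion mismatch discussed above.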
