161

Multi-transputer based isolated word speech recognition system.

January 1996 (has links)
by Francis Cho-yiu Chik. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references (leaves 129-135). / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Automatic speech recognition and its applications --- p.1 / Chapter 1.1.1 --- Artificial Neural Network (ANN) approach --- p.3 / Chapter 1.2 --- Motivation --- p.5 / Chapter 1.3 --- Background --- p.6 / Chapter 1.3.1 --- Speech recognition --- p.6 / Chapter 1.3.2 --- Parallel processing --- p.7 / Chapter 1.3.3 --- Parallel architectures --- p.10 / Chapter 1.3.4 --- Transputer --- p.12 / Chapter 1.4 --- Thesis outline --- p.13 / Chapter 2 --- Speech Signal Pre-processing --- p.14 / Chapter 2.1 --- Determine useful signal --- p.14 / Chapter 2.1.1 --- End point detection using energy --- p.15 / Chapter 2.1.2 --- End point detection enhancement using zero crossing rate --- p.18 / Chapter 2.2 --- Pre-emphasis filter --- p.19 / Chapter 2.3 --- Feature extraction --- p.20 / Chapter 2.3.1 --- Filter-bank spectrum analysis model --- p.22 / Chapter 2.3.2 --- Linear Predictive Coding (LPC) coefficients --- p.25 / Chapter 2.3.3 --- Cepstral coefficients --- p.27 / Chapter 2.3.4 --- Zero crossing rate and energy --- p.27 / Chapter 2.3.5 --- Pitch (fundamental frequency) detection --- p.28 / Chapter 2.4 --- Discussions --- p.30 / Chapter 3 --- Speech Recognition Methods --- p.32 / Chapter 3.1 --- Template matching using Dynamic Time Warping (DTW) --- p.32 / Chapter 3.2 --- Hidden Markov Model (HMM) --- p.37 / Chapter 3.2.1 --- Vector Quantization (VQ) --- p.38 / Chapter 3.2.2 --- Description of a discrete HMM --- p.41 / Chapter 3.2.3 --- Probability evaluation --- p.42 / Chapter 3.2.4 --- Estimation technique for model parameters --- p.46 / Chapter 3.2.5 --- State sequence for the observation sequence --- p.48 / Chapter 3.3 --- 2-dimensional Hidden Markov Model (2dHMM) --- p.49 / Chapter 3.3.1 --- Calculation for a 2dHMM --- p.50 / Chapter 3.4 --- Discussions --- p.56 / Chapter 4 --- 
Implementation --- p.59 / Chapter 4.1 --- Transputer based multiprocessor system --- p.59 / Chapter 4.1.1 --- Transputer Development System (TDS) --- p.60 / Chapter 4.1.2 --- System architecture --- p.61 / Chapter 4.1.3 --- Transtech TMB16 mother board --- p.62 / Chapter 4.1.4 --- Farming technique --- p.64 / Chapter 4.2 --- Farming technique on extracting spectral amplitude feature --- p.68 / Chapter 4.3 --- Feature extraction for LPC --- p.73 / Chapter 4.4 --- DTW based recognition --- p.77 / Chapter 4.4.1 --- Feature extraction --- p.77 / Chapter 4.4.2 --- Training and matching --- p.78 / Chapter 4.5 --- HMM based recognition --- p.80 / Chapter 4.5.1 --- Feature extraction --- p.80 / Chapter 4.5.2 --- Model training and matching --- p.81 / Chapter 4.6 --- 2dHMM based recognition --- p.83 / Chapter 4.6.1 --- Feature extraction --- p.83 / Chapter 4.6.2 --- Training --- p.83 / Chapter 4.6.3 --- Recognition --- p.87 / Chapter 4.7 --- Training convergence in HMM and 2dHMM --- p.88 / Chapter 4.8 --- Discussions --- p.91 / Chapter 5 --- Experimental Results --- p.92 / Chapter 5.1 --- "Comparison of DTW, HMM and 2dHMM" --- p.93 / Chapter 5.2 --- Comparison between HMM and 2dHMM --- p.98 / Chapter 5.2.1 --- Recognition test on 20 English words --- p.98 / Chapter 5.2.2 --- Recognition test on 10 Cantonese syllables --- p.102 / Chapter 5.3 --- Recognition test on 80 Cantonese syllables --- p.113 / Chapter 5.4 --- Speed matching --- p.118 / Chapter 5.5 --- Computational performance --- p.119 / Chapter 5.5.1 --- Training performance --- p.119 / Chapter 5.5.2 --- Recognition performance --- p.120 / Chapter 6 --- Discussions and Conclusions --- p.126 / Bibliography --- p.129 / Chapter A --- An ANN Model for Speech Recognition --- p.136 / Chapter B --- A Speech Signal Represented in Frequency Domain (Spectrogram) --- p.138 / Chapter C --- Dynamic Programming --- p.144 / Chapter D --- Markov Process --- p.145 / Chapter E --- Maximum Likelihood (ML) --- p.146 / Chapter F ---
Multiple Training --- p.149 / Chapter F.1 --- HMM --- p.150 / Chapter F.2 --- 2dHMM --- p.150 / Chapter G --- IMS T800 Transputer --- p.152 / Chapter G.1 --- IMS T800 architecture --- p.152 / Chapter G.2 --- Instruction encoding --- p.153 / Chapter G.3 --- Floating point instructions --- p.155 / Chapter G.4 --- Optimizing use of the stack --- p.157 / Chapter G.5 --- Concurrent operation of FPU and CPU --- p.158
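Chapter 3.1 of the record above rests on template matching with Dynamic Time Warping (DTW). As a rough illustration of the technique only — a textbook sketch in Python, not the author's transputer implementation — the DTW distance between two feature sequences can be computed as:

```python
import numpy as np

def dtw_distance(a, b):
    """Minimal DTW distance between two feature sequences (frames x dims).

    Illustrative sketch only; practical recognizers add slope constraints
    and path-length normalization.
    """
    a, b = np.atleast_2d(a), np.atleast_2d(b)
    n, m = len(a), len(b)
    # Local distances: Euclidean distance between every pair of frames.
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Accumulated cost with the classic 3-way predecessor rule.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]
```

Identical sequences give a distance of zero; the farther two utterances drift apart in feature space, the larger the accumulated cost along the optimal warping path.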
162

Phone-based speech synthesis using neural network with articulatory control.

January 1996 (has links)
by Lo Wai Kit. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references (leaves 151-160). / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Applications of Speech Synthesis --- p.2 / Chapter 1.1.1 --- Human Machine Interface --- p.2 / Chapter 1.1.2 --- Speech Aids --- p.3 / Chapter 1.1.3 --- Text-To-Speech (TTS) system --- p.4 / Chapter 1.1.4 --- Speech Dialogue System --- p.4 / Chapter 1.2 --- Current Status in Speech Synthesis --- p.6 / Chapter 1.2.1 --- Concatenation Based --- p.6 / Chapter 1.2.2 --- Parametric Based --- p.7 / Chapter 1.2.3 --- Articulatory Based --- p.7 / Chapter 1.2.4 --- Application of Neural Network in Speech Synthesis --- p.8 / Chapter 1.3 --- The Proposed Neural Network Speech Synthesis --- p.9 / Chapter 1.3.1 --- Motivation --- p.9 / Chapter 1.3.2 --- Objectives --- p.9 / Chapter 1.4 --- Thesis outline --- p.11 / Chapter 2 --- Linguistic Basics for Speech Synthesis --- p.12 / Chapter 2.1 --- Relations between Linguistic and Speech Synthesis --- p.12 / Chapter 2.2 --- Basic Phonology and Phonetics --- p.14 / Chapter 2.2.1 --- Phonology --- p.14 / Chapter 2.2.2 --- Phonetics --- p.15 / Chapter 2.2.3 --- Prosody --- p.16 / Chapter 2.3 --- Transcription Systems --- p.17 / Chapter 2.3.1 --- The Employed Transcription System --- p.18 / Chapter 2.4 --- Cantonese Phonology --- p.20 / Chapter 2.4.1 --- Some Properties of Cantonese --- p.20 / Chapter 2.4.2 --- Initial --- p.21 / Chapter 2.4.3 --- Final --- p.23 / Chapter 2.4.4 --- Lexical Tone --- p.25 / Chapter 2.4.5 --- Variations --- p.26 / Chapter 2.5 --- The Vowel Quadrilaterals --- p.29 / Chapter 3 --- Speech Synthesis Technology --- p.32 / Chapter 3.1 --- The Human Speech Production --- p.32 / Chapter 3.2 --- Important Issues in Speech Synthesis System --- p.34 / Chapter 3.2.1 --- Controllability --- p.34 / Chapter 3.2.2 --- Naturalness --- p.34 / Chapter 3.2.3 --- Complexity --- p.35 / Chapter 3.2.4 --- Information Storage --- p.35 / Chapter 3.3 
--- Units for Synthesis --- p.37 / Chapter 3.4 --- Type of Synthesizer --- p.40 / Chapter 3.4.1 --- Copy Concatenation --- p.40 / Chapter 3.4.2 --- Vocoder --- p.41 / Chapter 3.4.3 --- Articulatory Synthesis --- p.44 / Chapter 4 --- Neural Network Speech Synthesis with Articulatory Control --- p.47 / Chapter 4.1 --- Neural Network Approximation --- p.48 / Chapter 4.1.1 --- The Approximation Problem --- p.48 / Chapter 4.1.2 --- Network Approach for Approximation --- p.49 / Chapter 4.2 --- Artificial Neural Network for Phone-based Speech Synthesis --- p.53 / Chapter 4.2.1 --- Network Approximation for Speech Signal Synthesis --- p.53 / Chapter 4.2.2 --- Feed forward Backpropagation Neural Network --- p.56 / Chapter 4.2.3 --- Radial Basis Function Network --- p.58 / Chapter 4.2.4 --- Parallel Operating Synthesizer Networks --- p.59 / Chapter 4.3 --- Template Storage and Control for the Synthesizer Network --- p.61 / Chapter 4.3.1 --- Implicit Template Storage --- p.61 / Chapter 4.3.2 --- Articulatory Control Parameters --- p.61 / Chapter 4.4 --- Summary --- p.65 / Chapter 5 --- Prototype Implementation of the Synthesizer Network --- p.66 / Chapter 5.1 --- Implementation of the Synthesizer Network --- p.66 / Chapter 5.1.1 --- Network Architectures --- p.68 / Chapter 5.1.2 --- Spectral Templates for Training --- p.74 / Chapter 5.1.3 --- System requirement --- p.76 / Chapter 5.2 --- Subjective Listening Test --- p.79 / Chapter 5.2.1 --- Sample Selection --- p.79 / Chapter 5.2.2 --- Test Procedure --- p.81 / Chapter 5.2.3 --- Result --- p.83 / Chapter 5.2.4 --- Analysis --- p.86 / Chapter 5.3 --- Summary --- p.88 / Chapter 6 --- Simplified Articulatory Control for the Synthesizer Network --- p.89 / Chapter 6.1 --- Coarticulatory Effect in Speech Production --- p.90 / Chapter 6.1.1 --- Acoustic Effect --- p.90 / Chapter 6.1.2 --- Prosodic Effect --- p.91 / Chapter 6.2 --- Control in various Synthesis Techniques --- p.92 / Chapter 6.2.1 --- Copy Concatenation --- p.92 / 
Chapter 6.2.2 --- Formant Synthesis --- p.93 / Chapter 6.2.3 --- Articulatory synthesis --- p.93 / Chapter 6.3 --- Articulatory Control Model based on Vowel Quad --- p.94 / Chapter 6.3.1 --- Modeling of Variations with the Articulatory Control Model --- p.95 / Chapter 6.4 --- Voice Correspondence --- p.97 / Chapter 6.4.1 --- For Nasal Sounds - Inter-Network Correspondence --- p.98 / Chapter 6.4.2 --- In Flat-Tongue Space - Intra-Network Correspondence --- p.101 / Chapter 6.5 --- Summary --- p.108 / Chapter 7 --- Pause Duration Properties in Cantonese Phrases --- p.109 / Chapter 7.1 --- The Prosodic Feature - Inter-Syllable Pause --- p.110 / Chapter 7.2 --- Experiment for Measuring Inter-Syllable Pause of Cantonese Phrases --- p.111 / Chapter 7.2.1 --- Speech Material Selection --- p.111 / Chapter 7.2.2 --- Experimental Procedure --- p.112 / Chapter 7.2.3 --- Result --- p.114 / Chapter 7.3 --- Characteristics of Inter-Syllable Pause in Cantonese Phrases --- p.117 / Chapter 7.3.1 --- Pause Duration Characteristics for Initials after Pause --- p.117 / Chapter 7.3.2 --- Pause Duration Characteristic for Finals before Pause --- p.119 / Chapter 7.3.3 --- General Observations --- p.119 / Chapter 7.3.4 --- Other Observations --- p.121 / Chapter 7.4 --- Application of Pause-duration Statistics to the Synthesis System --- p.124 / Chapter 7.5 --- Summary --- p.126 / Chapter 8 --- Conclusion and Further Work --- p.127 / Chapter 8.1 --- Conclusion --- p.127 / Chapter 8.2 --- Further Extension Work --- p.130 / Chapter 8.2.1 --- Regularization Network Optimized on ISD --- p.130 / Chapter 8.2.2 --- Incorporation of Non-Articulatory Parameters to Control Space --- p.130 / Chapter 8.2.3 --- Experiment on Other Prosodic Features --- p.131 / Chapter 8.2.4 --- Application of Voice Correspondence to Cantonese Coda Discrimination --- p.131 / Chapter A --- Cantonese Initials and Finals --- p.132 / Chapter A.1 --- Tables of All Cantonese Initials and Finals --- p.132 / Chapter B ---
Using Distortion Measure as Error Function in Neural Network --- p.135 / Chapter B.1 --- Formulation of Itakura-Saito Distortion Measure for Neural Network Error Function --- p.135 / Chapter B.2 --- Formulation of a Modified Itakura-Saito Distortion (MISD) Measure for Neural Network Error Function --- p.137 / Chapter C --- Orthogonal Least Square Algorithm for RBFNet Training --- p.138 / Chapter C.1 --- Orthogonal Least Squares Learning Algorithm for Radial Basis Function Network Training --- p.138 / Chapter D --- Phrase Lists --- p.140 / Chapter D.1 --- Two-Syllable Phrase List for the Pause Duration Experiment --- p.140 / Chapter D.1.1 --- Two-Syllable Words (兩字詞) --- p.140 / Chapter D.2 --- Three/Four-Syllable Phrase List for the Pause Duration Experiment --- p.144 / Chapter D.2.1 --- Phrases (片語) --- p.144
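Appendix B of the record above formulates the Itakura-Saito distortion as a neural network error function. For reference, the standard Itakura-Saito distortion between two power spectra — the textbook definition, not the thesis's modified MISD variant — can be sketched as:

```python
import numpy as np

def itakura_saito(p, q, eps=1e-12):
    """Itakura-Saito distortion between power spectra p (reference) and
    q (approximation). Zero iff p == q; asymmetric in its arguments."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    r = p / q
    # d_IS = mean( p/q - log(p/q) - 1 ), non-negative by convexity
    return float(np.mean(r - np.log(r) - 1.0))
```

The measure vanishes exactly when the two spectra coincide, and its asymmetry is why the reference spectrum is conventionally passed first.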
163

A frequency-based BSS technique for speech source separation.

January 2003 (has links)
Ngan Lai Yin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 95-100). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Blind Signal Separation (BSS) Methods --- p.4 / Chapter 1.2 --- Objectives of the Thesis --- p.6 / Chapter 1.3 --- Thesis Outline --- p.8 / Chapter 2 --- Blind Adaptive Frequency-Shift (BA-FRESH) Filter --- p.9 / Chapter 2.1 --- Cyclostationarity Properties --- p.10 / Chapter 2.2 --- Frequency-Shift (FRESH) Filter --- p.11 / Chapter 2.3 --- Blind Adaptive FRESH Filter --- p.12 / Chapter 2.4 --- Reduced-Rank BA-FRESH Filter --- p.14 / Chapter 2.4.1 --- CSP Method --- p.14 / Chapter 2.4.2 --- PCA Method --- p.14 / Chapter 2.4.3 --- Appropriate Choice of Rank --- p.14 / Chapter 2.5 --- Signal Extraction of Spectrally Overlapped Signals --- p.16 / Chapter 2.5.1 --- Simulation 1: A Fixed Rank --- p.17 / Chapter 2.5.2 --- Simulation 2: A Variable Rank --- p.18 / Chapter 2.6 --- Signal Separation of Speech Signals --- p.20 / Chapter 2.7 --- Chapter Summary --- p.22 / Chapter 3 --- Reverberant Environment --- p.23 / Chapter 3.1 --- Small Room Acoustics Model --- p.23 / Chapter 3.2 --- Effects of Reverberation to Speech Recognition --- p.27 / Chapter 3.2.1 --- Short Impulse Response --- p.27 / Chapter 3.2.2 --- Small Room Impulse Response Modelled by Image Method --- p.32 / Chapter 3.3 --- Chapter Summary --- p.34 / Chapter 4 --- Information Theoretic Approach for Signal Separation --- p.35 / Chapter 4.1 --- Independent Component Analysis (ICA) --- p.35 / Chapter 4.1.1 --- Kullback-Leibler (K-L) Divergence --- p.37 / Chapter 4.2 --- Information Maximization (Infomax) --- p.39 / Chapter 4.2.1 --- Stochastic Gradient Descent and Stability Problem --- p.41 / Chapter 4.2.2 --- Infomax and ICA --- p.41 / Chapter 4.2.3 --- Infomax and Maximum Likelihood --- p.42 / Chapter 4.3 --- Signal Separation by Infomax --- p.43 / Chapter 4.4 --- Chapter Summary --- p.45 / 
Chapter 5 --- Blind Signal Separation (BSS) in Frequency Domain --- p.47 / Chapter 5.1 --- Convolutive Mixing System --- p.48 / Chapter 5.2 --- Infomax in Frequency Domain --- p.52 / Chapter 5.3 --- Adaptation Algorithms --- p.54 / Chapter 5.3.1 --- Standard Gradient Method --- p.54 / Chapter 5.3.2 --- Natural Gradient Method --- p.55 / Chapter 5.3.3 --- Convergence Performance --- p.56 / Chapter 5.4 --- Subband Adaptation --- p.57 / Chapter 5.5 --- Energy Weighting --- p.59 / Chapter 5.6 --- The Permutation Problem --- p.61 / Chapter 5.7 --- Performance Evaluation --- p.63 / Chapter 5.7.1 --- De-reverberation Performance Factor --- p.63 / Chapter 5.7.2 --- De-Noise Performance Factor --- p.63 / Chapter 5.7.3 --- Spectral Signal-to-noise Ratio (SNR) --- p.65 / Chapter 5.8 --- Chapter Summary --- p.65 / Chapter 6 --- Simulation Results and Performance Analysis --- p.67 / Chapter 6.1 --- Small Room Acoustics Modelled by Image Method --- p.67 / Chapter 6.2 --- Signal Sources --- p.68 / Chapter 6.2.1 --- Cantonese Speech --- p.69 / Chapter 6.2.2 --- Noise --- p.69 / Chapter 6.3 --- De-Noise and De-Reverberation Performance Analysis --- p.69 / Chapter 6.3.1 --- Speech and White Noise --- p.73 / Chapter 6.3.2 --- Speech and Voice Babble Noise --- p.76 / Chapter 6.3.3 --- Two Female Speeches --- p.79 / Chapter 6.4 --- Recognition Accuracy Performance Analysis --- p.83 / Chapter 6.4.1 --- Speech and White Noise --- p.83 / Chapter 6.4.2 --- Speech and Voice Babble Noise --- p.84 / Chapter 6.4.3 --- Two Cantonese Speeches --- p.85 / Chapter 6.5 --- Chapter Summary --- p.87 / Chapter 7 --- Conclusions and Suggestions for Future Research --- p.88 / Chapter 7.1 --- Conclusions --- p.88 / Chapter 7.2 --- Suggestions for Future Research --- p.91 / Appendices --- p.92 / A The Proof of Stability Conditions for Stochastic Gradient Descent Algorithm (Ref. (4.15)) --- p.92 / Bibliography --- p.95
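Chapters 4 and 5 of the record above separate sources with Infomax and the natural gradient. As an illustration of the core update rule only — an instantaneous-mixture sketch in Python, whereas the thesis applies Infomax bin-by-bin in the frequency domain to convolutive mixtures — the natural-gradient Infomax iteration looks like:

```python
import numpy as np

def infomax_natural_gradient(X, n_iter=200, lr=0.01, seed=0):
    """Instantaneous (non-convolutive) Infomax ICA with the natural
    gradient update dW = lr * (I - tanh(Y) Y^T / T) W.

    X: mixtures, shape (n_sources, n_samples). Returns the unmixing
    matrix W. Illustrative sketch; subband adaptation, energy weighting
    and permutation alignment from the thesis are omitted.
    """
    rng = np.random.default_rng(seed)
    n, T = X.shape
    W = np.eye(n) + 0.01 * rng.standard_normal((n, n))
    I = np.eye(n)
    for _ in range(n_iter):
        Y = W @ X
        # tanh is the score function for super-Gaussian sources (speech)
        W += lr * (I - np.tanh(Y) @ Y.T / T) @ W
    return W
```

Each iteration pushes the outputs' nonlinear correlation E[tanh(y) yᵀ] toward the identity, which for super-Gaussian sources such as speech drives the unmixed signals toward statistical independence.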
164

Text-independent bilingual speaker verification system.

January 2003 (has links)
Ma Bin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 96-102). / Abstracts in English and Chinese. / Abstract --- p.i / Acknowledgement --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Biometrics --- p.2 / Chapter 1.2 --- Speaker Verification --- p.3 / Chapter 1.3 --- Overview of Speaker Verification Systems --- p.4 / Chapter 1.4 --- Text Dependency --- p.4 / Chapter 1.4.1 --- Text-Dependent Speaker Verification --- p.5 / Chapter 1.4.2 --- GMM-based Speaker Verification --- p.6 / Chapter 1.5 --- Language Dependency --- p.6 / Chapter 1.6 --- Normalization Techniques --- p.7 / Chapter 1.7 --- Objectives of the Thesis --- p.8 / Chapter 1.8 --- Thesis Organization --- p.8 / Chapter 2 --- Background --- p.10 / Chapter 2.1 --- Background Information --- p.11 / Chapter 2.1.1 --- Speech Signal Acquisition --- p.11 / Chapter 2.1.2 --- Speech Processing --- p.11 / Chapter 2.1.3 --- Engineering Model of Speech Signal --- p.13 / Chapter 2.1.4 --- Speaker Information in the Speech Signal --- p.14 / Chapter 2.1.5 --- Feature Parameters --- p.15 / Chapter 2.1.5.1 --- Mel-Frequency Cepstral Coefficients --- p.16 / Chapter 2.1.5.2 --- Linear Predictive Coding Derived Cepstral Coefficients --- p.18 / Chapter 2.1.5.3 --- Energy Measures --- p.20 / Chapter 2.1.5.4 --- Derivatives of Cepstral Coefficients --- p.21 / Chapter 2.1.6 --- Evaluating Speaker Verification Systems --- p.22 / Chapter 2.2 --- Common Techniques --- p.24 / Chapter 2.2.1 --- Template Model Matching Methods --- p.25 / Chapter 2.2.2 --- Statistical Model Methods --- p.26 / Chapter 2.2.2.1 --- HMM Modeling Technique --- p.27 / Chapter 2.2.2.2 --- GMM Modeling Techniques --- p.30 / Chapter 2.2.2.3 --- Gaussian Mixture Model --- p.31 / Chapter 2.2.2.4 --- The Advantages of GMM --- p.32 / Chapter 2.2.3 --- Likelihood Scoring --- p.32 / Chapter 2.2.4 --- General Approach to Decision Making --- p.35 / Chapter 2.2.5 --- Cohort Normalization --- p.35
/ Chapter 2.2.5.1 --- Probability Score Normalization --- p.36 / Chapter 2.2.5.2 --- Cohort Selection --- p.37 / Chapter 2.3 --- Chapter Summary --- p.38 / Chapter 3 --- Experimental Corpora --- p.39 / Chapter 3.1 --- The YOHO Corpus --- p.39 / Chapter 3.1.1 --- Design of the YOHO Corpus --- p.39 / Chapter 3.1.2 --- Data Collection Process of the YOHO Corpus --- p.40 / Chapter 3.1.3 --- Experimentation with the YOHO Corpus --- p.41 / Chapter 3.2 --- CUHK Bilingual Speaker Verification Corpus --- p.42 / Chapter 3.2.1 --- Design of the CUBS Corpus --- p.42 / Chapter 3.2.2 --- Data Collection Process for the CUBS Corpus --- p.44 / Chapter 3.3 --- Chapter Summary --- p.46 / Chapter 4 --- Text-Dependent Speaker Verification --- p.47 / Chapter 4.1 --- Front-End Processing on the YOHO Corpus --- p.48 / Chapter 4.2 --- Cohort Normalization Setup --- p.50 / Chapter 4.3 --- HMM-based Speaker Verification Experiments --- p.53 / Chapter 4.3.1 --- Subword HMM Models --- p.53 / Chapter 4.3.2 --- Experimental Results --- p.55 / Chapter 4.3.2.1 --- Comparison of Feature Representations --- p.55 / Chapter 4.3.2.2 --- Effect of Cohort Normalization --- p.58 / Chapter 4.4 --- Experiments on GMM-based Speaker Verification --- p.61 / Chapter 4.4.1 --- Experimental Setup --- p.61 / Chapter 4.4.2 --- The number of Gaussian Mixture Components --- p.62 / Chapter 4.4.3 --- The Effect of Cohort Normalization --- p.64 / Chapter 4.4.4 --- Comparison of HMM and GMM --- p.65 / Chapter 4.5 --- Comparison with Previous Systems --- p.67 / Chapter 4.6 --- Chapter Summary --- p.70 / Chapter 5 --- Language- and Text-Independent Speaker Verification --- p.71 / Chapter 5.1 --- Front-End Processing of the CUBS --- p.72 / Chapter 5.2 --- Language- and Text-Independent Speaker Modeling --- p.73 / Chapter 5.3 --- Cohort Normalization --- p.74 / Chapter 5.4 --- Experimental Results and Analysis --- p.75 / Chapter 5.4.1 --- Number of Gaussian Mixture Components --- p.78 / Chapter 5.4.2 --- The Cohort 
Normalization Effect --- p.79 / Chapter 5.4.3 --- Language Dependency --- p.80 / Chapter 5.4.4 --- Language-Independency --- p.83 / Chapter 5.5 --- Chapter Summary --- p.88 / Chapter 6 --- Conclusions and Future Work --- p.90 / Chapter 6.1 --- Summary --- p.90 / Chapter 6.1.1 --- Feature Comparison --- p.91 / Chapter 6.1.2 --- HMM Modeling --- p.91 / Chapter 6.1.3 --- GMM Modeling --- p.91 / Chapter 6.1.4 --- Cohort Normalization --- p.92 / Chapter 6.1.5 --- Language Dependency --- p.92 / Chapter 6.2 --- Future Work --- p.93 / Chapter 6.2.1 --- Feature Parameters --- p.93 / Chapter 6.2.2 --- Model Quality --- p.93 / Chapter 6.2.2.1 --- Variance Flooring --- p.93 / Chapter 6.2.2.2 --- Silence Detection --- p.94 / Chapter 6.2.3 --- Conversational Speaker Verification --- p.95 / Bibliography --- p.102
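Chapters 2.2.2 through 2.2.5 of the record above rest on GMM likelihood scoring with cohort normalization. A minimal sketch of the scoring side — the decision score only, not EM training, and using an arithmetic-mean cohort normalizer, which is just one common variant and not necessarily the thesis's exact choice:

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X (T x D) under a
    diagonal-covariance GMM given by (weights, means, variances)."""
    X = np.atleast_2d(X)
    d = X.shape[1]
    # Squared Mahalanobis term for every (frame, mixture) pair
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.log(variances).sum(axis=1))
    log_comp = (np.log(weights)[None, :] + log_norm[None, :]
                - 0.5 * diff2.sum(axis=2))
    # log-sum-exp over mixtures, then average over frames
    m = log_comp.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

def cohort_normalized_score(X, claimant, cohort):
    """Score = L(claimant) - mean cohort L; each model is a
    (weights, means, variances) tuple."""
    cohort_l = [gmm_loglik(X, *model) for model in cohort]
    return gmm_loglik(X, *claimant) - float(np.mean(cohort_l))
```

Frames that fit the claimant's model and lie far from the cohort models yield a large positive score, which is then compared against a verification threshold.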
165

Design and evaluation of tone-enhanced strategy for cochlear implants in noisy environment.

January 2011 (has links)
Yu, Shing. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 87-93). / Abstracts in English and Chinese; includes Chinese. / Abstract --- p.i / Acknowledgement --- p.vi / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Hearing impairment --- p.1 / Chapter 1.2 --- Limitations of existing CI --- p.2 / Chapter 1.3 --- Objectives --- p.3 / Chapter 1.4 --- Thesis Outline --- p.4 / Chapter 2 --- Background --- p.6 / Chapter 2.1 --- Signal Processing in CI --- p.6 / Chapter 2.1.1 --- Continuous Interleaved Sampler (CIS) --- p.7 / Chapter 2.1.2 --- Advanced Combination Encoder (ACE) --- p.12 / Chapter 2.2 --- Tone perception by cochlear implantees --- p.15 / Chapter 2.2.1 --- Pitch and Tone --- p.15 / Chapter 2.2.2 --- Mechanisms of pitch perception by cochlear implantees --- p.20 / Chapter 3 --- Tone-enhanced ACE Strategy for CI --- p.23 / Chapter 3.1 --- Basic principles --- p.23 / Chapter 3.2 --- Acoustical simulation with noise excited vocoder --- p.26 / Chapter 3.3 --- Implementation in a real CI system --- p.29 / Chapter 3.3.1 --- Technical details --- p.30 / Chapter 3.3.2 --- Visual comparison --- p.31 / Chapter 4 --- Robust Generation of F0 Trajectory --- p.33 / Chapter 4.1 --- Requirement on the F0 contour --- p.33 / Chapter 4.2 --- Extraction of F0 contour --- p.34 / Chapter 4.3 --- Post-processing of F0 contour --- p.36 / Chapter 4.3.1 --- Removal of octave-jump --- p.36 / Chapter 4.3.2 --- Interpolation --- p.36 / Chapter 4.3.3 --- Prediction --- p.36 / Chapter 4.3.4 --- Smoothing --- p.38 / Chapter 4.4 --- Performance evaluation --- p.38 / Chapter 5 --- Design of Listening Tests --- p.41 / Chapter 5.1 --- Speech Materials --- p.41 / Chapter 5.2 --- Testing modes --- p.43 / Chapter 5.2.1 --- Sound field mode --- p.45 / Chapter 5.2.2 --- Direct stimulation mode --- p.46 / Chapter 5.3 --- Test Interface --- p.47 / Chapter 6 --- Sound-field Tests --- p.49 / Chapter 6.1 --- Materials and Methods --- p.50
/ Chapter 6.1.1 --- Subjects --- p.50 / Chapter 6.1.2 --- Signal processing and test stimuli --- p.52 / Chapter 6.1.3 --- Procedures --- p.52 / Chapter 6.2 --- Results --- p.54 / Chapter 6.3 --- Discussion --- p.57 / Chapter 7 --- Evaluation of Tone-enhanced Strategy --- p.59 / Chapter 7.1 --- Materials and Methods --- p.60 / Chapter 7.1.1 --- Subjects --- p.60 / Chapter 7.1.2 --- Signal processing and test stimuli --- p.60 / Chapter 7.1.3 --- Procedures --- p.62 / Chapter 7.2 --- Results --- p.63 / Chapter 7.3 --- Discussion --- p.66 / Chapter 8 --- Use of Automatically Generated F0 Contour --- p.72 / Chapter 8.1 --- Materials and Methods --- p.73 / Chapter 8.2 --- Results --- p.74 / Chapter 8.3 --- Discussion --- p.76 / Chapter 9 --- Conclusions --- p.80 / Chapter A --- LSHK Cantonese Romanization Scheme --- p.85 / Bibliography --- p.87
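Chapter 4.3 of the record above lists the F0 post-processing steps: octave-jump removal, interpolation, prediction and smoothing. A toy sketch of three of those steps — the thresholds and window size are this sketch's assumptions, not values from the thesis, and the prediction step is omitted:

```python
import numpy as np

def postprocess_f0(f0):
    """Toy F0 post-processing chain: (1) undo octave jumps relative to
    the previous voiced value, (2) linearly interpolate unvoiced gaps
    (f0 == 0), (3) smooth with a 3-point moving average."""
    f0 = np.asarray(f0, dtype=float).copy()
    # (1) Octave-jump removal: halve/double outliers vs. last voiced frame
    last = 0.0
    for i, v in enumerate(f0):
        if v <= 0:
            continue
        if last > 0:
            if v > 1.8 * last:
                f0[i] = v / 2.0
            elif v < last / 1.8:
                f0[i] = v * 2.0
        last = f0[i]
    # (2) Interpolate unvoiced frames between voiced neighbours
    voiced = f0 > 0
    if voiced.any():
        idx = np.arange(len(f0))
        f0 = np.interp(idx, idx[voiced], f0[voiced])
    # (3) 3-point moving-average smoothing (edge samples kept as-is)
    sm = f0.copy()
    sm[1:-1] = (f0[:-2] + f0[1:-1] + f0[2:]) / 3.0
    return sm
```

For example, a contour of 100 Hz frames with one dropout and one spurious 200 Hz frame comes out as a flat 100 Hz trajectory: the 200 Hz frame is recognised as an octave jump and halved, the dropout is bridged by interpolation, and smoothing leaves the flat contour unchanged.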
166

Audio compression and speech enhancement using temporal masking models

Gunawan, Teddy Surya, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW January 2007 (has links)
Of the few existing models of temporal masking applicable to problems such as compression and enhancement, none are based on empirical data from the psychoacoustic literature, presumably because the multidimensional nature of the data makes the derivation of tractable functional models difficult. This thesis presents two new functional models of the temporal masking effect of the human auditory system, and their exploitation in audio compression and speech enhancement applications. Traditional audio compression algorithms do not completely utilise the temporal masking properties of the human auditory system, relying solely on simultaneous masking models. A perceptual wavelet packet-based audio coder has been devised that incorporates the first of the developed temporal masking models, combining it with simultaneous masking models in a novel manner. An evaluation of the coder using both objective (PEAQ, ITU-R BS.1387) and extensive subjective tests (ITU-R BS.1116) revealed a bitrate reduction of more than 17% compared with existing simultaneous masking-based audio coders, while preserving transparent quality. In addition, the oversampled wavelet packet transform (ODWT) has been newly applied to obtain alias-free coefficients for more accurate masking threshold calculation. Finally, a low-complexity scalable audio coding algorithm using the ODWT-based thresholds and temporal masking has been investigated.

Currently, there is a strong need for innovative speech enhancement algorithms that exploit the auditory masking effects of the human auditory system and perform well at very low signal-to-noise ratios. Existing competitive noise suppression algorithms, and those that incorporate simultaneous masking, were examined and evaluated for their suitability as baseline algorithms. Objective measures using PESQ (ITU-T P.862) and subjective measures (ITU-T P.835) demonstrate that the proposed enhancement scheme, based on a second new masking model, outperformed the seven baseline speech enhancement methods by 6-20% depending on the SNR. Hence, the proposed speech enhancement scheme exploiting temporal masking effects has good potential across many types and intensities of environmental noise. Keywords: human auditory system; temporal masking; simultaneous masking; audio compression; speech enhancement; subjective test; objective test.
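The temporal (forward) masking modelled in the record above can be pictured with a classic functional form from the psychoacoustics literature, in which the masked threshold shift decays roughly linearly in dB with the logarithm of the masker-probe delay. The coefficients below are illustrative placeholders, not the thesis's fitted model:

```python
import numpy as np

def forward_masking_db(masker_level_db, delay_ms, a=0.1, b=2.3, c=20.0):
    """Forward-masked threshold shift in dB, using the classic
    a * (b - log10(delay)) * (L - c) form from the psychoacoustics
    literature. a, b, c here are placeholder values, not fitted data."""
    delay_ms = np.maximum(np.asarray(delay_ms, dtype=float), 1e-3)
    m = a * (b - np.log10(delay_ms)) * (masker_level_db - c)
    return np.maximum(m, 0.0)  # clamp: masking cannot be negative
```

The shape matches the qualitative behaviour a coder exploits: a loud masker raises the threshold of audibility for tens of milliseconds after it stops, and the shift shrinks as the delay grows.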
167

Robust speech features for speech recognition in hostile environments

Toh, Aik January 1900 (has links)
Speech recognition systems have improved in robustness in recent years with respect to both speaker and acoustical variability. Nevertheless, it is still a challenge to deploy speech recognition systems in real-world applications that are exposed to diverse and significant levels of noise. Robustness and recognition accuracy are the essential criteria in determining the extent to which a speech recognition system can be deployed in real-world applications. This work involves the development of techniques and extensions for extracting robust features from speech to achieve substantial performance in speech recognition; robustness and recognition accuracy are its top concerns. The robustness issue is approached through front-end processing, in particular robust feature extraction. The author proposes a unified framework for robust features and presents a comprehensive evaluation of robustness in speech features. The framework addresses three distinct approaches: robust feature extraction, temporal information inclusion and normalization strategies. The author discusses the issue of robust feature selection primarily in the spectral and cepstral context. Several enhancements and extensions are explored for the purpose of robustness, including a computationally efficient approach proposed for moment normalization. In addition, a simple back-end approach is incorporated to improve recognition performance in reverberant environments. Speech features in this work are evaluated in three distinct environments that occur in real-world scenarios. The thesis also discusses the effect of noise on speech features and their parameters. The author has established that statistical properties play an important role in mismatches. The significance of the research is strengthened by the evaluation of robust approaches in more than one scenario and by comparison with the performance of state-of-the-art features. The contributions and limitations of each robust feature in all three environments are highlighted. The novelty of the work lies in the diverse hostile environments in which the speech features are evaluated for robustness. The author has obtained recognition accuracy of more than 98.5% for channel distortion, and accuracy greater than 90.0% has been maintained for a reverberation time of 0.4 s and additive babble noise at an SNR of 10 dB. The thesis delivers comprehensive research on robust speech features for speech recognition in hostile environments, supported by significant experimental results. Several observations, recommendations and relevant issues associated with robust speech features are presented.
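Among the normalization strategies evaluated in the record above, moment normalization has a widely used first-and-second-moment instance: cepstral mean and variance normalization (CMVN). A minimal per-utterance sketch — the thesis's computationally efficient variant is not reproduced here:

```python
import numpy as np

def cmvn(features):
    """Cepstral mean and variance normalization: normalize each feature
    dimension to zero mean and unit variance over the utterance, which
    removes the first two moments of a stationary channel mismatch."""
    x = np.asarray(features, dtype=float)
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant dimensions
    return (x - mu) / sigma
```

A convolutive channel distortion that is stationary over the utterance shifts the cepstral mean, so subtracting the mean (and scaling by the standard deviation) discards exactly the statistics that the mismatch corrupts.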
168

Dynamic analog speech synthesizer

January 1960 (has links)
George Rosen. / "February 10, 1960." "Submitted to the Department of Electrical Engineering, M.I.T., January 25, 1960, in partial fulfillment of the requirements for the Doctor of Science." / Bibliography: p. 86-88. / Army Signal Corps Contract DA36-039-sc-78108 Dept. of the Army Task 3-99-20-001 and Project 3-99-00-000. Air Force Contract AF19(604)-6102.
169

Emotion Recognition Using Glottal and Prosodic Features

Iliev, Alexander Iliev 21 December 2009 (has links)
Emotion conveys the psychological state of a person. It is expressed by a variety of physiological changes, such as changes in blood pressure, heart rate and degree of sweating, and can be manifested in shaking, changes in skin coloration, facial expression, and the acoustics of speech. This research focuses on the recognition of emotion conveyed in speech. There were three main objectives of this study. One was to examine the role played by the glottal source signal in the expression of emotional speech. The second was to investigate whether it can provide improved robustness in real-world situations and in noisy environments; this was achieved through testing in clean and various noisy conditions. Finally, the performance of glottal features was compared to diverse existing and newly introduced emotional feature domains. A novel glottal symmetry feature is proposed and automatically extracted from speech. The effectiveness of several inverse filtering methods in extracting the glottal signal from speech has been examined. Beyond the glottal symmetry, two additional feature classes were tested for emotion recognition: the Tonal and Break Indices (ToBI) of American English intonation, and the Mel Frequency Cepstral Coefficients (MFCC) of the glottal signal. Three corpora were specifically designed for the task. The first two investigated the four emotions Happy, Angry, Sad and Neutral, and the third added Fear and Surprise in a six-emotion recognition task. This work shows that the glottal signal carries valuable emotional information and that using it for emotion recognition has many advantages over other conventional methods. For clean speech, in a four-emotion recognition task, classical prosodic features achieved 89.67% recognition, ToBI combined with classical features reached 84.75%, while glottal symmetry alone achieved 98.74%. For the six-emotion task these three methods achieved 79.62%, 90.39% and 85.37% recognition rates, respectively. Using the glottal signal also provided greater classifier robustness under noisy conditions and distortion caused by low-pass filtering. Specifically, for additive white Gaussian noise at SNR = 10 dB in the six-emotion task, the classical features and the classical features combined with ToBI both failed to provide successful results; speech MFCCs achieved a recognition rate of 41.43% while glottal symmetry reached 59.29%. This work has shown that the glottal signal, and the glottal symmetry in particular, provides high class separation for both the four- and six-emotion cases, clearly surpassing the performance of all other features included in this investigation in noisy speech conditions and in most clean-signal conditions.
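The inverse filtering that underpins the glottal analysis in the record above can be illustrated with the simplest member of that family: LPC inverse filtering by the autocorrelation method, whose prediction residual is a crude stand-in for the glottal excitation. This is a generic sketch, not one of the specific inverse filtering methods examined in the dissertation:

```python
import numpy as np

def lpc_residual(signal, order=10):
    """Fit an order-p all-pole (LPC) model by the autocorrelation method
    and return the prediction residual. Practical glottal inverse
    filtering adds pre-emphasis, windowing and lip-radiation
    compensation, all omitted here."""
    x = np.asarray(signal, dtype=float)
    # Autocorrelation lags r(0) .. r(order)
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    # Solve the Toeplitz normal (Yule-Walker) equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    # Inverse filter: e[n] = x[n] - sum_k a[k] * x[n-k]
    e = x.copy()
    for k in range(1, order + 1):
        e[k:] -= a[k - 1] * x[:-k]
    return e
```

On a signal generated by a known all-pole model, the residual recovers the driving excitation, so its variance is far below that of the signal itself; on voiced speech, the residual exposes the periodic glottal pulses the vocal-tract resonances otherwise obscure.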
170

Der verflixte Akkusativ : Altersunterschiede und Altersinvarianz beim Verstehen von Sätzen mit unterschiedlich komplexer syntaktischer Struktur / Tricky accusative : age-related differences in comprehension of sentences with different syntactical structure

Junker, Martina January 2004 (has links)
This thesis reports several experiments examining how well young and older adults comprehend sentences of varying syntactic complexity. The central topic is the difficulty that older adults have with object-before-subject word order. The experiments examine to what extent these observed age differences can be explained by a reduced verbal working-memory capacity in older adults. This raises the question of whether the deficits concern a general verbal working memory, or whether there is a separate processing system for syntactic information whose capacity declines with age. In an age simulation, an attempt was made to reproduce the postulated reduced working-memory capacity of older adults in young adults by artificially restricting their working-memory capacity with an additional digit-load task. Furthermore, age differences for syntactically complex centre-embedded relative clauses were compared with those for syntactically simpler coordinated main clauses. Finally, to confront participants with the rare object-initial structures and change their experience with such sentences, both young and older adults were trained on sentences with object-before-subject word order.
