21 |
Hidden Markov models with improved duration and observation structures
Conner, Paul N. January 1993
No description available.
|
22 |
Exploring time variability in representations of speech
Cabral, Euvaldo F. January 1993
No description available.
|
23 |
Automatic determination of sub-word units for automatic speech recognition
Couper Kenney, Fiona January 2008
Current automatic speech recognition (ASR) research is focused on recognition of continuous, spontaneous speech. Spontaneous speech contains a lot of variability in the way words are pronounced, and canonical pronunciations of each word do not reflect the variation seen in real data. Two of the components of an ASR system are acoustic models and pronunciation models, and both must account for the variation within spontaneous speech. Phones, or context-dependent phones, are typically used as the base sub-word unit, and one acoustic model is trained for each sub-word unit. Pronunciation modelling largely takes place in a dictionary, which relates words to sequences of phones. Acoustic modelling and pronunciation modelling overlap, and the two are not clearly separable in modelling pronunciation variation. Techniques that find pronunciation variants in the data and then reflect these in the dictionary have not provided the expected gains in recognition. An alternative to modelling pronunciations in terms of phones is to derive units automatically, using data-driven methods to determine an inventory of sub-word units, their acoustic models, and their relationship to words. This thesis presents a method for the automatic derivation of a sub-word unit inventory, whose main components are:
1. automatic and simultaneous generation of a sub-word unit inventory and acoustic model set, using an ergodic hidden Markov model whose complexity is controlled using the Bayesian Information Criterion;
2. automatic generation of probabilistic dictionaries using joint multigrams.
The prerequisites of this approach are fewer than in previous work on unit derivation; notably, the timings of word boundaries are not required here. The approach is language independent since it is entirely data-driven and no linguistic information is required. The dictionary generation method outperforms a supervised method using phonetic data. The automatically derived units and dictionary perform reasonably on a small spontaneous speech task, although not yet outperforming phones.
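The abstract above mentions using the Bayesian Information Criterion to control model complexity. As a rough illustration only (not the thesis's actual procedure), the sketch below scores a few candidate models with the standard BIC formula and picks the lowest score. The candidate names, log-likelihoods, parameter counts, and sample count are all hypothetical values invented for the example.

```python
import math


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Standard BIC: k * ln(n) - 2 * ln(L). Lower is better."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


def select_model(candidates, num_samples):
    """Return the name of the candidate with the lowest BIC.

    candidates: list of (name, log_likelihood, num_params) tuples.
    """
    return min(candidates, key=lambda c: bic(c[1], c[2], num_samples))[0]


# Hypothetical candidates: larger inventories fit better (higher
# log-likelihood) but pay a larger complexity penalty.
candidates = [
    ("8 units", -5600.0, 80),
    ("16 units", -5000.0, 160),
    ("32 units", -4950.0, 320),
]

best = select_model(candidates, num_samples=1000)
print(best)
```

With these made-up numbers the middle candidate wins: the jump from 16 to 32 units improves the likelihood too little to justify doubling the parameter count.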
|
24 |
Towards formal structural representation of spoken language : an evolving transformation system (ETS) approach
Alexander, Gutkin January 2006
Speech recognition has been a very active area of research over the past twenty years. Despite evident progress, practitioners in the field generally agree that the performance of current speech recognition systems is suboptimal and that new approaches are needed. The motivation behind this research is the observation that the notion of representation of objects and concepts, once considered central in the early days of pattern recognition, has been largely marginalised by the advent of statistical approaches. As a consequence of a predominantly statistical approach to the speech recognition problem, and of the numeric, feature vector-based nature of the representation, the classes inductively discovered from real data using decision-theoretic techniques have little meaning outside the statistical framework. This is because decision surfaces or probability distributions are difficult to analyse linguistically. Because of the latter limitation, it is doubtful that the gap between speech recognition and linguistic research can be bridged by numeric representations. This thesis investigates an alternative, structural, approach to spoken language representation and categorisation. The approach pursued in this thesis is based on a consistent program, known as the Evolving Transformation System (ETS), motivated by the development and clarification of the concept of structural representation in pattern recognition and artificial intelligence from both theoretical and applied points of view.
This thesis consists of two parts. In the first part, a similarity-based approach to structural representation of speech is presented. First, a linguistically well-motivated structural representation of phones based on distinctive phonological features recovered from speech is proposed. The representation consists of string templates representing phones together with a similarity measure. The set of phonological templates together with a similarity measure defines a symbolic metric space. Representation and ETS-inspired categorisation in the symbolic metric spaces corresponding to the phonological structural representation are then investigated by constructing appropriate symbolic space classifiers and evaluating them on a standard corpus of read speech. In addition, similarity-based isometric transition from phonological symbolic metric spaces to the corresponding non-Euclidean vector spaces is investigated.
The second part of this thesis deals with the formal approach to structural representation of spoken language. Unlike the approach adopted in the first part, the representation developed in the second part is based on the mathematical language of the ETS formalism. This formalism has been specifically developed for structural modelling of dynamic processes. In particular, it allows the representation of both objects and classes in a uniform event-based hierarchical framework. In this thesis, the latter property of the formalism allows the adoption of a more physiologically concrete approach to structural representation. The proposed representation is based on gestural structures and encapsulates speech processes at the articulatory level. Algorithms for deriving the articulatory structures from the data are presented and evaluated.
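The first part of the abstract describes string templates with a similarity measure forming a symbolic metric space, used for classification. The abstract does not say which measure the thesis uses, so the sketch below substitutes plain Levenshtein edit distance as a stand-in and classifies by nearest template. The feature symbols and phone templates are hypothetical, chosen only to make the example concrete.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical phone templates: each phone is a sequence of feature symbols.
templates = {
    "p": ("-voice", "+labial", "+stop"),
    "b": ("+voice", "+labial", "+stop"),
    "m": ("+voice", "+labial", "+nasal"),
}


def classify(observed):
    """Return the phone whose template is closest to the observed sequence."""
    return min(templates, key=lambda phone: edit_distance(observed, templates[phone]))


# An observation with one extra, spurious feature symbol.
print(classify(("+voice", "+labial", "+stop", "+aspirated")))
```

Edit distance with unit costs is a true metric (non-negative, symmetric, satisfies the triangle inequality), which is the property that makes a nearest-template classifier of this kind well defined.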
|
25 |
Voice classification using a unique key signature
20 November 2014
M.Com. (Informatics) / Please refer to full text to view abstract
|
26 |
Video classification using automata theory
20 November 2014
M.Com. / Please refer to full text to view abstract
|
27 |
Predicting the performance of a speech recognition task.
January 2002
Yau Pui Yuk. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (leaves 147-152). / Abstracts in English and Chinese.
Chapter 1 --- Introduction / Chapter 1.1 --- Overview / Chapter 1.2 --- Speech Recognition / Chapter 1.2.1 --- How Speech Recognition Works / Chapter 1.2.2 --- Types of Speech Recognition Tasks / Chapter 1.2.3 --- Variabilities in Speech - a Challenge for Speech Recognition / Chapter 1.3 --- Performance Prediction of Speech Recognition Task / Chapter 1.4 --- Thesis Goals / Chapter 1.5 --- Thesis Organization
Chapter 2 --- Background / Chapter 2.1 --- The Acoustic-phonetic Approach / Chapter 2.1.1 --- Prediction based on the Degree of Mismatch / Chapter 2.1.2 --- Prediction based on Acoustic Similarity / Chapter 2.1.3 --- Prediction based on Between-Word Distance / Chapter 2.2 --- The Lexical Approach / Chapter 2.2.1 --- Perplexity / Chapter 2.2.2 --- SMR-perplexity / Chapter 2.3 --- The Combined Acoustic-phonetic and Lexical Approach / Chapter 2.3.1 --- Speech Decoder Entropy (SDE) / Chapter 2.3.2 --- Ideal Speech Decoding Difficulty (ISDD) / Chapter 2.4 --- Chapter Summary
Chapter 3 --- Components for Predicting the Performance of Speech Recognition Task / Chapter 3.1 --- Components of Speech Recognizer / Chapter 3.2 --- Word Similarity Measure / Chapter 3.2.1 --- Universal Phoneme Symbol (UPS) / Chapter 3.2.2 --- Definition of Phonetic Distance / Chapter 3.2.3 --- Definition of Word Pair Phonetic Distance / Chapter 3.2.4 --- Definition of Word Similarity Measure / Chapter 3.3 --- Word Occurrence Measure / Chapter 3.4 --- Chapter Summary
Chapter 4 --- Formulation of Recognition Error Predictive Index (REPI) / Chapter 4.1 --- Formulation of Recognition Error Predictive Index (REPI) / Chapter 4.2 --- Characteristics of Recognition Error Predictive Index (REPI) / Chapter 4.2.1 --- Weakness of Ideal Speech Decoding Difficulty (ISDD) / Chapter 4.2.2 --- Advantages of Recognition Error Predictive Index (REPI) / Chapter 4.3 --- Chapter Summary
Chapter 5 --- Experimental Design and Setup / Chapter 5.1 --- Objectives / Chapter 5.2 --- Experiments Preparation / Chapter 5.2.1 --- Speech Corpus and Speech Recognizers / Chapter 5.2.2 --- Speech Recognition Tasks / Chapter 5.2.3 --- Evaluation Criterion / Chapter 5.3 --- Experiment Categories and their Setup / Chapter 5.3.1 --- Experiment Category 1 - Investigating and comparing the overall prediction performance of the two predictive indices / Chapter 5.3.2 --- Experiment Category 2 - Comparing the applicability of the word similarity measures of the two predictive indices on predicting the recognition performance / Chapter 5.3.3 --- Experiment Category 3 - Comparing the applicability of the formulation method of the two predictive indices on predicting the recognition performance / Chapter 5.3.4 --- Experiment Category 4 - Comparing the performance of different phonetic distance definitions / Chapter 5.4 --- Chapter Summary
Chapter 6 --- Experimental Results and Analysis / Chapter 6.1 --- Experimental Results and Analysis / Chapter 6.1.1 --- Experiment Category 1 - Investigating and comparing the overall prediction performance of the two predictive indices / Chapter 6.1.2 --- Experiment Category 2 - Comparing the applicability of the word similarity measures of the two predictive indices on predicting the recognition performance / Chapter 6.1.3 --- Experiment Category 3 - Comparing the applicability of the formulation method of the two predictive indices on predicting the recognition performance / Chapter 6.1.4 --- Experiment Category 4 - Comparing the performance of different phonetic distance definitions / Chapter 6.2 --- Experimental Summary / Chapter 6.3 --- Chapter Summary
Chapter 7 --- Conclusions / Chapter 7.1 --- Contributions / Chapter 7.2 --- Future Directions
Bibliography
Chapter A --- Table of Universal Phoneme Symbol / Chapter B --- Vocabulary Lists / Chapter C --- Experimental Results of Two-words Speech Recognition Tasks / Chapter D --- Experimental Results of Three-words Speech Recognition Tasks / Chapter E --- Significance Testing / Chapter E.1 --- Procedures of Significance Testing / Chapter E.2 --- Results of the Significance Testing / Chapter E.2.1 --- Experiment Category 1 / Chapter E.2.2 --- Experiment Category 2 / Chapter E.2.3 --- Experiment Category 3 / Chapter E.2.4 --- Experiment Category 4 / Chapter F --- Linear Regression Models
|
28 |
Speaker verification over the telephone = 電話中講話者身份確認技術 (Dian hua zhong jiang hua zhe shen fen que ren ji shu)
January 1999
by Cheng Yoik. / Thesis submitted in: October 1998. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. / Includes bibliographical references (leaves 91-95). / Text in English; abstract also in Chinese.
Chapter 1 --- Introduction / Chapter 1.1 --- What is Speaker Verification / Chapter 1.2 --- Review on Recent Speaker Verification Research / Chapter 1.2.1 --- Hidden Markov Modeling / Chapter 1.2.2 --- Cohort Normalization Scoring / Chapter 1.3 --- Objective of Thesis / Chapter 1.3.1 --- Text-prompted Speaker Verification System / Chapter 1.3.2 --- Fundamental Frequency (F0) Information / Chapter 1.3.3 --- Cohort Normalization on Cantonese / Chapter 1.4 --- Chapter Outline
Chapter 2 --- System Description / Chapter 2.1 --- System Overview / Chapter 2.2 --- Speech Signal Representations / Chapter 2.2.1 --- LPC Cepstral Coefficients / Chapter 2.2.2 --- Prosodic Features / Chapter 2.3 --- HMM Modeling Technique / Chapter 2.4 --- Speaker Classification / Chapter 2.4.1 --- Likelihood Scoring / Chapter 2.4.2 --- Verification Process / Chapter 2.4.2.1 --- General Approach / Chapter 2.4.2.2 --- Normalization Approach / Chapter 2.4.3 --- Cohort Sets / Chapter 2.5 --- Summary
Chapter 3 --- Experimental Setup / Chapter 3.1 --- Introduction / Chapter 3.2 --- Databases / Chapter 3.2.1 --- Cantonese Database / Chapter 3.2.2 --- YOHO Corpus / Chapter 3.3 --- Feature Analysis / Chapter 3.4 --- Speaker Models / Chapter 3.5 --- Experiments / Chapter 3.5.1 --- Evaluation / Chapter 3.5.2 --- F0 Experiments / Chapter 3.5.2.1 --- F0 Value / Chapter 3.5.2.2 --- Log F0 Value / Chapter 3.5.2.3 --- Normalized F0 / Chapter 3.5.2.4 --- Normalized Log F0 / Chapter 3.5.3 --- Cohort Normalization Experiments / Chapter 3.5.3.1 --- Preliminary Study / Chapter 3.5.3.2 --- Cohort Normalization on Cantonese / Chapter 3.5.3.3 --- Cohort Normalization with Pitch Information on Cantonese
Chapter 4 --- Results and Analysis / Chapter 4.1 --- Introduction / Chapter 4.2 --- F0 Experiments / Chapter 4.2.1 --- Results of Various Representation of F0 Value on Cantonese / Chapter 4.2.2 --- Performance Comparison between Cantonese and English / Chapter 4.3 --- Cohort Normalization Experiments / Chapter 4.3.1 --- Performance Comparison on Our Results to Other Researches / Chapter 4.3.2 --- Results of Applying Cohort Normalization on Cantonese / Chapter 4.3.3 --- Results of Applying Cohort Normalization with Pitch Information on Cantonese / Chapter 4.4 --- Summary
Chapter 5 --- Conclusions and Future Work / Chapter 5.1 --- Conclusions / Chapter 5.2 --- Future Work / Chapter 5.2.1 --- Refinements / Chapter 5.2.2 --- Formant / Chapter 5.2.3 --- Independent Cohort Models
Chapter 6 --- Application / Chapter 6.1 --- Overview / Chapter 6.2 --- Telephony Interface / Chapter 6.3 --- Verification / Chapter 6.4 --- Discussion
|
29 |
Use of vocal source features in speaker segmentation.
January 2006
Chan Wai Nang. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2006. / Includes bibliographical references (leaves 77-82). / Abstracts in English and Chinese.
Chapter 1 --- Introduction / Chapter 1.1 --- Speaker recognition / Chapter 1.2 --- State of the art of speaker recognition techniques / Chapter 1.3 --- Motivations / Chapter 1.4 --- Thesis outline
Chapter 2 --- Acoustic Features / Chapter 2.1 --- Speech production / Chapter 2.1.1 --- Physiology of speech production / Chapter 2.1.2 --- Source-filter model / Chapter 2.2 --- Vocal tract and vocal source related acoustic features / Chapter 2.3 --- Linear predictive analysis of speech / Chapter 2.4 --- Features for speaker recognition / Chapter 2.4.1 --- Vocal tract related features / Chapter 2.4.2 --- Vocal source related features / Chapter 2.5 --- Wavelet octave coefficients of residues (WOCOR)
Chapter 3 --- Statistical approaches to speaker recognition / Chapter 3.1 --- Statistical modeling / Chapter 3.1.1 --- Classification and modeling / Chapter 3.1.2 --- Parametric vs non-parametric / Chapter 3.1.3 --- Gaussian mixture model (GMM) / Chapter 3.1.4 --- Model estimation / Chapter 3.2 --- Classification / Chapter 3.2.1 --- Multi-class classification for speaker identification / Chapter 3.2.2 --- Two-speaker recognition / Chapter 3.2.3 --- Model selection by statistical model / Chapter 3.2.4 --- Performance evaluation metric
Chapter 4 --- Content dependency study of WOCOR and MFCC / Chapter 4.1 --- Database: CU2C / Chapter 4.2 --- Methods and procedures / Chapter 4.3 --- Experimental results / Chapter 4.4 --- Discussion / Chapter 4.5 --- Detailed analysis / Summary
Chapter 5 --- Speaker Segmentation / Chapter 5.1 --- Feature extraction / Chapter 5.2 --- Statistical methods for segmentation and clustering / Chapter 5.2.1 --- Segmentation by spectral difference / Chapter 5.2.2 --- Segmentation by Bayesian information criterion (BIC) / Chapter 5.2.3 --- Segment clustering by BIC / Chapter 5.3 --- Baseline system / Chapter 5.3.1 --- Algorithm / Chapter 5.3.2 --- Speech database / Chapter 5.3.3 --- Performance metric / Chapter 5.3.4 --- Results / Summary
Chapter 6 --- Application of vocal source features in speaker segmentation / Chapter 6.1 --- Discrimination power of WOCOR against MFCC / Chapter 6.1.1 --- Experimental set-up / Chapter 6.1.2 --- Results / Chapter 6.2 --- Speaker segmentation using vocal source features / Chapter 6.2.1 --- The construction of new proposed system / Summary
Chapter 7 --- Conclusions
Reference
|
30 |
Discriminative models for speech recognition
Ragni, Anton January 2014
No description available.
|