81 |
Speech recognition using hybrid system of neural networks and knowledge sources. Darjazini, Hisham. University of Western Sydney, College of Health and Science, School of Engineering, January 2006
In this thesis, a novel hybrid speech recognition (SR) system called RUST (Recognition Using Syntactical Tree) is developed. RUST combines artificial neural networks (ANN) with a statistical knowledge source (SKS) for a small, topic-focused database. The hypothesis of this research was that including syntactic knowledge, represented as the probability of occurrence of phones in words and sentences, improves the performance of an ANN-based SR system. The lexicon of the first version of RUST (RUST-I) was developed with 1357 words, of which 549 were unique. These words were extracted from three topics (finance, physics and general reading material), and the lexicon could be expanded or reduced (specialised). Experiments on RUST showed that adding basic statistical phonemic/syntactic knowledge to an ANN phone recogniser increased the phone recognition rate to 87% and the word recognition rate to 78%. Because the first implementation of RUST was not optimal, a second version (RUST-II) was implemented with an incremental learning algorithm, which was shown to improve the phone recognition rate to 94%. The introduction of incremental learning to ANN-based speech recognition can be considered the most innovative feature of this research. In conclusion, this work has proved the hypothesis that including probabilistic phonemic/syntactic knowledge and topic-related statistical data in an adaptive, neural-network-based phone recogniser has the potential to improve the performance of a speech recognition system. / Doctor of Philosophy (PhD)
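The abstract does not say exactly how RUST fuses ANN outputs with phone-occurrence statistics. One common way to combine the two is Viterbi decoding, where each phone segment is scored by the ANN posterior and each transition by a phone bigram probability. The sketch below assumes per-segment posterior dictionaries, a bigram table, and a language-model weight; the function and parameter names are hypothetical and are not taken from the thesis.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi_with_phone_bigram(
    posteriors: Sequence[Dict[str, float]],   # hypothetical ANN output: P(phone | segment)
    bigram: Dict[Tuple[str, str], float],     # hypothetical P(phone | previous phone)
    initial: Dict[str, float],                # hypothetical P(first phone); keys define the phone set
    weight: float = 1.0,                      # scale of the statistical knowledge (assumed)
    floor: float = 1e-6,                      # avoids log(0) for unseen phones/transitions
) -> List[str]:
    """Return the phone sequence maximizing log ANN score + weight * log bigram score."""
    if not posteriors:
        return []
    phones = sorted(initial)

    def log(p: float) -> float:
        return math.log(max(p, floor))

    # score[p]: best log score of any path ending in phone p at the current segment
    score = {p: log(posteriors[0].get(p, 0.0)) + weight * log(initial[p]) for p in phones}
    back: List[Dict[str, str]] = []
    for post in posteriors[1:]:
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for p in phones:
            # best predecessor q for phone p, including the bigram transition score
            best_q, best = max(
                ((q, score[q] + weight * log(bigram.get((q, p), 0.0))) for q in phones),
                key=lambda item: item[1],
            )
            new_score[p] = best + log(post.get(p, 0.0))
            ptr[p] = best_q
        score = new_score
        back.append(ptr)

    # trace the best path backwards from the highest-scoring final phone
    path = [max(phones, key=lambda p: score[p])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With `weight=0.0` the decoder falls back to the ANN's per-segment choices, so the weight makes the contribution of the statistical knowledge source explicit.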
|
82 |
A Framework for Speech Recognition using Logistic Regression. Birkenes, Øystein. January 2007
Although discriminative approaches such as the support vector machine and logistic regression have had great success in many pattern recognition applications, they have achieved only limited success in speech recognition. Two difficulties are often encountered: 1) speech signals typically have variable lengths, and 2) speech recognition is a sequence labeling problem, where each spoken utterance corresponds to a sequence of words or phones.
In this thesis, we present a framework for automatic speech recognition using logistic regression. We handle variable-length speech signals by including in the logistic regression framework a mapping that transforms each speech signal into a fixed-dimensional vector. The mapping is defined either explicitly, with a set of hidden Markov models (HMMs), for use in penalized logistic regression (PLR), or implicitly, through a sequence kernel, for use with kernel logistic regression (KLR). Unlike previous work that has combined HMMs with a discriminative classification approach, we jointly optimize the logistic regression parameters and the HMM parameters using a penalized likelihood criterion.
Experiments show that joint optimization significantly improves recognition accuracy. The sequence kernel we present is motivated by the dynamic time warping (DTW) distance between two feature vector sequences. Instead of considering only the optimal alignment path, we sum the contributions from all alignment paths. Preliminary experiments with the sequence kernel show promising results.
A two-step approach is used to handle the sequence labeling problem. In the first step, a set of HMMs is used to generate an N-best list of sentence hypotheses for a spoken utterance. In the second step, these sentence hypotheses are rescored using logistic regression on the segments in the N-best list. A garbage class is introduced in the logistic regression framework to obtain reliable probability estimates for the segments in the N-best lists. We present results on both a connected digit recognition task and a continuous phone recognition task.
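The idea of summing over all alignment paths instead of keeping only the best one can be illustrated with a dynamic-programming recursion. Where DTW takes the minimum of accumulated costs, this sketch multiplies local similarities and sums over the three predecessor cells; it is the same recursion as the global alignment kernel. The Gaussian local kernel and `sigma` are assumptions, and the kernel in the thesis may differ in its local term and normalization.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def local_kernel(x: Frame, y: Frame, sigma: float = 1.0) -> float:
    """Gaussian similarity between two feature frames (assumed choice of local kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def all_paths_alignment_kernel(xs: Sequence[Frame], ys: Sequence[Frame], sigma: float = 1.0) -> float:
    """Sum, over every monotone alignment of xs and ys, of the product of local similarities.

    DTW would use D[i][j] = d(i, j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1]);
    here the min becomes a sum and the additive cost becomes a multiplicative similarity.
    """
    n, m = len(xs), len(ys)
    K = [[0.0] * (m + 1) for _ in range(n + 1)]
    K[0][0] = 1.0  # empty prefixes align in exactly one (empty) way
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k = local_kernel(xs[i - 1], ys[j - 1], sigma)
            K[i][j] = k * (K[i - 1][j] + K[i][j - 1] + K[i - 1][j - 1])
    return K[n][m]
```

For two identical two-frame sequences every local similarity is 1, so the result simply counts the three possible alignment paths. For long utterances the products underflow quickly, so a practical version would work in the log domain.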
|
83 |
A VHDL description of speech recognition front-end. Xiao, Xin. 17 May 2001
This thesis investigates an implementation of a speech recognition front-end as an application-specific integrated circuit (ASIC). A mel-cepstrum algorithm is implemented for feature extraction. We present a new mixed split-radix and radix-2 fast Fourier transform (FFT) algorithm, which effectively minimizes the number of complex multiplications in the front-end. A prime-length discrete cosine transform (DCT) is computed efficiently using two shorter-length correlations. The algorithm results in a circular correlation structure that is suited to constant-coefficient multiplication and a shift-register realization. A multiplicative normalization algorithm is used for the square root function: a radix-2 algorithm is used in the first 5 stages and a radix-4 algorithm in the remaining stages to speed up convergence. A similar normalization algorithm is presented for the natural logarithm. / Graduation date: 2002
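The abstract names multiplicative normalization for the square root and natural logarithm without giving details. The floating-point sketch below shows the general radix-2 form of the technique, assuming greedy selection of factors (1 + 2^-k); it does not reproduce the thesis's radix-2/radix-4 stage schedule or its fixed-point hardware. Each multiplication by (1 + 2^-k) is written as a shift and an add, which is what makes the method cheap in an ASIC. `N_STAGES`, `LN_TABLE` and the function names are hypothetical.

```python
import math

# Number of radix-2 stages (assumed); the residual error shrinks roughly as 2**-N_STAGES.
N_STAGES = 40

# Constant table of ln(1 + 2**-k), k = 1..N_STAGES; in hardware this would be a small ROM.
LN_TABLE = [math.log1p(math.ldexp(1.0, -k)) for k in range(1, N_STAGES + 1)]


def _shift_add(v: float, k: int) -> float:
    """Return v * (1 + 2**-k), computed as v plus v shifted right by k bits."""
    return v + math.ldexp(v, -k)


def mn_sqrt(x: float) -> float:
    """Square root by radix-2 multiplicative normalization.

    The loop keeps y*y / m constant (equal to the normalized input), so
    driving m up toward 1 drives y toward the square root of that input.
    """
    if x < 0:
        raise ValueError("square root of a negative number")
    if x == 0:
        return 0.0
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    if e % 2:                     # make e even so that 0.25 <= m < 1
        m, e = m / 2, e + 1
    y = m
    for k in range(1, N_STAGES + 1):
        t = _shift_add(_shift_add(m, k), k)   # m * (1 + 2**-k)**2
        if t <= 1.0:                          # accept the factor only if m stays <= 1
            m = t
            y = _shift_add(y, k)              # y * (1 + 2**-k)
    return math.ldexp(y, e // 2)              # undo the 4**(e/2) scaling


def mn_ln(x: float) -> float:
    """Natural logarithm by radix-2 multiplicative normalization.

    Each accepted factor (1 + 2**-k) multiplies m and subtracts ln(1 + 2**-k)
    from the accumulator, so ln(m_initial) = acc + ln(m_final), which is ~= acc.
    """
    if x <= 0:
        raise ValueError("logarithm of a non-positive number")
    m, e = math.frexp(x)          # 0.5 <= m < 1
    acc = 0.0
    for k in range(1, N_STAGES + 1):
        t = _shift_add(m, k)
        if t <= 1.0:
            m = t
            acc -= LN_TABLE[k - 1]
    return acc + e * math.log(2.0)
```

Both routines share the same loop structure and differ only in what is accumulated, which is why one normalization datapath can serve the square root and the logarithm. A radix-4 stage would choose among more factors per step and so converge in fewer iterations.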
|
84 |
Leveraging multimodal redundancy for dynamic learning, with SHACER, a speech and handwriting recognizer. Kaiser, Edward C. January 2007
Thesis (Ph.D.) OGI School of Science & Engineering at OHSU, April 2007. / Includes bibliographical references (leaves 189-206).
|
86 |
Production Knowledge in the Recognition of Dysarthric Speech. Rudzicz, Frank. 31 August 2011
Millions of individuals have acquired or have been born with neuro-motor conditions that limit control of their muscles, including those that manipulate the articulators of the vocal tract. These conditions, collectively called dysarthria, result in speech that is very difficult to understand, despite being generally syntactically and semantically correct. This difficulty is not limited to human listeners; it also degrades the performance of traditional automatic speech recognition (ASR) systems, which in some cases are completely unusable by the affected individual.
This dissertation describes research into improving ASR for speakers with dysarthria by incorporating knowledge of their speech production. The document first introduces theoretical aspects of dysarthria and of speech production and outlines related work in these combined areas within ASR. It then describes the acquisition and analysis of the TORGO database of dysarthric articulatory motion and demonstrates several consistent behaviours among speakers in this database, including predictable pronunciation errors. Articulatory data are then used to train augmented ASR systems that model the statistical relationships between vocal tract configurations and their acoustic consequences. I show that dynamic Bayesian networks augmented with instantaneous theoretical or empirical articulatory variables outperform even discriminative alternatives. This leads to work that incorporates a more rigid theory of speech production, i.e., task-dynamics, which models the high-level and long-term aspects of speech production. For this task, I devised an algorithm for estimating articulatory positions from acoustics alone that significantly outperforms the state of the art. Finally, I present ongoing work on the transformation and re-synthesis of dysarthric speech to make it more intelligible to human listeners.
This research represents definitive progress towards the accommodation of dysarthric speech within modern speech recognition systems. However, much research remains to be undertaken, and I conclude with some thoughts on which paths we might now take.
|
88 |
A Keyword Based Interactive Speech Recognition System for Embedded Applications. Castro Ceron, Ivan Francisco; Garcia Badillo, Andrea Graciela. January 2011
Speech recognition has been an important area of research over the past decades. The use of automatic speech recognition systems is rapidly increasing in areas such as mobile telephony, automotive, healthcare and robotics. However, although many speech recognition systems exist, most of them rely on platform-specific, non-publicly available software. Nevertheless, it is possible to develop speech recognition systems using existing open source technology. The aim of this master's thesis is to develop an interactive, speaker-independent speech recognition system. The system shall identify predetermined keywords in incoming live speech and, in response, play audio files with related information. Moreover, the system shall provide a response even if no keyword is identified. The system was implemented using PocketSphinx, a speech recognition library that is part of the open source Sphinx technology from Carnegie Mellon University. Automating the different steps of the process was key to completing this project successfully. This automation consisted of developing tools to create the language model and the dictionary, two important components of the system. The creation of the audio files played after a keyword is identified, as well as the evaluation of the system's performance, was also fully automated. The tests show encouraging results and demonstrate that the system is a feasible solution that could be implemented and tested in a real embedded application. Despite the good results, improvements are possible, such as creating additional phonetic dictionaries to support other languages.
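The abstract describes the system's behaviour (play an audio file for a recognized keyword, otherwise still respond) but not its matching logic. The sketch below shows one way to map a recognizer's text hypothesis to a response file, including multi-word keywords and a fallback. It deliberately avoids any PocketSphinx calls; the function name, the keyword table and the file names are hypothetical, and audio playback is left out.

```python
from typing import Dict


def choose_response(hypothesis: str, keyword_audio: Dict[str, str], fallback: str) -> str:
    """Return the audio file for the first keyword found in the hypothesis, else the fallback.

    Keys of keyword_audio are assumed to be lowercase phrases of one or more words.
    At each position the longest matching phrase wins, so "opening hours" is
    preferred over a shorter keyword that starts at the same word.
    """
    tokens = hypothesis.lower().split()
    max_len = max((len(k.split()) for k in keyword_audio), default=0)
    for start in range(len(tokens)):
        for length in range(min(max_len, len(tokens) - start), 0, -1):
            phrase = " ".join(tokens[start:start + length])
            if phrase in keyword_audio:
                return keyword_audio[phrase]
    return fallback  # no keyword identified: the system still gives a response
```

Usage, with a hypothetical table: `choose_response("what are the opening hours of the library", {"opening hours": "hours.wav", "library": "library.wav"}, "fallback.wav")` returns `"hours.wav"` because that keyword appears first in the utterance.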
|
89 |
A study of convex optimization for discriminative training of hidden Markov models in automatic speech recognition. Yin, Yan. January 2008
Thesis (M.Sc.)--York University, 2008. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 101-109). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR45978
|
90 |
A study on acoustic modeling and adaptation in HMM-based speech recognition. Ma, Bin. January 2000
Thesis (Ph. D.)--University of Hong Kong, 2001. / Includes bibliographical references (leaves 103-112).
|