131.
Speech recognition using hybrid system of neural networks and knowledge sources. Darjazini, Hisham. University of Western Sydney, College of Health and Science, School of Engineering, January 2006.
In this thesis, a novel hybrid Speech Recognition (SR) system called RUST (Recognition Using Syntactical Tree) is developed. RUST combines Artificial Neural Networks (ANN) with a Statistical Knowledge Source (SKS) for a small, topic-focused database. The hypothesis of this research was that the inclusion of syntactic knowledge, represented as probabilities of occurrence of phones in words and sentences, improves the performance of an ANN-based SR system. The lexicon of the first version of RUST (RUST-I) was developed with 1357 words, of which 549 were unique. These words were extracted from three topics (finance, physics and general reading material), and the lexicon could be expanded or reduced (specialised). The results of experiments carried out on RUST showed that by combining basic statistical phonemic/syntactic knowledge with an ANN phone recogniser, the phone recognition rate was increased to 87% and the word recognition rate to 78%. The first implementation of RUST was not optimal. Therefore, a second version (RUST-II) was implemented with an incremental learning algorithm, which has been shown to improve the phone recognition rate to 94%. The introduction of incremental learning to ANN-based speech recognition can be considered the most innovative feature of this research. In conclusion, this work has proved the hypothesis that the inclusion of probabilistic phonemic-syntactic knowledge and topic-related statistical data in an adaptive, neural-network-based phone recogniser has the potential to improve the performance of a speech recognition system. / Doctor of Philosophy (PhD)
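The abstract does not describe the combination mechanism in detail, but the core idea, rescoring ANN phone posteriors with topic-dependent phone-occurrence statistics, can be sketched as follows. This is a minimal illustration, not the thesis implementation; the phone set, probabilities, and interpolation weight are all assumed.

```python
import numpy as np

PHONES = ["aa", "ae", "b", "d", "iy", "s", "t"]   # toy phone set

def rescore(ann_posteriors, phone_priors, weight=0.3):
    """Combine ANN posteriors P(phone | acoustics) with statistical
    knowledge P(phone | topic) by log-linear interpolation; the weight
    is an assumed tuning parameter."""
    log_post = np.log(ann_posteriors + 1e-12)
    log_prior = np.log(phone_priors + 1e-12)
    scores = (1.0 - weight) * log_post + weight * log_prior
    return PHONES[int(np.argmax(scores))]

# Toy example: the ANN is torn between "s" and "t", but phone statistics
# estimated from a topic-focused (e.g. finance) corpus favour "s".
ann = np.array([0.05, 0.05, 0.05, 0.05, 0.10, 0.36, 0.34])
topic = np.array([0.10, 0.10, 0.10, 0.10, 0.15, 0.30, 0.15])
print(rescore(ann, topic))   # -> s
```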
132.
A Framework for Speech Recognition using Logistic Regression. Birkenes, Øystein. January 2007.
Although discriminative approaches like the support vector machine or logistic regression have had great success in many pattern recognition applications, they have only achieved limited success in speech recognition. Two of the difficulties often encountered are that 1) speech signals typically have variable lengths, and 2) speech recognition is a sequence labeling problem, where each spoken utterance corresponds to a sequence of words or phones.

In this thesis, we present a framework for automatic speech recognition using logistic regression. We solve the difficulty of variable-length speech signals by including a mapping in the logistic regression framework that transforms each speech signal into a fixed-dimensional vector. The mapping is defined either explicitly, with a set of hidden Markov models (HMMs), for use in penalized logistic regression (PLR), or implicitly, through a sequence kernel, for use with kernel logistic regression (KLR). Unlike previous work that has used HMMs in combination with a discriminative classification approach, we jointly optimize the logistic regression parameters and the HMM parameters using a penalized likelihood criterion.

Experiments show that joint optimization improves the recognition accuracy significantly. The sequence kernel we present is motivated by the dynamic time warping (DTW) distance between two feature vector sequences. Instead of considering only the optimal alignment path, we sum up the contributions from all alignment paths. Preliminary experiments with the sequence kernel show promising results.

A two-step approach is used for handling the sequence labeling problem. In the first step, a set of HMMs is used to generate an N-best list of sentence hypotheses for a spoken utterance. In the second step, these sentence hypotheses are rescored using logistic regression on the segments in the N-best list. A garbage class is introduced in the logistic regression framework in order to obtain reliable probability estimates for the segments in the N-best lists. We present results on both a connected digit recognition task and a continuous phone recognition task.
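The sum-over-all-alignment-paths idea described above admits a simple dynamic-programming formulation, closely related to what is now known as a global alignment kernel. A minimal sketch follows; the local similarity function and the lack of normalization are assumptions, and the thesis's exact kernel may differ.

```python
import numpy as np

def sum_over_paths_kernel(X, Y, gamma=1.0):
    """DTW-motivated sequence kernel: dynamic programming accumulates a
    local similarity exp(-gamma * ||x - y||^2) over every monotonic
    alignment path, rather than only the single best one."""
    n, m = len(X), len(Y)
    K = np.zeros((n + 1, m + 1))
    K[0, 0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = np.exp(-gamma * np.sum((X[i - 1] - Y[j - 1]) ** 2))
            # the three predecessor moves of DTW: insert, delete, match
            K[i, j] = sim * (K[i - 1, j] + K[i, j - 1] + K[i - 1, j - 1])
    return K[n, m]

# Toy feature sequences of different lengths (e.g. frames of 13-dim cepstra)
X = np.random.randn(5, 13)
Y = np.random.randn(7, 13)
print(sum_over_paths_kernel(X, Y))
```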
133.
A VHDL description of speech recognition front-end. Xiao, Xin. 17 May 2001.
This thesis investigates an implementation of a speech recognition front-end as an application-specific integrated circuit (ASIC). A Mel cepstrum algorithm is implemented for feature extraction. We present a new mixed split-radix and radix-2 Fast Fourier Transform (FFT) algorithm, which effectively minimizes the number of complex multiplications in the speech recognition front-end. A prime-length discrete cosine transform (DCT) is computed efficiently through the use of two shorter-length correlations. The algorithm results in a circular correlation structure that is suitable for a constant-coefficient multiplication and shift-register realization. The multiplicative normalization algorithm is used for the square root function: the radix-2 algorithm is used in the first five stages and the radix-4 algorithm in the remaining stages to speed up convergence. A similar normalization algorithm is presented for the natural logarithm. / Graduation date: 2002
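Multiplicative normalization, the technique named for the square root unit, drives a running product toward 1 using shift-friendly factors of the form 1 ± 2^-k. Below is a software sketch of the basic idea (in Python for readability, although the thesis targets VHDL hardware); the digit-selection rule and iteration count are assumptions, and the thesis's radix-2/radix-4 staging is not reproduced.

```python
def mult_norm_sqrt(a, iterations=40):
    """Multiplicative-normalization square root: choose factors
    f = 1 + d * 2**-k, d in {-1, +1}, so that x = a * m**2 converges
    to 1; then m -> 1/sqrt(a) and a * m -> sqrt(a). Assumes a has
    been pre-scaled into roughly [0.25, 4)."""
    x, m = a, 1.0                                 # invariant: x == a * m * m
    for k in range(1, iterations + 1):
        d = 1 if x < 1.0 else -1
        f = 1.0 + d * 2.0 ** (-k)
        if abs(x * f * f - 1.0) < abs(x - 1.0):   # apply only if it helps
            x *= f * f
            m *= f
    return a * m

# The natural logarithm can be computed with the same machinery by
# accumulating -ln(f) from a small lookup table while driving a * m toward 1.
print(mult_norm_sqrt(2.0))   # ~1.41421356
print(mult_norm_sqrt(0.5))   # ~0.70710678
```

In hardware, each factor application is a shift and an add, which is why this family of algorithms suits an ASIC front-end better than a general multiplier or divider.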
134.
Leveraging multimodal redundancy for dynamic learning, with SHACER, a speech and handwriting recognizer. Kaiser, Edward C. January 2007.
Thesis (Ph.D.), OGI School of Science & Engineering at OHSU, April 2007. / Includes bibliographical references (leaves 189-206).
135.
Measurement, analysis, and detection of nasalization in speech. Niu, Xiaochuan.
Ph.D. / Computer Science and Electrical Engineering / Nasalization refers to the process of speech production in which significant amounts of airflow and sound energy are transmitted through the nasal tract. In phonetics, nasalization is necessary for certain phonemes to be produced in normal speech, and it can also be a normal consequence of coarticulation. In disordered speech, however, inappropriate nasalization can be one of the causes that reduce the intelligibility of speech. Instrumental measurement and analysis techniques are needed to better understand the relationship between the physiological status and the aerodynamic and acoustic effects of nasalization during speech. The main aim of the research presented in this dissertation is to investigate the aerodynamic and acoustic effects of nasalization, and to develop objective approaches to measure, analyze, and detect nasalized segments in speech. Based on an extensive survey of the existing literature on measurements of velopharyngeal function, acoustic production models of speech, analysis methods and results for normal nasalization, and analysis methods for resonance disorders, it is understood that the final acoustic representation of nasalization is a complex outcome affected by the degree of velopharyngeal opening, the variation of vocal tract configurations, the mixture of multiple acoustic channels, and speaker differences. It is proposed to incorporate more of the available information, beyond single-channel acoustic signals, in the analysis of nasalization. In our research, a parallel study of acoustic and aerodynamic signals reveals the complementary information within the signals. In addition, dual-channel acoustic studies help to clarify the acoustic relationship between the oral and nasal cavities, and show inherent advantages over single-channel analysis. Based on the derivation and analysis of the dual-channel acoustic properties, automatic detectors of nasalization are developed and successfully tested. The techniques developed in these explorations provide novel instrumental and analysis approaches for possible applications such as phonetic studies of the normal nasalization process, clinical assessment of disordered nasal resonance, and special feature extraction for speech recognition.
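A concrete instance of a dual-channel approach is the classic nasalance measure: the fraction of total acoustic energy captured by a nasal microphone. The sketch below thresholds frame-wise nasalance to flag nasalized segments; it illustrates the dual-channel idea only, not the dissertation's derived acoustic properties, and the threshold is an assumed toy value.

```python
import numpy as np

def detect_nasalization(oral, nasal, sr, frame_ms=20, threshold=0.35):
    """Flag frames whose nasalance (share of energy at the nasal
    microphone) exceeds a threshold. Illustrative only; the threshold
    is a toy value, not a clinically validated one."""
    frame = int(sr * frame_ms / 1000)
    n_frames = min(len(oral), len(nasal)) // frame
    flags = []
    for i in range(n_frames):
        o = oral[i * frame:(i + 1) * frame]
        n = nasal[i * frame:(i + 1) * frame]
        e_oral, e_nasal = np.sum(o ** 2), np.sum(n ** 2)
        nasalance = e_nasal / (e_oral + e_nasal + 1e-12)
        flags.append(nasalance > threshold)
    return np.array(flags)

# Toy signals: a vowel-like tone with strong nasal coupling
sr = 16000
t = np.arange(sr) / sr
oral = np.sin(2 * np.pi * 220 * t)
nasal = 0.8 * np.sin(2 * np.pi * 220 * t)
print(detect_nasalization(oral, nasal, sr)[:5])   # [ True  True ...]
```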
137.
Production Knowledge in the Recognition of Dysarthric Speech. Rudzicz, Frank. 31 August 2011.
Millions of individuals have acquired or have been born with neuro-motor conditions that limit the control of their muscles, including those that manipulate the articulators of the vocal tract. These conditions, collectively called dysarthria, result in speech that is very difficult to understand, despite being generally syntactically and semantically correct. This difficulty is not limited to human listeners, but also adversely affects the performance of traditional automatic speech recognition (ASR) systems, which in some cases can be completely unusable by the affected individual.
This dissertation describes research into improving ASR for speakers with dysarthria by incorporating knowledge of their speech production. The document first introduces theoretical aspects of dysarthria and of speech production, and outlines related work in these combined areas within ASR. It then describes the acquisition and analysis of the TORGO database of dysarthric articulatory motion and demonstrates several consistent behaviours among speakers in this database, including predictable pronunciation errors. Articulatory data are then used to train augmented ASR systems that model the statistical relationships between vocal tract configurations and their acoustic consequences. I show that dynamic Bayesian networks augmented with instantaneous theoretical or empirical articulatory variables outperform even discriminative alternatives. This leads to work that incorporates a more rigid theory of speech production, task-dynamics, which models the high-level and long-term aspects of speech production. For this task, I devised an algorithm for estimating articulatory positions given only acoustics that significantly outperforms the state of the art. Finally, I present ongoing work on the transformation and re-synthesis of dysarthric speech in order to make it more intelligible to human listeners.
This research represents definitive progress towards the accommodation of dysarthric speech within modern speech recognition systems. However, there is much more research that remains to be undertaken and I conclude with some thoughts as to which paths we might now take.
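As a point of reference for the inversion problem mentioned above (estimating articulatory positions from acoustics alone), a simple baseline is a regression trained on parallel acoustic-articulatory data such as TORGO provides. The sketch below fits a linear least-squares map on synthetic data; it is a hypothetical baseline, not the thesis's algorithm, which significantly outperforms such approaches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for parallel acoustic-articulatory data: 13-dim
# acoustic frames (e.g. MFCCs) paired with 6 articulator coordinates
# (e.g. tongue tip/body/dorsum, jaw, upper and lower lip).
acoustics = rng.standard_normal((500, 13))
true_map = rng.standard_normal((13, 6))
articulators = acoustics @ true_map + 0.1 * rng.standard_normal((500, 6))

# Fit a linear acoustic-to-articulatory map by least squares.
W, *_ = np.linalg.lstsq(acoustics, articulators, rcond=None)

# Estimate articulator positions for an unseen acoustic frame.
test_frame = rng.standard_normal((1, 13))
print((test_frame @ W).shape)   # (1, 6)
```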
139.
Construction and Evaluation of a Large In-Car Speech Corpus. Takeda, Kazuya; Fujimura, Hiroshi; Itou, Katsunobu; Kawaguchi, Nobuo; Matsubara, Shigeki; Itakura, Fumitada.
No description available.
140.
Linear Discriminant Analysis Using a Generalized Mean of Class Covariances and Its Application to Speech Recognition. Nakagawa, Seiichi; Kitaoka, Norihide; Sakai, Makoto. 01 March 2008.
No description available.
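No abstract is available, but the title points to a well-defined idea: replacing the arithmetic average of class covariances used in standard linear discriminant analysis with a generalized (for example, power) mean. The sketch below implements that reading on toy data; the choice of power mean, the parameter p, and all other details are assumptions rather than the paper's method.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def power_mean_spd(covs, p=0.5):
    """Generalized (power) mean of SPD class covariances:
    M_p = (mean_i S_i**p)**(1/p); p = 1 recovers the usual average."""
    acc = sum(fractional_matrix_power(S, p) for S in covs) / len(covs)
    return np.real(fractional_matrix_power(acc, 1.0 / p))

def lda_directions(X, y, p=0.5, n_dirs=2):
    """LDA projection where the within-class scatter is a generalized
    mean of per-class covariances instead of their arithmetic mean."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    covs = [np.cov(X[y == c].T) for c in classes]
    Sw = power_mean_spd(covs, p)            # generalized-mean within-class scatter
    Sb = sum(np.outer(X[y == c].mean(0) - mu, X[y == c].mean(0) - mu)
             for c in classes)              # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(-np.real(eigvals))
    return np.real(eigvecs[:, order[:n_dirs]])

# Toy usage: three 4-dim Gaussian classes
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((50, 4)) + m
               for m in ([0, 0, 0, 0], [2, 2, 2, 2], [4, 0, 0, 0])])
y = np.repeat([0, 1, 2], 50)
print(lda_directions(X, y).shape)   # (4, 2)
```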