Global ETD Search

81	A Framework for Speech Recognition using Logistic Regression Birkenes, Øystein January 2007 (has links) <p>Although discriminative approaches like the support vector machine or logistic regression have had great success in many pattern recognition application, they have only achieved limited success in speech recognition. Two of the difficulties often encountered include 1) speech signals typically have variable lengths, and 2) speech recognition is a sequence labeling problem, where each spoken utterance corresponds to a sequence of words or phones.</p><p>In this thesis, we present a framework for automatic speech recognition using logistic regression. We solve the difficulty of variable length speech signals by including a mapping in the logistic regression framework that transforms each speech signal into a fixed-dimensional vector. The mapping is defined either explicitly with a set of hidden Markov models (HMMs) for the use in penalized logistic regression (PLR), or implicitly through a sequence kernel to be used with kernel logistic regression (KLR). Unlike previous work that has used HMMs in combination with a discriminative classification approach, we jointly optimize the logistic regression parameters and the HMM parameters using a penalized likelihood criterion. </p><p>Experiments show that joint optimization improves the recognition accuracy significantly. The sequence kernel we present is motivated by the dynamic time warping (DTW) distance between two feature vector sequences. Instead of considering only the optimal alignment path, we sum up the contributions from all alignment paths. Preliminary experiments with the sequence kernel show promising results.</p><p>A two-step approach is used for handling the sequence labeling problem. In the first step, a set of HMMs is used to generate an N-best list of sentence hypotheses for a spoken utterance. In the second step, these sentence hypotheses are rescored using logistic regression on the segments in the N-best list. A garbage class is introduced in the logistic regression framework in order to get reliable probability estimates for the segments in the N-best lists. We present results on both a connected digit recognition task and a continuous phone recognition task.</p> Automatic speech recognition Signal processing Signalbehandling
82	A VHDL description of speech recognition front-end Xiao, Xin 17 May 2001 (has links) This thesis investigates an implementation of speech recognition front-end. It is an application specific integrated circuit (ASIC) solution. A Mel Cepstrum algorithm is implemented for the feature extraction. We present a new mixed split-radix and radix-2 Fast Fourier Transform (FFT) algorithm, which can effectively minimize the number of complex multiplications in the speech recognition front-end. A prime length discrete cosine transform (DCT) is done effectively through the use of two shorter length correlations. The algorithm results in a circular correlation structure that is suitable for a constant coefficient multiplication and shift-register realization. The multiplicative normalization algorithm is used for square root function. Radix-2 algorithm is used in the first 5 stages and radix-4 algorithm is used in the other stages to speed up the convergence. A similar normalization algorithm is present for natural logarithm. / Graduation date: 2002 Automatic speech recognition Application specific integrated circuits
83	Leveraging multimodal redundancy for dynamic learning, with SHACER, a speech and handwriting recognizer / Kaiser, Edward C. January 2007 (has links) Thesis (Ph.D.) OGI School of Science & Engineering at OHSU, April 2007. / Includes bibliographical references (leaves 189-206).
84	A Framework for Speech Recognition using Logistic Regression Birkenes, Øystein January 2007 (has links) Although discriminative approaches like the support vector machine or logistic regression have had great success in many pattern recognition application, they have only achieved limited success in speech recognition. Two of the difficulties often encountered include 1) speech signals typically have variable lengths, and 2) speech recognition is a sequence labeling problem, where each spoken utterance corresponds to a sequence of words or phones. In this thesis, we present a framework for automatic speech recognition using logistic regression. We solve the difficulty of variable length speech signals by including a mapping in the logistic regression framework that transforms each speech signal into a fixed-dimensional vector. The mapping is defined either explicitly with a set of hidden Markov models (HMMs) for the use in penalized logistic regression (PLR), or implicitly through a sequence kernel to be used with kernel logistic regression (KLR). Unlike previous work that has used HMMs in combination with a discriminative classification approach, we jointly optimize the logistic regression parameters and the HMM parameters using a penalized likelihood criterion. Experiments show that joint optimization improves the recognition accuracy significantly. The sequence kernel we present is motivated by the dynamic time warping (DTW) distance between two feature vector sequences. Instead of considering only the optimal alignment path, we sum up the contributions from all alignment paths. Preliminary experiments with the sequence kernel show promising results. A two-step approach is used for handling the sequence labeling problem. In the first step, a set of HMMs is used to generate an N-best list of sentence hypotheses for a spoken utterance. In the second step, these sentence hypotheses are rescored using logistic regression on the segments in the N-best list. A garbage class is introduced in the logistic regression framework in order to get reliable probability estimates for the segments in the N-best lists. We present results on both a connected digit recognition task and a continuous phone recognition task. Automatic speech recognition Signal processing Signalbehandling
85	Production Knowledge in the Recognition of Dysarthric Speech Rudzicz, Frank 31 August 2011 (has links) Millions of individuals have acquired or have been born with neuro-motor conditions that limit the control of their muscles, including those that manipulate the articulators of the vocal tract. These conditions, collectively called dysarthria, result in speech that is very difficult to understand, despite being generally syntactically and semantically correct. This difficulty is not limited to human listeners, but also adversely affects the performance of traditional automatic speech recognition (ASR) systems, which in some cases can be completely unusable by the affected individual. This dissertation describes research into improving ASR for speakers with dysarthria by means of incorporated knowledge of their speech production. The document first introduces theoretical aspects of dysarthria and of speech production and outlines related work in these combined areas within ASR. It then describes the acquisition and analysis of the TORGO database of dysarthric articulatory motion and demonstrates several consistent behaviours among speakers in this database, including predictable pronunciation errors, for example. Articulatory data are then used to train augmented ASR systems that model the statistical relationships between vocal tract configurations and their acoustic consequences. I show that dynamic Bayesian networks augmented with instantaneous theoretical or empirical articulatory variables outperform even discriminative alternatives. This leads to work that incorporates a more rigid theory of speech production, i.e., task-dynamics, that models the high-level and long-term aspects of speech production. For this task, I devised an algorithm for estimating articulatory positions given only acoustics that significantly outperforms the state-of-the-art. Finally, I present ongoing work into the transformation and re-synthesis of dysarthric speech in order to make it more intelligible to human listeners. This research represents definitive progress towards the accommodation of dysarthric speech within modern speech recognition systems. However, there is much more research that remains to be undertaken and I conclude with some thoughts as to which paths we might now take. Dysarthria Automatic speech recognition Articulatory models 0800
86	Production Knowledge in the Recognition of Dysarthric Speech Rudzicz, Frank 31 August 2011 (has links) Millions of individuals have acquired or have been born with neuro-motor conditions that limit the control of their muscles, including those that manipulate the articulators of the vocal tract. These conditions, collectively called dysarthria, result in speech that is very difficult to understand, despite being generally syntactically and semantically correct. This difficulty is not limited to human listeners, but also adversely affects the performance of traditional automatic speech recognition (ASR) systems, which in some cases can be completely unusable by the affected individual. This dissertation describes research into improving ASR for speakers with dysarthria by means of incorporated knowledge of their speech production. The document first introduces theoretical aspects of dysarthria and of speech production and outlines related work in these combined areas within ASR. It then describes the acquisition and analysis of the TORGO database of dysarthric articulatory motion and demonstrates several consistent behaviours among speakers in this database, including predictable pronunciation errors, for example. Articulatory data are then used to train augmented ASR systems that model the statistical relationships between vocal tract configurations and their acoustic consequences. I show that dynamic Bayesian networks augmented with instantaneous theoretical or empirical articulatory variables outperform even discriminative alternatives. This leads to work that incorporates a more rigid theory of speech production, i.e., task-dynamics, that models the high-level and long-term aspects of speech production. For this task, I devised an algorithm for estimating articulatory positions given only acoustics that significantly outperforms the state-of-the-art. Finally, I present ongoing work into the transformation and re-synthesis of dysarthric speech in order to make it more intelligible to human listeners. This research represents definitive progress towards the accommodation of dysarthric speech within modern speech recognition systems. However, there is much more research that remains to be undertaken and I conclude with some thoughts as to which paths we might now take. Dysarthria Automatic speech recognition Articulatory models 0800
87	A Keyword Based Interactive Speech Recognition System for Embedded Applications Castro Ceron, Ivan Francisco, Garcia Badillo, Andrea Graciela January 2011 (has links) Speech recognition has been an important area of research during the past decades. The usage of automatic speech recognition systems is rapidly increasing among different areas, such as mobile telephony, automotive, healthcare, robotics and more. However, despite the existence of many speech recognition systems, most of them use platform specific and non-publicly available software. Nevertheless, it is possible to develop speech recognition systems using already existing open source technology. The aim of this master's thesis is to develop an interactive and speaker independent speech recognition system. The system shall be able to identify predetermined keywords from incoming live speech and in response, play audio files with related information. Moreover, the system shall be able to provide a response even if no keyword was identified. For this project, the system was implemented using PocketSphinx, a speech recognition library, part of the open source Sphinx technology by the Carnegie Mellon University. During the implementation of this project, the automation of different steps of the process, was a key factor for a successful completion. This automation consisted on the development of different tools for the creation of the language model and the dictionary, two important components of the system. Similarly, the audio files to be played after identifying a keyword, as well as the evaluation of the system's performance, were fully automated. The tests run show encouraging results and demonstrate that the system is a feasible solution that could be implemented and tested in a real embedded application. Despite the good results, possible improvements can be implemented, such as the creation of a different phonetic dictionary to support different languages. Automatic Speech Recognition PocketSphinx Embedded Systems
88	A study of convex optimization for discriminative training of hidden Markov models in automatic speech recognition / Yin, Yan. January 2008 (has links) Thesis (M.Sc.)--York University, 2008. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 101-109). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR45978
89	A study on acoustic modeling and adaptation in HMM-based speech recognition Ma, Bin, January 2000 (has links) Thesis (Ph. D.)--University of Hong Kong, 2001. / Includes bibliographical references (leaves 103-112).
90	Using commercial-off-the-shelf speech recognition software for conning U.S. warships Tamez, Dorothy J. January 2003 (has links) (PDF) Thesis (M.S. in Information Technology Management)--Naval Postgraduate School, June 2003. / Thesis advisor(s): Monique P. Fargues, Russell Gottfried. Includes bibliographical references (p. 71-74). Also available online.

Search results