Global ETD Search

141	A Keyword Based Interactive Speech Recognition System for Embedded Applications Castro Ceron, Ivan Francisco, Garcia Badillo, Andrea Graciela January 2011 Speech recognition has been an important area of research in recent decades. The use of automatic speech recognition systems is increasing rapidly in areas such as mobile telephony, automotive, healthcare and robotics. However, although many speech recognition systems exist, most of them rely on platform-specific software that is not publicly available. Nevertheless, it is possible to develop speech recognition systems using existing open source technology. The aim of this master's thesis is to develop an interactive, speaker-independent speech recognition system. The system shall identify predetermined keywords in incoming live speech and, in response, play audio files with related information. Moreover, the system shall provide a response even if no keyword is identified. The system was implemented using PocketSphinx, a speech recognition library that is part of the open source Sphinx technology from Carnegie Mellon University. During implementation, automating different steps of the process was a key factor in successful completion. This automation consisted of developing tools to create the language model and the dictionary, two important components of the system. Similarly, the audio files to be played after identifying a keyword, as well as the evaluation of the system's performance, were fully automated. The tests show encouraging results and demonstrate that the system is a feasible solution that could be implemented and tested in a real embedded application. Despite the good results, improvements are possible, such as creating a different phonetic dictionary to support other languages.
Automatic Speech Recognition PocketSphinx Embedded Systems
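The response-selection step described in entry 141 (play the audio file tied to a recognized keyword, and still respond when no keyword is found) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation: it assumes the recognizer has already produced a text hypothesis, it does not call PocketSphinx, and the keywords and file names are hypothetical placeholders.

```python
# Hypothetical keyword-to-audio mapping; keywords and file names are placeholders.
RESPONSES = {
    "weather": "weather_info.wav",
    "schedule": "schedule_info.wav",
}
FALLBACK = "no_keyword.wav"  # assumed file played when no keyword is identified


def select_response(hypothesis: str) -> str:
    """Return the audio file for the first known keyword in the hypothesis text."""
    for word in hypothesis.lower().split():
        if word in RESPONSES:
            return RESPONSES[word]
    # No keyword matched: the system still provides a response.
    return FALLBACK
```

For example, `select_response("what is the weather today")` selects the weather file, while an utterance with no known keyword selects the fallback file.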
142	The Algorithms of Speech Recognition : programming and simulating in MATLAB Yang, Tingxiao January 2012 The aim of this thesis is to investigate speech recognition algorithms. The author programmed and simulated the designed speech recognition systems in MATLAB. Two systems are designed in this thesis. One is based on the shape of the cross-correlation plot. The other uses the Wiener filter to perform the recognition. The simulations are carried out by recording spoken words with a microphone. When the program runs, MATLAB asks the user to record a word three times. The first and second recordings are different words, which serve as the reference signals in the designed systems. The third recording is the same word as one of the first two. The recorded words are sampled and stored in MATLAB as signals. MATLAB should then be able to decide, according to the programmed algorithms, which of the two reference words was spoken in the third recording. The author invited people from different countries to test the designed systems. The simulation results show that both systems work well when the two reference recordings and the third recording come from the same person. However, both systems perform poorly when the reference recordings and the third recording come from different people. If the testing environment is quiet enough and the same person makes all three recordings, the recognition success rate approaches 100%. Thus, the designed systems work well for basic speech recognition.
speech recognition MATLAB cross-correlation Wiener Filter
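The two-reference comparison in entry 142 can be illustrated with a small cross-correlation sketch. The thesis works in MATLAB and uses the shape of the cross-correlation plot; the version below is a simplified stand-in written in Python that compares only the peak of the normalized cross-correlation, so the function names and the decision rule are assumptions made for illustration.

```python
import math


def cross_correlation_peak(x, y):
    """Largest absolute normalized cross-correlation of x and y over all lags."""
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        return 0.0
    best = 0.0
    # Slide y across x; lag is the offset of y[0] relative to x[0].
    for lag in range(-(len(y) - 1), len(x)):
        total = 0.0
        for j, y_val in enumerate(y):
            i = lag + j
            if 0 <= i < len(x):
                total += x[i] * y_val
        best = max(best, abs(total) / norm)
    return best


def classify(test, ref_first, ref_second):
    """Pick the reference whose cross-correlation peak with the test signal is larger."""
    peak_first = cross_correlation_peak(test, ref_first)
    peak_second = cross_correlation_peak(test, ref_second)
    return "first" if peak_first >= peak_second else "second"
```

Because the score is normalized by signal energy and searched over all lags, a delayed copy of a reference still scores close to 1, which matches the abstract's observation that recognition is most reliable when the same speaker produces all three recordings.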
143	A Discriminative Locally-Adaptive Nearest Centroid Classifier for Phoneme Classification Sun, Yong-Peng January 2012 Phoneme classification is a key area of speech recognition. Phonemes are the basic modeling units in modern speech recognition and the building blocks of words. Thus, being able to quickly and accurately classify the phonemes that are input to a speech recognition system is a basic and important step towards improving and eventually perfecting speech recognition as a whole. Many existing classification approaches can be applied to the task of classifying phonemes. These techniques range from simple ones, such as the nearest centroid classifier, to complex ones, such as the support vector machine. Among the existing classifiers, the simpler ones tend to be quicker to train but less accurate, whereas the more complex ones tend to be more accurate but slower to train. Because phoneme classification involves very large datasets, it is desirable to have classifiers that are both quick to train and highly accurate. The formulation of such classifiers is still an active research topic in phoneme classification. One paradigm attempts to increase the accuracy of the simpler classifiers with minimal sacrifice to their running times. The opposite paradigm attempts to increase the training speed of the more complex classifiers with minimal sacrifice to their accuracy. The objective of this research is to develop a new centroid-based classifier that builds upon the simpler nearest centroid classifier by incorporating a new discriminative locally-adaptive training procedure developed from recent advances in machine learning.
This new classifier, referred to as the discriminative locally-adaptive nearest centroid (DLANC) classifier, achieves much higher accuracy than the nearest centroid classifier while having relatively low computational complexity and scaling up to very large datasets.
classifier speech recognition Electrical and Computer Engineering
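For orientation, the plain nearest centroid classifier that entry 143 takes as its starting point can be sketched as below. This shows only the simple baseline, not the DLANC training procedure, whose details the abstract does not give; the function names and the toy feature vectors are assumptions for illustration.

```python
import math
from collections import defaultdict


def fit_centroids(samples, labels):
    """Compute the mean feature vector (centroid) of each class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for k, value in enumerate(vec):
            sums[label][k] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vec):
    """Assign vec to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))
```

Training is a single pass over the data and prediction costs one distance computation per class, which is why this kind of classifier is quick to train on very large datasets.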
144	Empirical Mode Decomposition for Noise-Robust Automatic Speech Recognition Wu, Kuo-hao 25 August 2010 In this thesis, a novel technique based on the empirical mode decomposition (EMD) methodology is proposed and examined for improving the noise robustness of automatic speech recognition systems. EMD analysis is a generalization of Fourier analysis for processing nonlinear and non-stationary time functions, in our case the speech feature sequences. We use the intrinsic mode functions (IMFs) obtained from the EMD analysis, which include sinusoidal functions as special cases, in the post-processing of the log energy feature. We evaluate the proposed method on the Aurora 2.0 and Aurora 3.0 databases. On Aurora 2.0, we obtain a 44.9% overall relative improvement over the baseline for the mismatched (clean-training) tasks. On Aurora 3.0, the results show an overall improvement of 49.5% over the baseline on the high-mismatch tasks. These results show that the proposed method leads to significant improvement.
noise robustness empirical mode decomposition speech recognition
145	Evaluation of Combinational Use of Discriminant Analysis-Based Acoustic Feature Transformation and Discriminative Training TAKEDA, Kazuya, NAKAGAWA, Seiichi, HATTORI, Yuya, KITAOKA, Norihide, SAKAI, Makoto 01 February 2010 No description available.
discriminative training feature extraction speech recognition
146	Acoustic Feature Transformation Based on Discriminant Analysis Preserving Local Structure for Speech Recognition TAKEDA, Kazuya, KITAOKA, Norihide, SAKAI, Makoto 01 May 2010 No description available.
multidimensional signal processing feature extraction speech recognition
147	Acoustic Feature Transformation Combining Average and Maximum Classification Error Minimization Criteria TAKEDA, Kazuya, KITAOKA, Norihide, SAKAI, Makoto 01 July 2010 No description available.
Bayes error dimensionality reduction speech recognition
148	A Design of Mandarin Keyword Spotting System Wang, Yi-Lii 07 February 2003 A Mandarin keyword spotting system based on LPC, VQ, discrete-time HMM and the Viterbi algorithm is proposed in this thesis. Combined with a dialogue system, this keyword spotting platform is further refined into a prototype of the Taiwan Railway Natural Language Reservation System. In the reservation process, the computer dialogue attendant asks five questions: name and ID number, departure station, destination station, train type and number of tickets, and time schedule. Following the customer's spoken confirmation, electronic tickets can be correctly issued and printed within 90 seconds in a laboratory environment.
Speech Recognition LPC Hidden Markov Model Cepstrum
149	A study of convex optimization for discriminative training of hidden Markov models in automatic speech recognition / Yin, Yan. January 2008 Thesis (M.Sc.)--York University, 2008. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 101-109). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR45978
150	A study on acoustic modeling and adaptation in HMM-based speech recognition Ma, Bin, January 2000 Thesis (Ph. D.)--University of Hong Kong, 2001. / Includes bibliographical references (leaves 103-112).
