491 |
A Design of Taiwanese Speech Recognition System. Jhu, Hao-fu. 24 August 2009.
This thesis investigates design and implementation strategies for a Taiwanese speech recognition system. It adopts a 4-plus-1 (five recordings) strategy, in which the first four recordings are used for speech feature training and the last recording for speech recognition simulation. Mel-frequency cepstrum coefficients and a hidden Markov model serve as the feature model and the recognition model respectively. On an Intel Celeron 2.4 GHz personal computer running Red Hat Linux 9.0, a correct phrase recognition rate of 90% is reached on a 4,200-phrase Taiwanese database.
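The abstract gives no code, but the MFCC-plus-HMM pipeline it describes can be sketched roughly as below: one Gaussian-emission HMM is trained per phrase on the four training takes, and the fifth take is scored against each phrase model. File names, model sizes, and parameters here are invented for illustration; they are not from the thesis.

```python
import numpy as np
import librosa              # audio loading and MFCC extraction
from hmmlearn import hmm    # Gaussian-emission HMMs

def extract_mfcc(path, n_mfcc=13):
    """Return the MFCC frame sequence of a recording, shape (T, n_mfcc)."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Hypothetical file layout: four training takes and one test take per phrase.
train_takes = [f"phrase042_take{i}.wav" for i in range(1, 5)]
test_take = "phrase042_take5.wav"

# Train one HMM per phrase on the concatenated training takes.
feats = [extract_mfcc(p) for p in train_takes]
model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
model.fit(np.vstack(feats), lengths=[f.shape[0] for f in feats])

# Recognition: score the held-out take against every phrase model and
# pick the highest log-likelihood (only one model is shown here).
print(model.score(extract_mfcc(test_take)))
```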
|
492 |
A Design of English Speech Recognition System. Chen, Yung-ming. 24 August 2009.
This thesis investigates design and implementation strategies for an English speech recognition system. Two speech input methods, spelling input and reading input, are implemented for English word recognition and query. Mel-frequency cepstrum coefficients and linear predictive cepstral coefficients serve as the two feature models, with a hidden Markov model as the recognition model. On a Pentium 1.6 GHz personal computer running Ubuntu 8.04, a 95% correct recognition rate is obtained on a 110,000-word English database with the spelling input method, and a 93% correct recognition rate on a 1,500-word English database with the reading input method. The average computation time per word is about 1.5 seconds for either input method.
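The abstract does not detail the LPCC computation; a common realisation, sketched below, derives LPC coefficients per frame with the autocorrelation method (Levinson-Durbin) and converts them to cepstral coefficients with the standard recursion. The analysis parameters (order 12, 400-sample frame) are assumptions, not values from the thesis.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion,
    with the convention s[n] ~ sum_k a[k] * s[n-k]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    e = r[0] + 1e-10                       # guard against silent frames
    for i in range(1, order + 1):
        k = (r[i] - a[1:i] @ r[i - 1:0:-1]) / e
        a_prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = a_prev[j] - k * a_prev[i - j]
        e *= 1.0 - k * k
    return a[1:]

def lpcc(a, n_ceps):
    """Standard LPC-to-cepstrum recursion (as in Rabiner & Juang)."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if 1 <= n - k <= p:
                acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

frame = np.hamming(400) * np.random.randn(400)   # stand-in for a 25 ms frame
print(lpcc(lpc(frame, order=12), n_ceps=12))
```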
|
493 |
Auditory Based Modification of MFCC Feature Extraction for Robust Automatic Speech Recognition. Chiou, Sheng-chiuan. 1 September 2009.
The human auditory perception system is much more noise-robust than any state-of-the-art automatic speech recognition (ASR) system. It is expected that the noise-robustness of speech feature vectors may be improved by employing more human auditory functions in the feature extraction procedure.
Forward masking is a phenomenon of human auditory perception in which a weaker sound is masked by a preceding, stronger masker. In this work, two human auditory mechanisms, synaptic adaptation and temporal integration, are implemented as filter functions and incorporated into MFCC feature extraction to model forward masking. A filter optimization algorithm is proposed to tune the filter parameters.
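The abstract does not specify the filter forms. The sketch below shows one plausible realisation in the log filterbank domain of the MFCC pipeline (before the DCT): a first-order high-pass stands in for synaptic adaptation and a first-order low-pass for temporal integration, applied in cascade. Both coefficients are invented; in the thesis they are set by the proposed filter optimization.

```python
import numpy as np
from scipy.signal import lfilter

def forward_masking_stage(log_mel, alpha=0.95, beta=0.7):
    """Apply illustrative forward-masking filters per mel channel along time.

    log_mel: array of shape (n_channels, n_frames), the log filterbank
    energies computed inside the usual MFCC pipeline.
    """
    # Synaptic adaptation: a first-order high-pass that emphasises onsets,
    # suppressing energy that follows a strong masker.
    adapted = lfilter([1.0, -alpha], [1.0], log_mel, axis=1)
    # Temporal integration: a first-order low-pass that smears energy
    # over a short masking window.
    return lfilter([1.0 - beta], [1.0, -beta], adapted, axis=1)
```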
The performance of the proposed method is evaluated on the Aurora 3 corpus, with the training/testing procedure following the standard setting of the Aurora 3 task. The synaptic adaptation filter achieves a relative improvement of 16.6% over the baseline; the temporal integration and modified temporal integration filters achieve relative improvements of 21.6% and 22.5% respectively. Combining synaptic adaptation with each of the temporal integration filters yields further improvements, of 26.3% and 25.5%. Applying the filter optimization improves the synaptic adaptation filter and the two temporal integration filters to 18.4%, 25.2%, and 22.6% respectively; the combined-filter models also improve, with relative improvements of 26.9% and 26.3%.
|
494 |
Improving the efficacy of automated sign language practice tools. Brashear, Helene Margaret. 7 July 2010.
The CopyCat project is an interdisciplinary effort to create a set of computer-aided language learning tools for deaf children. The CopyCat games allow children to interact with characters using American Sign Language (ASL). Through Wizard of Oz pilot studies we have developed a set of games, shown their efficacy in improving young deaf children's language and memory skills, and collected a large corpus of signing examples. Our previous implementation of the automatic CopyCat games uses automatic sign language recognition and verification in the infrastructure of a memory repetition and phrase verification task.
The goal of my research is to expand the automatic sign language system so that the CopyCat games gain the flexibility of a dialogue system. I have created a labeling ontology from analysis of the CopyCat signing corpus and used it to describe the contents of the CopyCat data set. The ontology was then used to change and improve the automatic sign language recognition system and to add flexibility to language use in the automatic game.
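The abstract does not enumerate the ontology's categories; as a purely hypothetical sketch of how such a labeling schema and the phrase-verification task might be represented, every field name below is an assumption:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SignAnnotation:
    """One labeled sign token in a recorded CopyCat phrase (fields hypothetical)."""
    gloss: str                # e.g. "ALLIGATOR"
    start_frame: int
    end_frame: int
    handedness: str           # "left", "right", or "both"
    disfluency: bool = False  # hesitations, restarts, out-of-vocabulary signs

@dataclass
class PhraseRecording:
    child_id: str
    signs: List[SignAnnotation] = field(default_factory=list)

    def verifies(self, prompt: List[str]) -> bool:
        """Phrase verification: do the fluent glosses match the prompt?"""
        return [s.gloss for s in self.signs if not s.disfluency] == prompt
```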
|
495 |
The use of prosodic features in Chinese speech recognition and spoken language processing. Wong, Jimmy Pui Fung. January 2003.
Thesis (M.Phil.), Hong Kong University of Science and Technology, 2003. Includes bibliographical references (leaves 97-101). Also available in electronic version; access restricted to campus users.
|
496 |
The adoption of interactive voice response for feeding scheme programme monitoring. Qwabe, Olwethu. January 2014.
M. Tech. Business Information Systems.
The Department of Education should be contributing to the South African government's objective of providing a better life for all. However, the provision of education to all is hampered by the fact that a significant majority of the South African population is plagued by high levels of poverty, resulting in learners attending school without having had a nutritious meal. Consequently, the provision of food in South African schools, a lead project of the Reconstruction and Development Programme referred to as the 'feeding scheme', was introduced. The project aims to improve both health and education by fighting malnutrition and improving learners' ability to concentrate during lessons. The South African government funds the school feeding programme for learners from primary to secondary school, and the Department of Education spends a large amount of money on the programme nationally. However, there is no precise data showing how successful the feeding programme is. For the Department of Education to meet its objectives, an efficient system for keeping records of all the reports is needed; it is thus critical to explore the potential of technologies such as interactive voice response systems. Interactive voice response solutions could assist the Department of Education in monitoring and evaluating the school feeding programme in a timely, accurate, and reliable way. This research evaluates how an interactive voice response system can be implemented to effectively enhance the monitoring of the feeding programme in South African schools.
|
497 |
Real-time recognition of monosyllabic speech (Cantonese) using analogue filters. Luk, Wing-kin (陸榮堅). January 1977.
Thesis (M.Phil.), Electrical Engineering. Published or final version.
|
498 |
An Evaluation Framework for Adaptive User Interface. Noriega Atala, Enrique. January 2014.
With the rise of powerful mobile devices and the broad availability of computing power, automatic speech recognition (ASR) is becoming ubiquitous. A flawless ASR system does not yet exist, so interactive applications that use ASR technology do not always recognize speech perfectly; when they do not, the user must be engaged to repair the transcription. We explore a rational user interface that uses machine learning models to present the best available repair strategy, reducing the time spent in the interaction between the user and the system as much as possible. A study is conducted to determine how different candidate policies perform, and the results are analyzed. After the analysis, the methodology is generalized into a decision-theoretic framework that can be used to evaluate the performance of other rational user interfaces that seek to optimize an expected cost or utility.
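A minimal sketch of the decision-theoretic idea follows: given each repair strategy's interaction cost and success probability, a rational interface presents the strategy with minimum expected cost. The strategy names, costs, and probabilities are invented; in the study they would come from the learned models and measured interaction times.

```python
# Hypothetical repair strategies with invented costs (seconds) and
# success probabilities.
strategies = {
    #                   (cost if it works, cost if it fails, P(success))
    "confirm_yes_no":    (2.0,  6.0, 0.80),
    "pick_from_n_best":  (4.0,  8.0, 0.90),
    "respeak_utterance": (6.0, 10.0, 0.95),
}

def expected_cost(ok_cost, fail_cost, p_ok):
    """Expected interaction time of one repair strategy."""
    return p_ok * ok_cost + (1.0 - p_ok) * fail_cost

# A rational interface presents the strategy with minimum expected cost.
best = min(strategies, key=lambda name: expected_cost(*strategies[name]))
print(best)
```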
|
499 |
Deep Neural Network Acoustic Models for ASR. Mohamed, Abdel-rahman. 1 April 2014.
Automatic speech recognition (ASR) is a key core technology for the information age. ASR systems have evolved from discriminating among isolated digits to recognizing telephone-quality, spontaneous speech, allowing for a growing number of practical applications in various sectors. Nevertheless, there are still serious challenges facing ASR which require major improvement in almost every stage of the speech recognition process. Until very recently, the standard approach to ASR had remained largely unchanged for many years. It used Hidden Markov Models (HMMs) to model the sequential structure of speech signals, with each HMM state using a mixture of diagonal covariance Gaussians (GMM) to model a spectral representation of the sound wave.
This thesis describes new acoustic models based on Deep Neural Networks (DNN) that have begun to replace GMMs. For ASR, the deep structure of a DNN as well as its distributed representations allow for better generalization of learned features to new situations, even when only small amounts of training data are available. In addition, DNN acoustic models scale well to large vocabulary tasks significantly improving upon the best previous systems.
Different input feature representations are analyzed to determine which one is more suitable for DNN acoustic models. Mel-frequency cepstral coefficients (MFCC) are inferior to log Mel-frequency spectral coefficients (MFSC) which help DNN models marginalize out speaker-specific information while focusing on discriminant phonetic features. Various speaker adaptation techniques are also introduced to further improve DNN performance.
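The relationship between the two feature types reduces to one step: MFSCs are log mel-filterbank energies, and MFCCs are their DCT. A minimal sketch, with the mel and cepstral dimensions (40 and 13) assumed rather than taken from the thesis:

```python
import numpy as np
import librosa
from scipy.fftpack import dct

y, sr = librosa.load(librosa.ex("trumpet"), sr=16000)  # any waveform will do

# MFSC: log mel-filterbank energies, the less-processed representation
# the thesis finds better suited to DNN acoustic models.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)
mfsc = np.log(mel + 1e-10)

# MFCC: a DCT decorrelates the MFSC channels, which suits
# diagonal-covariance GMMs but discards local spectral structure.
mfcc = dct(mfsc, axis=0, norm="ortho")[:13]
```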
Another deep acoustic model based on Convolutional Neural Networks (CNN) is also proposed. Rather than using fully connected hidden layers as in a DNN, a CNN uses a pair of convolutional and pooling layers as building blocks. The convolution operation scans the frequency axis using a learned local spectro-temporal filter while in the pooling layer a maximum operation is applied to the learned features utilizing the smoothness of the input MFSC features to eliminate speaker variations expressed as shifts along the frequency axis in a way similar to vocal tract length normalization (VTLN) techniques.
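As a rough sketch of that architecture, the model below convolves along the frequency axis of an MFSC patch and max-pools over frequency only, so small spectral shifts (speaker variation) are absorbed; all layer sizes, the context window, and the number of output states are invented, not taken from the thesis.

```python
import torch
import torch.nn as nn

class ConvAcousticModel(nn.Module):
    """Frequency-convolution acoustic model sketch (all sizes invented).

    Input: a batch of MFSC patches, shape (batch, 1, n_mels, context_frames).
    """
    def __init__(self, n_mels=40, context=11, n_states=2000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=(8, 3), padding=(0, 1)),  # learned local spectro-temporal filters
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(3, 1)),                      # pool along frequency only
        )
        freq_out = (n_mels - 8 + 1) // 3
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * freq_out * context, 1024),
            nn.ReLU(),
            nn.Linear(1024, n_states),   # posteriors over HMM tied states
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# A batch of 8 random 40-mel, 11-frame patches, just to check shapes.
logits = ConvAcousticModel()(torch.randn(8, 1, 40, 11))
print(logits.shape)  # torch.Size([8, 2000])
```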
We show that the proposed DNN and CNN acoustic models achieve significant improvements over GMMs on various small and large vocabulary tasks.
|