Global ETD Search

301	Learning pronunciation variation : A data-driven approach to rule-based lexicon adaptation for automatic speech recognition Amdal, Ingunn January 2002 (has links) To achieve a robust system the variation seen for different speaking styles must be handled. An investigation of standard automatic speech recognition techniques for different speaking styles showed that lexical modelling using general-purpose variants gave small improvements, but the errors differed compared with using only one canonical pronunciation per word. Modelling the variation using the acoustic models (using context dependency and/or speaker dependent adaptation) gave a significant improvement, but the resulting performance for non-native and spontaneous speech was still far from read speech. In this dissertation a complete data-driven approach to rule-based lexicon adaptation is presented, where the effect of the acoustic models is incorporated in the rule pruning metric. Reference and alternative transcriptions were aligned by dynamic programming, but with a data-driven method to derive the phone-to-phone substitution costs. The costs were based on the statistical co-occurrence of phones, termed association strength. Rules for pronunciation variation were derived from this alignment. The rules were pruned using a new metric based on acoustic log likelihood. Well-trained acoustic models are capable of modelling much of the variation seen, and using the acoustic log likelihood to assess the pronunciation rules prevents the lexical modelling from adding variation already accounted for, as shown for direct pronunciation variation modelling. For the non-native task, data-driven pronunciation modelling by learning pronunciation rules gave a significant performance gain. Acoustic log likelihood rule pruning performed better than rule probability pruning. For spontaneous dictation the pronunciation variation experiments did not improve the performance.
The answer to how to better model the variation for spontaneous speech seems to lie neither in the acoustical nor the lexical modelling. The main differences between read and spontaneous speech are the grammar used and disfluencies like restarts and long pauses. The language model may thus be the best starting point for more research to achieve better performance for this speaking style. pronunciation variation lexical modelling automatic speech recognition non-native speech Electronics Elektronik
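The alignment step described in entry 301 can be sketched as a standard dynamic-programming alignment of two phone sequences. This is a minimal illustration and not the dissertation's implementation: the function names, the insertion/deletion cost, and the toy substitution table are all assumptions, and the table merely stands in for costs that the dissertation derives from association strength.

```python
def align(ref, alt, sub_cost, indel_cost=1.0):
    """Align two phone sequences; return (total_cost, aligned_pairs)."""
    n, m = len(ref), len(alt)
    # cost[i][j] = cheapest alignment of ref[:i] with alt[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + indel_cost
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub_cost(ref[i - 1], alt[j - 1]),  # match/substitute
                cost[i - 1][j] + indel_cost,                            # deletion
                cost[i][j - 1] + indel_cost,                            # insertion
            )
    # Trace back from the end to recover the aligned phone pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub_cost(ref[i - 1], alt[j - 1]):
            pairs.append((ref[i - 1], alt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + indel_cost:
            pairs.append((ref[i - 1], None))  # phone deleted in the alternative
            i -= 1
        else:
            pairs.append((None, alt[j - 1]))  # phone inserted in the alternative
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs


# Hypothetical substitution costs; a data-driven system would estimate these.
TOY_COSTS = {("ae", "eh"): 0.3}


def toy_sub_cost(a, b):
    """Zero for identical phones, table lookup otherwise, 1.0 as a fallback."""
    if a == b:
        return 0.0
    return TOY_COSTS.get((a, b), TOY_COSTS.get((b, a), 1.0))


total, pairs = align(["k", "ae", "t"], ["k", "eh", "t"], toy_sub_cost)
print(total, pairs)
```

Pairs in which the two phones differ, or in which one side is `None`, are the raw material from which substitution, deletion, and insertion rules could then be collected.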
302	A Study of the Automatic Speech Recognition Process and Speaker Adaptation Stokes-Rees, Ian James January 2000 (has links) This thesis considers the entire automated speech recognition process and presents a standardised approach to LVCSR experimentation with HMMs. It also discusses various approaches to speaker adaptation such as MLLR and multiscale, and presents experimental results for cross-task speaker adaptation. An analysis of training parameters and data sufficiency for reasonable system performance estimates is also included. It is found that Maximum Likelihood Linear Regression (MLLR) supervised adaptation can result in a 6% reduction (absolute) in word error rate given only one minute of adaptation data, as compared with an unadapted model set trained on a different task. The unadapted system performed at 24% WER and the adapted system at 18% WER. This is achieved with only 4 to 7 adaptation classes per speaker, as generated from a regression tree. Electrical & Computer Engineering automatic speech recognition speaker adaptation HTK HMM MLLR LVCSR
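The distinction between absolute and relative reduction in entry 302 can be made concrete with the two WER figures quoted in the abstract. The helper below is a hypothetical illustration and is not taken from the thesis.

```python
def wer_reduction(baseline_wer, adapted_wer):
    """Return (absolute, relative) reduction in word error rate.

    The absolute reduction is in percentage points; the relative
    reduction is a fraction of the baseline WER.
    """
    absolute = baseline_wer - adapted_wer
    relative = absolute / baseline_wer
    return absolute, relative


# Figures from the abstract: 24% WER unadapted, 18% WER after MLLR adaptation.
absolute, relative = wer_reduction(24.0, 18.0)
print(f"absolute: {absolute:.1f} points, relative: {relative:.0%}")
```

With these figures the 6-point absolute reduction corresponds to a 25% relative reduction, which is why the abstract specifies "(absolute)".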
304	A Design of Multi-session Text-independent Digital Camcorder Audio-Video Database for Speaker Recognition Chen, Chun-chi 05 September 2008 (has links) In this thesis, an audio-video database for speaker recognition is constructed using a digital camcorder. Motion pictures of fifteen hundred speakers are recorded in three different sessions in the database. For each speaker, 20 still images per session are also derived from the video data. It is hoped that this database can provide an appropriate training and testing mechanism for person identification using both voice and face features. Speaker Recognition Automatic Speech Recognition System Text-to-Speech System Biometrics
305	Auditory Based Modification of MFCC Feature Extraction for Robust Automatic Speech Recognition Chiou, Sheng-chiuan 01 September 2009 (has links) The human auditory perception system is much more noise-robust than any state-of-the-art automatic speech recognition (ASR) system. It is expected that the noise-robustness of speech feature vectors may be improved by employing more human auditory functions in the feature extraction procedure. Forward masking is a phenomenon of human auditory perception in which a weaker sound is masked by a preceding stronger sound. In this work, two human auditory mechanisms, synaptic adaptation and temporal integration, are implemented as filter functions and incorporated into MFCC feature extraction to model forward masking. A filter optimization algorithm is proposed to optimize the filter parameters. The performance of the proposed method is evaluated on the Aurora 3 corpus, and the training/testing procedure follows the standard setting provided by the Aurora 3 task. The synaptic adaptation filter achieves a relative improvement of 16.6% over the baseline. The temporal integration and modified temporal integration filters achieve relative improvements of 21.6% and 22.5%, respectively. Combining synaptic adaptation with each of the temporal integration filters results in further improvements of 26.3% and 25.5%. Applying the filter optimization improves the synaptic adaptation filter and the two temporal integration filters, giving improvements of 18.4%, 25.2%, and 22.6%, respectively. The performance of the combined-filter models is also improved, with relative improvements of 26.9% and 26.3%. forward masking auditory model synaptic adaptation temporal integration noise robust ASR automatic speech recognition
306	A detection-based pattern recognition framework and its applications Ma, Chengyuan 06 April 2010 (has links) The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation. Inspired by the studies of modern cognitive psychology and real-world pattern recognition systems, a detection-based pattern recognition framework is proposed to provide an alternative solution for some complicated pattern recognition problems. The primitive features are first detected and the task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined together and the high-level context is incorporated as additional information at certain stages. A detection-based framework is a "divide-and-conquer" design paradigm for pattern recognition problems, which will decompose a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Some information fusion strategies will be employed to integrate the evidence from a lower level to form the evidence at a higher level. Such a fusion procedure continues until reaching the top level. Generally, a detection-based framework has many advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts can be optimized separately; (2) parallel and distributed computational components in primitive feature detection. In such a component-based framework, any primitive component can be replaced by a new one while other components remain unchanged; (3) incremental information integration; (4) high level context information as additional information sources, which can be combined with bottom-up processing at any stage.
This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on statistical detection and decision theory. In addition, evidence fusion strategies were investigated in this dissertation. Several novel detection algorithms and evidence fusion methods were proposed and their effectiveness was demonstrated in automatic speech recognition and broadcast news video segmentation systems. We believe such a detection-based framework can be employed in more applications in the future. Speech recognition Detection-based Evidence fusion Pattern recognition systems Automatic speech recognition Digital video
307	Improving the efficacy of automated sign language practice tools Brashear, Helene Margaret 07 July 2010 (has links) The CopyCat project is an interdisciplinary effort to create a set of computer-aided language learning tools for deaf children. The CopyCat games allow children to interact with characters using American Sign Language (ASL). Through Wizard of Oz pilot studies we have developed a set of games, shown their efficacy in improving young deaf children's language and memory skills, and collected a large corpus of signing examples. Our previous implementation of the automatic CopyCat games uses automatic sign language recognition and verification in the infrastructure of a memory repetition and phrase verification task. The goal of my research is to expand the automatic sign language system to transition the CopyCat games to include the flexibility of a dialogue system. I have created a labeling ontology from analysis of the CopyCat signing corpus, and I have used the ontology to describe the contents of the CopyCat data set. This ontology was used to change and improve the automatic sign language recognition system and to add flexibility to language use in the automatic game. Automatic sign language recognition ASL Assistive technology Sign language Hidden Markov models Automatic speech recognition
308	The use of prosodic features in Chinese speech recognition and spoken language processing / Wong, Jimmy Pui Fung. January 2003 (has links) Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2003. / Includes bibliographical references (leaves 97-101). Also available in electronic version. Access restricted to campus users.
309	The adoption of interactive voice response for feeding scheme programme monitoring. Qwabe, Olwethu. January 2014 (has links) M. Tech. Business Information Systems / The Department of Education should be contributing to the South African government's objective to provide a better life for all. However, the provision of education to all is hampered by the fact that a significant majority of the South African population is plagued by high levels of poverty resulting in learners attending school without having had a nutritious meal. Consequently, the provision of food in South African schools, as a lead project of the Reconstruction and Development Programme, referred to as the 'feeding scheme', was introduced. This project aimed to improve both health and education by fighting malnutrition and improving the ability of learners to concentrate during lessons. The South African government provides the funds for the school feeding programme for learners from primary to secondary schools and the Department of Education spends a large amount of money on this programme nationally. However, there is no precise data showing how successful the feeding programme is. In order for the Department of Education to meet its objectives, it is recommended that an efficient system be developed for keeping records of all the reports. It is thus critical to explore the potential use of technologies, such as interactive voice response systems. The interactive voice response solutions have the potential to assist the Department of Education in monitoring and evaluating the school feeding programme in timely, accurate and reliable ways. This research aims to evaluate how this interactive voice response system can be implemented to effectively enhance the monitoring of the feeding programme in South African schools. Speech processing systems. Automatic speech recognition.
310	Real-time recognition of monosyllabic speech (Cantonese) using analogue filters Luk, Wing-kin., 陸榮堅. January 1977 (has links) published_or_final_version / Electrical Engineering / Master / Master of Philosophy Automatic speech recognition. Acoustic filters.
