Global ETD Search

1	The application of classical information retrieval techniques to spoken documents James, David Anthony January 1995 No description available. 621.3994 Speech recognition; Keyword spotting
2	Resource-dependent acoustic and language modeling for spoken keyword search Chen, I-Fan 27 May 2016 In this dissertation, three research directions were explored to alleviate two major issues in the modeling frameworks of modern spoken keyword search (KWS) systems: the use of incorrect models and training/test condition mismatches. The three research directions, (i) data-efficient training processes, (ii) system optimization objectives, and (iii) data augmentation, each use different types and amounts of training resources in different ways to ameliorate these two issues in the acoustic and language modeling of modern KWS systems. More specifically, resource-dependent keyword modeling, keyword-boosted sMBR (state-level minimum Bayes risk) training, and multilingual acoustic modeling are proposed and investigated for acoustic modeling. For language modeling, keyword-aware language modeling, discriminative keyword-aware language modeling, and web-text-augmented language modeling are presented and discussed. The dissertation provides a comprehensive collection of solutions and strategies for the acoustic and language modeling problems in KWS. It also offers insights into building well-performing KWS systems. Experimental results show that the data-efficient training process and data augmentation are the two directions providing the largest performance improvements for KWS systems, while modifying system optimization objectives provides a smaller yet consistent performance enhancement across KWS systems with different configurations. The effects of the proposed acoustic and language modeling approaches in the three directions are also shown to be additive, so they can be combined to further improve overall KWS system performance. Spoken keyword search; Keyword spotting; Acoustic model; Language model; Speech recognition
3	Phonemic variability and confusability in pronunciation modeling for automatic speech recognition Karanasou, Panagiota 11 June 2013 This thesis addresses the problems of phonemic variability and confusability from the pronunciation modeling perspective for an automatic speech recognition (ASR) system. Several research directions are investigated. First, automatic grapheme-to-phoneme (g2p) and phoneme-to-phoneme (p2p) converters are developed that generate alternative pronunciations for in-vocabulary as well as out-of-vocabulary (OOV) terms. Since adding alternative pronunciations may introduce homophones (or close homophones), the confusability of the system increases. A novel measure of this confusability is proposed to analyze it and to study its relation to ASR performance. This pronunciation confusability is higher if pronunciation probabilities are not provided and can potentially severely degrade ASR performance. It should therefore be taken into account during pronunciation generation. Discriminative training approaches are then investigated to train the weights of a phoneme confusion model that allows alternative ways of pronouncing a term, counterbalancing the phonemic confusability problem. The objective function to optimize is chosen to correspond to the performance measure of the particular task. In this thesis, two tasks are investigated: the ASR task and the Keyword Spotting (KWS) task. For ASR, an objective that minimizes the phoneme error rate is adopted. For the experiments conducted on KWS, the Figure of Merit (FOM), a KWS performance measure, is directly maximized. Computer Science/Other; Pronunciation modeling; G2p conversion; Confusability; Discriminative training; Speech recognition; Keyword spotting
4	A Design and Applications of Mandarin Keyword Spotting System Hou, Cheng-Kuan 11 August 2003 A Mandarin keyword spotting system based on MFCC, discrete-time HMM, and the Viterbi algorithm with DTW is proposed in this thesis. Combined with a dialogue system, this keyword spotting platform is further refined into a prototype of a natural-speech patient registration system for Kaohsiung Veterans General Hospital. After the computer dialogue attendant asks for the ID number during the registration process, the user can complete all relevant tasks in a single sentence. Functions for searching for clinical doctors and for making and canceling registrations are all built into this system. In a laboratory environment, the accuracy of this speaker-independent patient registration system can reach 97%, and the entire registration process can be completed within 75 seconds. Mel-frequency cepstrum coefficients; phrase recognition; Dynamic Time Warping; Keyword spotting; Hidden Markov model
5	Detekce klíčových slov v mluvené řeči / Keyword spotting Zemánek, Tomáš January 2011 This thesis is aimed at the design of a keyword detector. It describes the methods used for this purpose and the design of an algorithm for keyword detection. The proposed detector is based on DTW (Dynamic Time Warping). The problem was analyzed using a module programmed in ANSI C, which was created as part of the thesis. The detector's results were evaluated using the metrics WER (word error rate) and AUC (area under the curve).
6	Utvärdering av Part-of-Speech tagging som metod för identifiering av nyckelord i dialog / Evaluation of Part-of-Speech tagging as a method for identification of keywords in dialogs He, Jeannie, Norström, Matthew January 2019 This study presents Part-of-Speech tagging as a method for keyword spotting, as well as a market analysis for a conversational robot that leads language cafés. The results were evaluated using survey responses from 30 anonymous native speakers of Swedish. The results show that the method is plausible and could be implemented in a conversational robot to increase its understanding of the spoken language that occurs in language cafés. The market analysis indicates that there is a market for the conversational robot. The robot needs to be improved, however, before it can replace human language leaders in language cafés. Part-of-Speech Tagging; Keyword spotting; Language café; Robotics; Computer and Information Sciences
7	Design of Keyword Spotting System Based on Segmental Time Warping of Quantized Features Karmacharya, Piush January 2012 Keyword spotting in general means identifying a keyword in a verbal or written document. This research proposes a novel approach to designing a simple spoken keyword spotting/recognition system based on template matching, in contrast to the Hidden Markov Model based systems that are most widely used today. The system can be used equally efficiently on any language, as it does not rely on an underlying language model or grammatical constraints. The proposed method for keyword spotting is based on a modified version of classical Dynamic Time Warping, which has been a primary method for measuring the similarity between two sequences varying in time. For processing, a speech signal is divided into small stationary frames. Each frame is represented by a quantized feature vector. Both the keyword and the speech utterance are represented as 1-dimensional sequences of codebook indices. The utterance is divided into segments, and the warped distance is computed for each segment against the test keyword. A distortion score for each segment is computed as a likelihood measure of the keyword. The proposed algorithm is designed to take advantage of multiple instances of the test keyword (if available) by merging the scores for all keywords used. The training method for the proposed system is completely unsupervised, i.e., it requires neither a language model nor a phoneme model for keyword spotting. Prior unsupervised training algorithms were based on computing Gaussian posteriorgrams, which makes the training process complex, whereas the proposed algorithm requires minimal training data, and the system can also be trained to perform in a different environment (language, noise level, recording medium, etc.) by retraining the original clusters on additional data.
Techniques for designing a model keyword from multiple instances of the test keyword are discussed. System performance under variation of different parameters, such as the number of clusters and the number of available keyword instances, was studied in order to optimize the speed and accuracy of the system. The system performance was evaluated for fourteen different keywords from the CallHome and Switchboard speech corpora. Results varied across keywords, and a maximum accuracy of 90% was obtained, which is comparable to other methods using the same time warping algorithms on Gaussian posteriorgrams. Results are compared across parameter variations, with suggestions for possible improvements. / Electrical and Computer Engineering. Engineering; Electrical Engineering; Keyword Spotting; K-means Clustering; Segmental Time Warping; Template Matching; Vector Quantization
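As an illustration of the segment-wise matching that the abstract of entry 7 describes, the following is a minimal Python sketch, not the thesis's actual algorithm. It assumes that the keyword and utterance are already sequences of codebook indices, that the local cost is a simple 0/1 mismatch (the thesis may well use distances between codebook centroids), and that segments are fixed-length sliding windows. The names `dtw_distance`, `spot_keyword`, `seg_len`, and `hop` are hypothetical.

```python
def dtw_distance(a, b, cost):
    """Classical dynamic time warping distance between sequences a and b."""
    inf = float("inf")
    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for i in range(1, len(a) + 1):
        curr = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = cost(a[i - 1], b[j - 1])
            # Allowed moves: vertical, horizontal, diagonal.
            curr[j] = step + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def index_mismatch(x, y):
    """Assumed local cost: 0 if two codebook indices match, else 1."""
    return 0.0 if x == y else 1.0


def spot_keyword(keyword, utterance, seg_len, hop=1, cost=index_mismatch):
    """Return (start, score) of the utterance segment closest to the keyword.

    The score is the DTW distance divided by len(keyword) + seg_len,
    an assumed normalisation so that scores are comparable across lengths.
    """
    if len(utterance) < seg_len:
        raise ValueError("utterance is shorter than one segment")
    best = None
    for start in range(0, len(utterance) - seg_len + 1, hop):
        segment = utterance[start:start + seg_len]
        score = dtw_distance(keyword, segment, cost) / (len(keyword) + seg_len)
        if best is None or score < best[1]:
            best = (start, score)
    return best
```

A lower score means a closer match. With several keyword templates, one simple option (again an assumption) would be to average the per-segment scores over the templates before picking the minimum.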
8	A novel approach for continuous speech tracking and dynamic time warping : adaptive framing based continuous speech similarity measure and dynamic time warping using Kalman filter and dynamic state model Khan, Wasiq January 2014 Dynamic speech properties such as time warping, silence removal, and background noise interference are the most challenging issues in continuous speech signal matching. Among them, time-warped speech signal matching is of great interest and has been a tough challenge for researchers. An adaptive-framing-based continuous speech tracking and similarity measurement approach is introduced in this work, following comprehensive research across diverse areas of speech processing. A dynamic state model is introduced, based on a system of linear motion equations, which models the input (test) speech signal frame as a unidirectional moving object along the template speech signal. The most similar corresponding frame position in the template speech is estimated and fused with a feature-based similarity observation and the noise variances using a Kalman filter. The Kalman filter provides the final estimated frame position in the template speech at the current time, which is then used to predict a new frame size for the next step. In addition, a keyword spotting approach is proposed by introducing a wavelet-decomposition-based dynamic noise filter and a combination of beliefs. Dempster’s theory of belief combination is applied for the first time to the keyword spotting task. The performance of both the speech tracking and keyword spotting approaches is evaluated using statistical metrics and gold standards for binary classification. Experimental results showed the proposed approaches to be superior to existing methods.
9	Optimalizace rozpoznávání řeči pro mobilní zařízení / Optimization of Voice Recognition for Mobile Devices Tomec, Martin January 2010 This work deals with the optimization of keyword spotting algorithms for the ARM Cortex-A8 processor architecture. It first describes this architecture, in particular the NEON unit for vector computing. It then briefly describes keyword spotting algorithms and proposes optimizations of these algorithms for the described architecture. The main part of the work is the implementation of these optimizations and an analysis of their impact on performance.
10	A Novel Approach for Continuous Speech Tracking and Dynamic Time Warping. Adaptive Framing Based Continuous Speech Similarity Measure and Dynamic Time Warping using Kalman Filter and Dynamic State Model Khan, Wasiq January 2014 Dynamic speech properties such as time warping, silence removal, and background noise interference are the most challenging issues in continuous speech signal matching. Among them, time-warped speech signal matching is of great interest and has been a tough challenge for researchers. An adaptive-framing-based continuous speech tracking and similarity measurement approach is introduced in this work, following comprehensive research across diverse areas of speech processing. A dynamic state model is introduced, based on a system of linear motion equations, which models the input (test) speech signal frame as a unidirectional moving object along the template speech signal. The most similar corresponding frame position in the template speech is estimated and fused with a feature-based similarity observation and the noise variances using a Kalman filter. The Kalman filter provides the final estimated frame position in the template speech at the current time, which is then used to predict a new frame size for the next step. In addition, a keyword spotting approach is proposed by introducing a wavelet-decomposition-based dynamic noise filter and a combination of beliefs. Dempster’s theory of belief combination is applied for the first time to the keyword spotting task. The performance of both the speech tracking and keyword spotting approaches is evaluated using statistical metrics and gold standards for binary classification. Experimental results showed the proposed approaches to be superior to existing methods. / The appendices files are not available online.
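The abstract of entry 10 refers to Dempster's theory of belief combination. As a rough illustration only, the sketch below implements the standard Dempster rule of combination for two mass functions in Python. It does not reproduce the thesis's keyword spotting pipeline: the two-hypothesis frame ("kw" for keyword, "bg" for background), the example masses, and the function name `combine` are all assumptions made for this example.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Mass assigned to disjoint pairs counts as conflict and is
    normalised away.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


# Hypothetical evidence from two detectors about one speech segment.
KW = frozenset({"kw"})
BG = frozenset({"bg"})
EITHER = frozenset({"kw", "bg"})  # mass left uncommitted

detector_a = {KW: 0.6, EITHER: 0.4}
detector_b = {KW: 0.5, BG: 0.2, EITHER: 0.3}
fused = combine(detector_a, detector_b)
```

In this example the two sources mostly agree, so the fused mass on "kw" ends up higher than either source assigned on its own.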
