Global ETD Search

371	Speech enhancement using microphone array Cho, Jaeyoun 22 November 2005 (has links) No description available. speech enhancement speech recognition spectral subtraction beamforming microphone array psychoacoustics
372	Deriving Novel Posterior Feature Spaces For Conditional Random Field - Based Phone Recognition Mohapatra, Prateeti 31 August 2009 (has links) No description available. Computer Science Speech Recognition Feature Combination Suprasegmental Information
373	Implementation of a Connected Digit Recognizer Using Continuous Hidden Markov Modeling Srichai, Panaithep Albert 02 October 2006 (has links) This thesis describes the implementation of a speaker-dependent connected-digit recognizer using continuous hidden Markov modeling (HMM). The speech recognition system was implemented both in MATLAB and on the ADSP-2181, a digital signal processor manufactured by Analog Devices. Linear predictive coding (LPC) analysis was first performed on the speech signal to model the characteristics of the vocal tract filter. A 7-state continuous HMM with 4 mixture density components was used to model each digit. Viterbi re-estimation was the primary method used in the training phase to obtain the HMM parameters, and Viterbi decoding was used in the recognition phase. The system was first implemented as an isolated-word recognizer. Recognition rates exceeding 99% were obtained with both the MATLAB and the ADSP-2181 implementations. For continuous word recognition, several algorithms were implemented and compared. Using MATLAB, recognition rates exceeding 90% were obtained. The algorithms were also implemented on the ADSP-2181, yielding recognition rates comparable to those of the MATLAB implementation. / Master of Science connected-digit recognition HMM hidden Markov models speech recognition
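The Viterbi decoding step named in this abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the thesis's MATLAB or ADSP-2181 code: it works in the log domain and assumes the per-frame emission log-likelihoods are already computed (in the thesis these would come from the mixture densities of each state), so the function name `viterbi` and its arguments are assumptions made here.

```python
import math
from typing import List, Sequence, Tuple


def viterbi(log_init: Sequence[float],
            log_trans: Sequence[Sequence[float]],
            log_emit: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the most likely state path and its log-probability.

    log_init[i]     : log P(state i at frame 0)
    log_trans[i][j] : log P(state j at frame t+1 | state i at frame t)
    log_emit[t][j]  : log-likelihood of frame t under state j
                      (assumed precomputed, e.g. from a mixture density)
    """
    n_frames, n_states = len(log_emit), len(log_init)
    # delta[j] is the best log score of any path ending in state j at the current frame
    delta = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptr: List[List[int]] = []
    for t in range(1, n_frames):
        new_delta, bp = [], []
        for j in range(n_states):
            # choose the predecessor i that maximizes delta[i] + log_trans[i][j]
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_emit[t][j])
            bp.append(best_i)
        delta = new_delta
        backptr.append(bp)
    # start from the best final state and follow the back-pointers
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for bp in reversed(backptr):
        path.append(bp[path[-1]])
    path.reverse()
    return path, delta[last]


if __name__ == "__main__":
    NEG = float("-inf")  # log(0): a forbidden start state or transition
    # toy 2-state left-to-right model over 3 frames
    path, score = viterbi(
        [0.0, NEG],
        [[math.log(0.5), math.log(0.5)], [NEG, 0.0]],
        [[math.log(0.9), math.log(0.1)],
         [math.log(0.2), math.log(0.8)],
         [math.log(0.1), math.log(0.9)]],
    )
    print(path)  # [0, 1, 1]
```

Working with log-probabilities turns products into sums and avoids numerical underflow on long utterances; a transition the topology forbids is simply encoded as `float("-inf")`.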
374	Design and Development of a Metadata-Driven Search Tool for use with Digital Recordings Radke, Annemarie Katherine 19 June 2019 (has links) It is becoming more common for researchers to use existing recordings as a source of data rather than to generate new media for research. Before recordings can be examined, data must be extracted from them and they must be described with metadata so that users can search for the recordings and search for information within them. The purpose of this small-scale study was to develop a web-based search tool that permits a comprehensive search of spoken information within a collection of existing digital recordings archived in an open-access digital repository. The study is significant to the field of instructional design and technology (IDT) because the digital recordings used in this study are interviews containing personal histories and insights from leaders and scholars who have influenced and advanced the field of IDT. This study used design and development research methods to develop a search tool for digital video interviews. The study applied speech recognition technology, tool prototypes, usability testing, expert review, and the skills of a program developer. The results indicate that the resulting tool provided a more comprehensive and flexible search for users locating content within the AECT Legends and Legacies Project video interviews. / Doctor of Philosophy / It is becoming more common for researchers to use existing recordings in studies. Before examination, the information about and within the recordings must be identified so that users can search it. The purpose of this small-scale study was to develop an online search tool that allows users to locate spoken words within a video interview.
The study is important to the field of instructional design and technology (IDT) as the video interviews used in this study contain experience and insight from people who have advanced the field of IDT. Using current and free technology, this study developed a practical search tool to search information from AECT Legends and Legacies Project video interviews. Keyword Search Instructional Design and Technology Interviews Speech Recognition Technology Transcription
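The core retrieval idea in this abstract, finding where a spoken keyword occurs inside a collection of transcribed interviews, can be illustrated with a small inverted index. This is a hypothetical sketch, not the tool built in the study: the `Segment` record, the tokenization rule, and the AND-style matching of multi-word queries are all assumptions, and the transcript segments are presumed to come from a speech recognizer that reports segment start times.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Posting = Tuple[str, float]  # (video_id, start_sec)


@dataclass(frozen=True)
class Segment:
    video_id: str     # hypothetical identifier for one interview recording
    start_sec: float  # segment start time reported by the recognizer (assumed)
    text: str         # recognized (or hand-corrected) transcript text


def tokenize(text: str) -> List[str]:
    # lowercase and keep alphanumeric runs so "Design," matches "design"
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments: List[Segment]) -> Dict[str, List[Posting]]:
    index: Dict[str, List[Posting]] = defaultdict(list)
    for seg in segments:
        for word in set(tokenize(seg.text)):  # one posting per word per segment
            index[word].append((seg.video_id, seg.start_sec))
    return index


def search(index: Dict[str, List[Posting]], query: str) -> List[Posting]:
    """Return sorted (video_id, start_sec) hits for segments containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for w in words[1:]:
        hits &= set(index.get(w, []))  # AND semantics across query words
    return sorted(hits)


if __name__ == "__main__":
    segs = [
        Segment("v1", 0.0, "Instructional design began..."),
        Segment("v1", 42.5, "Design and technology, together"),
        Segment("v2", 10.0, "Technology leaders"),
    ]
    idx = build_index(segs)
    print(search(idx, "design technology"))  # [('v1', 42.5)]
```

Returning start times rather than whole recordings lets a video player jump straight to the matching moment in an interview.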
375	The effects of recognition accuracy and vocabulary size of a speech recognition system on task performance and user acceptance Casali, Sherry P. 22 June 2010 (has links) Automatic speech recognition systems have at last advanced to the point where they are a feasible alternative for human-machine communication in selected applications. Research efforts are therefore beginning to focus on the characteristics of the human, the recognition device, and the interface that optimize system performance, rather than, as previously, on the factors affecting recognizer performance alone. This study investigated two characteristics of the recognition device, namely its recognition accuracy and its vocabulary size as a percentage of the task vocabulary size, to determine their effects on system performance. The study also considered one characteristic of the user: age. Subjects performed a data entry task under each of the treatment conditions. Task completion time and the number of errors remaining at the end of each session were recorded. After each session, subjects rated the acceptability of the recognition device for the task. The recognizer's accuracy level significantly influenced task completion time as well as users' acceptability ratings, but had only a small effect on the number of errors left uncorrected. The available vocabulary size also significantly affected task completion time; however, its effect on the final error rate and on the acceptability ratings was negligible. Subject age also influenced both objective and subjective measures. Older subjects generally required longer to complete the tasks; however, they consistently rated the speech input systems more favorably than the younger subjects did. / Master of Science LD5655.V855 1988.C382 Automatic speech recognition Speech processing systems
376	Reconhecimento de fala para navegação em aplicativos móveis para português brasileiro / Brazilian Portuguese Speech Recognition for Navigation on Mobile Device Applications Triana Gomez, Edwin Miguel 17 June 2011 (has links) This research aims to reduce the level of user attention required to operate the Borboleta system by adding speech recognition for navigating through the system's functions, allowing the professional to pay more attention to the patient. The project methodology comprises a literature review to establish the state of the art in the field, a survey of available speech recognition software, collection of Brazilian Portuguese recordings of the system's commands to train and test the system, a design and development stage that defined the system architecture and its integration with Borboleta, and a testing phase to measure the system's accuracy, usability, and user acceptance. borboleta mobile devices mobile computing multimodal interfaces pocketsphinx speech recognition telehealth
377	Bio-inspired noise robust auditory features Javadi, Ailar 12 June 2012 (has links) The purpose of this work is to investigate a series of biologically inspired modifications to state-of-the-art Mel-frequency cepstral coefficients (MFCCs) that may improve automatic speech recognition results. We provide recommendations for improving speech recognition results depending on the signal-to-noise ratio of the input signals. This work was motivated by noise-robust auditory features (NRAF). In this feature extraction technique, after a signal is filtered using bandpass filters, a spatial derivative step is used to sharpen the results, followed by an envelope detector (rectification and smoothing) and down-sampling for each filter bank before compression. A DCT is then applied to the outputs of all filter banks to produce the features. The Hidden Markov Model Toolkit (HTK) is used as the recognition back end to perform speech recognition on the extracted features. In this work, we investigate the role of filter type, window size, spatial derivative, rectification type, smoothing, down-sampling, and compression, and compare the final results to state-of-the-art MFCCs. Conclusions and insights are provided for each step of the process. The goal of this work was not to outperform MFCCs; however, we show that by changing the compression type from log compression to 0.07 root compression we are able to outperform MFCCs for all noisy conditions. Speech recognition MFCCs Noise-robust features Feature extraction Biologically-inspired computing Automatic speech recognition Computational auditory scene analysis
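The last two stages of the pipeline described in this abstract (compression of the filter-bank outputs, then a DCT) can be sketched for a single frame of band energies to show how log and 0.07 root compression differ. This is an illustrative assumption, not the author's code: the bandpass filtering, spatial derivative, rectification, smoothing, and down-sampling stages are omitted, and the orthonormal DCT-II, the `floor` value, and the 13 retained coefficients are choices made here; only the 0.07 exponent comes from the abstract.

```python
import math
from typing import List, Sequence


def compress(energies: Sequence[float], mode: str = "root",
             exponent: float = 0.07, floor: float = 1e-10) -> List[float]:
    """Compress filter-bank energies with a log or a power-law ("root") rule."""
    if mode == "log":
        # floor avoids log(0) on silent bands
        return [math.log(max(e, floor)) for e in energies]
    if mode == "root":
        # power-law compression; 0.0 energy maps to 0.0 instead of -inf
        return [max(e, 0.0) ** exponent for e in energies]
    raise ValueError(f"unknown compression mode: {mode}")


def dct_ii(x: Sequence[float], n_coeffs: int) -> List[float]:
    """Orthonormal DCT-II of x, keeping the first n_coeffs coefficients."""
    n = len(x)
    out = []
    for k in range(n_coeffs):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                               for i in range(n)))
    return out


def cepstral_features(energies: Sequence[float], mode: str = "root",
                      n_coeffs: int = 13) -> List[float]:
    """Compress one frame of band energies, then decorrelate it with a DCT."""
    return dct_ii(compress(energies, mode), n_coeffs)


if __name__ == "__main__":
    frame = [0.0, 0.5, 2.0, 8.0, 4.0, 1.0, 0.25, 0.1]  # made-up band energies
    print(cepstral_features(frame, "log", 4))
    print(cepstral_features(frame, "root", 4))
```

For a small exponent a, x**a is approximately 1 + a*ln(x) while a*ln(x) stays small, so root compression tracks log compression up to scale and offset over moderate energy ranges; unlike the log, however, it stays bounded as a band's energy approaches zero.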
378	Speech Analysis and Cognition Using Category-Dependent Features in a Model of the Central Auditory System Jeon, Woojay 13 November 2006 (has links) It is well known that machines perform far worse than humans in recognizing speech and audio, especially in noisy environments. One way of addressing this robustness issue is to study physiological models of the human auditory system and to adopt some of their characteristics in computers. As a first step in studying the potential benefits of an elaborate computational model of the primary auditory cortex (A1) in the central auditory system, we qualitatively and quantitatively validate the model under existing speech processing and recognition methodology. Next, we develop new insights and ideas on how to interpret the model, and reveal some of the advantages of its dimension expansion that could potentially be used to improve existing speech processing and recognition methods. This is done by statistically analyzing the neural responses to various classes of speech signals and forming empirical conjectures on how cognitive information is encoded in a category-dependent manner. We also establish a theoretical framework that shows how noise and signal can be separated in the dimension-expanded cortical space. Finally, we develop new feature selection and pattern recognition methods to exploit the category-dependent encoding of noise-robust cognitive information in the cortical response. Category-dependent features are proposed as features that "specialize" in discriminating specific sets of classes; as a natural way of incorporating them into a Bayesian decision framework, we propose methods for constructing hierarchical classifiers that make decisions in a two-stage process.
Phoneme classification tasks using the TIMIT speech database are performed to quantitatively validate all developments in this work, and the results encourage future work in exploiting high-dimensional data with category- (or class-) dependent features for improved classification or detection. Speech processing Speech recognition Feature selection Pattern recognition Speech analysis Auditory model Automatic speech recognition Auditory cortex
380	Development of robust language models for speech recognition of under-resourced language Sindana, Daniel January 2020 (has links) Thesis (M.Sc. (Computer Science)) -- University of Limpopo, 2020 / Language modelling (LM) work for under-resourced languages that does not consider most of the linguistic information inherent in a language produces language models that inadequately represent the language, which in turn leads to the under-development of natural language processing tools and systems such as speech recognition systems. This study investigated the influence that the orthography (i.e., writing system) of a language has on the quality and/or robustness of the language models created for text in that language. The unique conjunctive and disjunctive writing systems of isiNdebele (Ndebele) and Sepedi (Pedi) were studied. Text data from the LWAZI and NCHLT speech corpora were used to develop language models. The LM techniques implemented included word-based n-gram LM, LM smoothing, LM linear interpolation, and higher-order n-gram LM. The toolkits used for development were the HTK LM, SRILM, and CMU-Cam SLM toolkits. From the findings of the study, covering text preparation, data pooling and sizing, higher-order n-gram models, and model interpolation, it is concluded that the orthography of the selected languages does affect the quality of the language models created for their text. The following recommendations are made for LM development for these languages: 1) special preparation and normalisation of the text data before LM development, paying attention to within-sentence text markers and annotation tags that may incorrectly become part of sentences, word sequences, and n-gram contexts; 2) enable interpolation during training; 3) develop pentagram and hexagram language models for Pedi texts, and trigram and quadrigram models for Ndebele texts; 4) investigate efficient smoothing methods for the different languages, especially for different text sizes and text domains. / National Research Foundation (NRF) Telkom University of Limpopo Language modelling Natural language processing Automatic speech recognition Under-resourced languages Robust control Speech perception
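The LM smoothing and linear interpolation techniques listed in this abstract can be illustrated with a toy bigram model that mixes maximum-likelihood bigram estimates with an add-one-smoothed unigram distribution (Jelinek-Mercer-style interpolation). This is a minimal sketch under stated assumptions, not how HTK LM, SRILM, or the CMU-Cam SLM toolkit implement it: the class name, the fixed weight `lam`, and the single reserved slot for unseen words are choices made here, and in practice `lam` would be tuned on held-out text.

```python
from collections import Counter
from typing import Iterable, List


class InterpolatedBigramLM:
    """Bigram LM linearly interpolated with an add-one unigram (Jelinek-Mercer style)."""

    def __init__(self, sentences: Iterable[List[str]], lam: float = 0.7):
        self.lam = lam  # weight on the bigram estimate (assumed fixed, not tuned)
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        self.history_counts: Counter = Counter()
        for words in sentences:
            tokens = ["<s>"] + words + ["</s>"]
            self.unigrams.update(tokens[1:])  # <s> is a context, never a prediction
            for h, w in zip(tokens, tokens[1:]):
                self.bigrams[(h, w)] += 1
                self.history_counts[h] += 1
        self.total = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams) + 1  # +1 reserves mass for unseen words

    def p_unigram(self, w: str) -> float:
        # add-one smoothing: unseen words share the single reserved slot
        return (self.unigrams[w] + 1) / (self.total + self.vocab_size)

    def prob(self, w: str, h: str) -> float:
        """P(w | h) = lam * P_ML(w | h) + (1 - lam) * P_add1(w)."""
        hc = self.history_counts[h]
        p_bigram = self.bigrams[(h, w)] / hc if hc else 0.0
        return self.lam * p_bigram + (1 - self.lam) * self.p_unigram(w)


if __name__ == "__main__":
    lm = InterpolatedBigramLM([["a", "b"], ["a", "c"]], lam=0.5)
    print(lm.prob("b", "a"))  # seen bigram
    print(lm.prob("z", "a"))  # unseen word still gets non-zero probability
```

Because the unigram term never reaches zero, an n-gram unseen in training still receives some probability mass, which matters for the small text corpora typical of under-resourced languages.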
