1

Robust Formant tracking for Continuous Speech with Speaker Variability

Mustafa, Kamran
Exposure to loud sounds can cause damage to the inner ear, leading to degradation of the neural response to speech and to formant frequencies in particular. This may result in decreased intelligibility of speech. An amplification scheme for hearing aids, called Contrast Enhanced Frequency Shaping (CEFS), may improve speech perception for ears with sound-induced hearing damage. CEFS takes into account across-frequency distortions introduced by the impaired ear and requires accurate and robust formant frequency estimates to allow dynamic, speech-spectrum-dependent amplification of speech in hearing aids. Several algorithms have been developed for extracting formant information from speech signals; however, most of these algorithms are either not robust in real-life noise environments or are not suitable for real-time implementation. The algorithm proposed in this thesis achieves formant extraction from continuous speech by using a time-varying adaptive filterbank to track and estimate individual formant frequencies. The formant tracker incorporates an adaptive voicing detector and a gender detector for robust formant extraction from continuous speech, for both male and female speakers in the presence of background noise. Thorough testing of the algorithm on various speech sentences has shown promising results over a wide range of SNRs for various types of background noise, such as AWGN, single and multiple competing background speakers, and various other environmental sounds. / Thesis / Master of Applied Science (MASc)
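To illustrate the kind of adaptive-filterbank tracking this abstract describes, a minimal Python sketch follows. It is an assumption-laden illustration, not the thesis implementation: each formant is followed by a bandpass filter centred on its current estimate, and the centre frequency is nudged toward the dominant frequency of the filtered frame. The function name, filter order, bandwidth, and smoothing factor are all illustrative choices.

```python
# Rough adaptive-filterbank formant tracker (illustrative design only):
# one bandpass filter per formant whose centre frequency follows the
# dominant frequency of its own filtered output, frame by frame.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def track_formants(x, fs, init=(500.0, 1500.0, 2500.0),
                   frame_len=0.025, hop=0.010, bw=600.0, alpha=0.3):
    """Return per-frame estimates (Hz) of the first len(init) formants."""
    n, h = int(frame_len * fs), int(hop * fs)
    centres, tracks = list(init), []
    for start in range(0, len(x) - n, h):
        frame = x[start:start + n] * np.hamming(n)
        new_centres = []
        for fc in centres:
            lo = max(50.0, fc - bw / 2)
            hi = min(fs / 2 - 50.0, fc + bw / 2)
            sos = butter(2, [lo, hi], btype='band', fs=fs, output='sos')
            y = sosfiltfilt(sos, frame)
            spec = np.abs(np.fft.rfft(y))
            freqs = np.fft.rfftfreq(n, 1.0 / fs)
            peak = freqs[np.argmax(spec)]
            # smooth the update so the filter drifts rather than jumps
            new_centres.append((1 - alpha) * fc + alpha * peak)
        centres = new_centres
        tracks.append(list(centres))
    return np.array(tracks)
```

A real tracker along the lines of the abstract would additionally gate the updates with the voicing detector and adapt the initial centre frequencies to the detected speaker gender.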
2

Respiratory and Laryngeal Function During Spontaneous Speaking in Teachers with Voice Disorders

Lowell, Soren, January 2005
Purpose: The purpose of this study was to determine whether respiratory and laryngeal function during spontaneous speech production differed between teachers with voice disorders and teachers without voice problems. The basic research questions posed in this study, as assessed during spontaneous speaking, were: 1) Do subjects with a voice disorder show differences in lung volume patterns relative to control subjects? 2) Do subjects with a voice disorder show differences in vocal fold approximation, as measured by contact quotient and contact index, relative to control subjects? 3) Are these between-group differences most pronounced for mock teaching tasks versus a conversational speaking task? 4) Do subjects with a voice disorder rely more on laryngeal versus respiratory-based strategies for increasing loudness level as compared to control subjects?

Method: Nine teachers with and nine teachers without voice problems were included in this study. Respiratory function was measured with magnetometry, and laryngeal function was measured with electroglottography. Respiratory and laryngeal function were measured during three spontaneous speaking tasks: a simulated teaching task at a typical and an increased loudness level, and a conversational speaking task. Two structured speaking tasks were included for comparison of electroglottography measures: a paragraph reading task and a sustained vowel.

Results: Lung volume termination level in spontaneous speaking was significantly lower for the teachers with voice disorders relative to teachers without voice problems. Lung volume initiation level was lower for the teachers with versus without voice problems during teaching-related speaking tasks. Laryngeal function as assessed with electroglottography did not show between-group differences. Across tasks, the measure of contact index was lower (more negative) during the conversational speaking task than during the sustained vowel task, indicating greater contact phase asymmetry during vocal fold vibration.

Conclusions: These findings suggest that teachers with a voice disorder use different speech breathing strategies than teachers without voice problems. Management of teachers with voice problems may need to incorporate respiratory training that alters lung volume levels during speaking. Future research is needed to determine whether altering such patterns results in improved voice parameters and self-perceived improvement in vocal symptoms.
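For readers unfamiliar with the electroglottographic contact quotient mentioned above, here is a minimal sketch of one common threshold-based computation (an illustrative assumption, not the study's analysis pipeline; the 25% peak-to-peak criterion is just one conventional choice).

```python
# Illustrative contact quotient (CQ) for one EGG cycle: the fraction of the
# cycle during which the signal exceeds a threshold set at 25% of the
# peak-to-peak amplitude above the cycle minimum (assumed criterion).
import numpy as np

def contact_quotient(egg_cycle, criterion=0.25):
    egg_cycle = np.asarray(egg_cycle, dtype=float)
    lo, hi = egg_cycle.min(), egg_cycle.max()
    threshold = lo + criterion * (hi - lo)
    contacted = egg_cycle >= threshold          # samples in the contact phase
    return contacted.sum() / len(egg_cycle)     # contact duration / period

# Example: a synthetic sinusoidal cycle yields a CQ between 0 and 1.
cycle = np.sin(np.linspace(0, 2 * np.pi, 200))
print(round(contact_quotient(cycle), 3))
```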
3

The Unsupervised Acquisition of a Lexicon from Continuous Speech

Marcken, Carl de, 18 January 1996
We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency.
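A toy illustration of the MDL idea behind this abstract follows: a two-part description length over a candidate lexicon and a corpus. This is a hedged sketch only; it evaluates one fixed lexicon with a greedy segmentation, whereas de Marcken's algorithm searches over lexicons with a hierarchical representation.

```python
# Schematic two-part MDL score: the cost of writing down the lexicon entries
# plus the cost of encoding the corpus as a sequence of lexicon entries.
import math
from collections import Counter

def segment(text, lexicon):
    """Greedy longest-match segmentation (a simplistic stand-in for search)."""
    out, i, max_len = [], 0, max(len(w) for w in lexicon)
    while i < len(text):
        for L in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + L] in lexicon:
                out.append(text[i:i + L])
                i += L
                break
        else:                       # fall back to a single character
            out.append(text[i])
            i += 1
    return out

def description_length(text, lexicon, alphabet_size=27):
    tokens = segment(text, lexicon)
    counts = Counter(tokens)
    total = sum(counts.values())
    # corpus cost: -log2 P(token) per token, probabilities from counts
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    # lexicon cost: spell out each entry letter by letter
    lexicon_bits = sum(len(w) * math.log2(alphabet_size) for w in lexicon)
    return corpus_bits + lexicon_bits

text = "thecatsatonthemat"
print(description_length(text, {"the", "cat", "sat", "on", "mat"}))
print(description_length(text, set("abcdefghijklmnopqrstuvwxyz")))
```

A lexicon of useful multi-character entries tends to achieve a shorter total description than spelling the corpus out character by character, which is the signal the learning algorithm exploits.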
4

The Continuous Speech Recognition System Based on Hidden Markov Models with One-Stage Dynamic Programming Algorithm

Hsieh, Fang-Yi, 03 July 2003
Based on Hidden Markov Models (HMM) with the One-Stage Dynamic Programming algorithm, a continuous-speech, speaker-independent Mandarin digit speech recognition system was designed in this work. In order to implement this architecture within the performance limits of the hardware, various speech-characteristic parameters were defined to optimize the process. Finally, the "State Duration" and the "Tone Transition Property Parameter" were extracted from the temporal information of the speech to improve the recognition rate. Experimental results on the test database show that the new one-stage dynamic programming algorithm, with "state duration" and "tone transition property parameter", gives an 18% increase in recognition rate compared to the conventional one. For speaker-independent, connected-word recognition, the system achieves a recognition rate of 74%. For speaker-independent, isolated-word recognition, the recognition rate is higher than 96%. A recognition rate of 92% is obtained when the system is applied to speaker-dependent connected-word recognition.
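A condensed sketch of the one-stage idea in the HMM setting is given below, under assumptions of my own: log-domain Viterbi over all word models in parallel, with a loop transition from the best word-final score back to every word's first state. The duration and tone-transition terms from the thesis, and the backpointer bookkeeping needed to recover the digit string, are omitted.

```python
# One-stage (looped) Viterbi over per-word left-to-right HMMs: at every frame
# the best word-final score can re-enter the first state of any word, so
# connected digits are decoded without knowing word boundaries in advance.
# log_b[w][s, t]: log observation likelihood of state s of word w at frame t;
# log_a[w]: log transition matrix of word w.
import numpy as np

def one_stage_score(log_b, log_a, word_penalty=-10.0):
    """Best connected-word log score; backpointers (omitted) give the string."""
    T = log_b[0].shape[1]
    NEG = -1e30
    score = [np.full(b.shape[0], NEG) for b in log_b]
    for t in range(T):
        # best word-final score from the previous frame feeds the word loop
        loop = max(s[-1] for s in score) + word_penalty if t else 0.0
        for w, (b, a) in enumerate(zip(log_b, log_a)):
            prev, new = score[w], np.full_like(score[w], NEG)
            for s in range(len(prev)):
                stay = prev[s] + a[s, s]
                enter = prev[s - 1] + a[s - 1, s] if s > 0 else NEG
                new[s] = max(stay, enter, loop if s == 0 else NEG) + b[s, t]
            score[w] = new
    return max(s[-1] for s in score)
```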
5

Large Vocabulary Continuous Speech Recognition for Turkish Using HTK

Comez, Murat Ali, 01 January 2003
This study aims to build a new language model that can be used in a Turkish large vocabulary continuous speech recognition system. Turkish is a very productive language in terms of word forms because of its agglutinative nature, and for such languages the vocabulary size is far from being acceptable: from only one simple stem, thousands of new word forms can be generated using inflectional or derivational suffixes. In this thesis, words are parsed into their stems and endings, where an ending comprises the suffixes attached to the associated root. The search network, based on bigrams, is then constructed. Bigrams are obtained either using stems and endings, or using only stems; the language model proposed is based on bigrams obtained using only stems. All work is done in the HTK (Hidden Markov Model Toolkit) environment, except parsing and network transforming. Besides offering a new language model for Turkish, this study also provides a comprehensive review of the concepts used in state-of-the-art speech recognition systems. To acquire a good command of these concepts and processes, isolated-word, connected-word, and continuous speech recognition tasks are performed. The experimental results associated with these tasks are also given.
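The stem-based bigram estimation described above can be sketched as follows. This is a hedged toy: the morphological parser is a placeholder stub rather than a real Turkish analyzer, and add-one smoothing stands in for whatever smoothing the thesis used.

```python
# Toy bigram language model over stems: each word is split into (stem, ending)
# by a stub parser, and bigram probabilities are estimated over the stem
# sequence with add-one smoothing.
from collections import Counter

def parse(word):
    """Placeholder stem/ending split: pretend the first 4 letters are the stem."""
    return (word[:4], word[4:]) if len(word) > 4 else (word, "")

def train_stem_bigrams(sentences):
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        stems = ["<s>"] + [parse(w)[0] for w in sent.split()] + ["</s>"]
        unigrams.update(stems)
        bigrams.update(zip(stems, stems[1:]))
    vocab = len(unigrams)
    def prob(prev, cur):            # add-one smoothed P(cur | prev)
        return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
    return prob

prob = train_stem_bigrams(["evdeki kedi geldi", "evler satildi"])
print(prob("evde", "kedi"))
```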
6

Language Modeling For Turkish Continuous Speech Recognition

Sahin, Serkan, 01 December 2003
This study aims to build a new language model for Turkish continuous speech recognition. Turkish is a very productive language in terms of word forms because of its agglutinative nature, and for such languages the vocabulary size is far from being acceptable: from only one simple stem, thousands of new words can be generated using inflectional and derivational suffixes. In this work, words are parsed into their stems and endings. First, endings were treated as words and bigram probabilities were obtained over stems and endings; then bigram probabilities were obtained using only the stems, and single-pass recognition was performed with these bigram probabilities. Second, two-pass recognition was performed: the bigram probabilities were used to create word lattices, trigram probabilities were obtained from a larger text, and one-best results were finally obtained from the word lattices and the trigram probabilities. All work is done in the Hidden Markov Model Toolkit (HTK) environment, except parsing and network transforming.
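A minimal sketch of the second pass described above, simplified from lattice rescoring to n-best rescoring so it fits in a few lines; the score combination, weight, and trigram table are illustrative assumptions.

```python
# Second-pass rescoring: combine each hypothesis's first-pass acoustic score
# with a trigram LM score and keep the best. Real systems rescore lattices;
# an n-best list keeps the sketch short.
import math

def trigram_logprob(words, trigram_probs, floor=1e-6):
    """Sum of log P(w3 | w1, w2), with a crude floor instead of real backoff."""
    padded = ["<s>", "<s>"] + words + ["</s>"]
    return sum(math.log(trigram_probs.get(tuple(padded[i:i + 3]), floor))
               for i in range(len(padded) - 2))

def rescore(nbest, trigram_probs, lm_weight=10.0):
    """nbest: list of (acoustic_log_score, word_list) from the bigram pass."""
    return max(nbest, key=lambda h: h[0] + lm_weight *
               trigram_logprob(h[1], trigram_probs))[1]
```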
7

Turkish Large Vocabulary Continuous Speech Recognition By Using Limited Audio Corpus

Susman, Derya, 01 March 2012
Speech recognition in the Turkish language is a challenging problem from several perspectives, most of them related to the morphological structure of the language. Since Turkish is an agglutinative language, it is possible to generate many words from a single stem by using suffixes. This characteristic of the language increases the number of out-of-vocabulary (OOV) words, which degrades the performance of a speech recognizer dramatically. Turkish also allows words to be ordered in a relatively free manner, which makes it difficult to generate robust language models. In this thesis, the existing models and approaches that address the problem of Turkish LVCSR (Large Vocabulary Continuous Speech Recognition) are explored. Different recognition units (words, morphs, and stems and endings) are used in generating the n-gram language models, and 3-gram and 4-gram language models are generated for each recognition unit. Since speech recognition relies on machine learning, the performance of the recognizer depends on the sufficiency of the audio data used in acoustic model training; however, it is difficult to obtain rich audio corpora for the Turkish language. In this thesis, existing approaches are used to solve the problem of Turkish LVCSR with a limited audio corpus, and several data selection approaches are proposed to improve the robustness of the acoustic model.
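The OOV problem mentioned above can be quantified with a few lines of code; the snippet below is illustrative and does not reproduce the thesis's corpora or vocabularies.

```python
# Out-of-vocabulary (OOV) rate: the fraction of running words in a test text
# that are not covered by the recognizer's vocabulary. For agglutinative
# languages this rate stays high even with large word vocabularies, which is
# why morph- or stem+ending-based units are attractive.
def oov_rate(test_tokens, vocabulary):
    vocab = set(vocabulary)
    misses = sum(1 for tok in test_tokens if tok not in vocab)
    return misses / max(1, len(test_tokens))

tokens = "evdeki kedilerimizden birisi geldi".split()
print(f"OOV rate: {oov_rate(tokens, {'geldi', 'birisi'}):.2f}")  # 0.50
```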
8

A Study On Language Modeling For Turkish Large Vocabulary Continuous Speech Recognition

Bayer, Ali Orkan, 01 June 2005
This study focuses on large vocabulary Turkish continuous speech recognition. Continuous speech recognition for Turkish cannot be performed accurately because of the agglutinative nature of the language, which decreases the performance of the classical language models used in the area. In this thesis, acoustic models using different parameters are first constructed and tested. Then, three types of n-gram language models are built: class-based models, stem-based models, and stem-end-based models. Two-pass recognition is performed using the Hidden Markov Model Toolkit (HTK) to test the system, first with the bigram models and then with the trigram models. At the end of the study, it is found that trigram models over stems and endings give better results, since their coverage of the vocabulary is better.
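For the class-based variant mentioned as one of the three model types, a hedged sketch of the usual factorization P(w_i | w_{i-1}) ≈ P(w_i | c_i) · P(c_i | c_{i-1}) follows; the word-to-class map is hand-supplied here, standing in for automatically induced classes.

```python
# Class-based bigram: word probabilities are factored through word classes,
# which shrinks the number of parameters on sparse, agglutinative data.
from collections import Counter

def train_class_bigram(sentences, word2class):
    cls_uni, cls_bi, word_in_class = Counter(), Counter(), Counter()
    for sent in sentences:
        words = sent.split()
        classes = ["<s>"] + [word2class[w] for w in words] + ["</s>"]
        cls_uni.update(classes)
        cls_bi.update(zip(classes, classes[1:]))
        word_in_class.update((word2class[w], w) for w in words)

    def prob(prev_word, word):
        """P(word | prev_word) ~ P(class | prev_class) * P(word | class)."""
        c_prev = "<s>" if prev_word == "<s>" else word2class[prev_word]
        c = word2class[word]
        return (cls_bi[(c_prev, c)] / max(1, cls_uni[c_prev])
                * word_in_class[(c, word)] / max(1, cls_uni[c]))
    return prob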
9

The effect of age and different speech tasks on the Acoustic Voice Quality Index

Rehn, Rosanna, January 2023
Background: Previous research has emphasized the importance of objectivity in voice quality evaluation. The Acoustic Voice Quality Index (AVQI) is a multiparametric objective index that quantifies overall voice quality. Over the past decade, international studies have demonstrated strong diagnostic accuracy and sensitivity of AVQI to voice disorders. It has, however, remained inconclusive whether AVQI is independent of factors such as age and gender, or whether AVQI is affected by different types of continuous speech segments.

Aim: The aim of this study is to gather descriptive data regarding AVQI's performance in a healthy Swedish-speaking population. Another objective is to investigate the potential impact of characteristics such as age and gender, and of the type of continuous speech sample, on the AVQI values.

Method: The present study gathered speech samples from 137 participants aged 20 to 90 years with a balanced gender distribution. These samples contained two different types of continuous speech, from which separate AVQI values were computed in the acoustic analysis. An analysis of covariance (ANCOVA) was then used to study the effects of age, gender, and type of continuous speech on the resulting AVQI values.

Results: Descriptive normative data were gathered for the overall voice quality of the age groups included in this study. A statistically significant main effect of age on the AVQI values was observed. Statistical analysis revealed no significant effect of speech type, speaker gender, or the interaction of age and gender on the AVQI values.

Conclusions: The present study offers data for AVQI values in the Swedish-speaking population. AVQI scores were higher in older participants than in younger participants; no other significant effects were found. The AVQI values obtained in this study, and comparisons with international AVQI values, indicate potentially successful use of the Acoustic Voice Quality Index in the Swedish-speaking population, with some precautions.
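A minimal sketch of the kind of ANCOVA reported above, using statsmodels; the column names are assumptions, and the within-speaker pairing of the two speech types is ignored here for brevity (the study's actual model may differ).

```python
# ANCOVA on AVQI with age as a covariate and gender / speech type as factors.
# Column names (avqi, age, gender, speech_type) and the data file are
# hypothetical; the repeated-measures structure is not modelled here.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("avqi_data.csv")          # hypothetical data file
model = smf.ols("avqi ~ age * C(gender) + C(speech_type)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))     # F-tests for each term
```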
10

Automatic Speech Recognition System Continually Improving Based on Subtitled Speech Data

Kocour, Martin, January 2019
Nowadays, large-vocabulary speech recognition systems achieve fairly high accuracy. Behind their results, however, often stand tens or even hundreds of hours of manually annotated training data. Such data are frequently unavailable, or do not exist at all for the required language. A possible solution is to use commonly available but lower-quality audiovisual data. This thesis deals with a technique for processing exactly this kind of data and with its use for training acoustic models. It further discusses the possible use of such data for continual improvement of the models, since these data are practically inexhaustible. For this purpose, a new data selection approach was proposed as part of this work.
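One plausible selection criterion for such subtitled data is sketched below, as an illustration rather than the thesis's actual approach: decode each segment with the current model and keep it only if the hypothesis is close enough to the subtitle text, measured by word error rate.

```python
# Keep a subtitled segment for acoustic-model training only if the current
# recognizer's hypothesis agrees closely with the subtitle (low WER), on the
# assumption that agreement indicates the subtitle is a usable transcript.
def wer(ref, hyp):
    """Word error rate via edit distance between reference and hypothesis."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        for j in range(len(h) + 1):
            if i == 0:
                d[i][j] = j
            elif j == 0:
                d[i][j] = i
            else:
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                              d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))
    return d[-1][-1] / max(1, len(r))

def select_segments(segments, recognize, max_wer=0.2):
    """segments: iterable of (audio, subtitle); recognize: audio -> hypothesis."""
    return [(audio, subtitle) for audio, subtitle in segments
            if wer(subtitle, recognize(audio)) <= max_wer]
```

Because the pool of subtitled broadcasts keeps growing, the selection and retraining loop can in principle be repeated indefinitely, which is the continual-improvement scenario the abstract describes.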
