171

Voice recognition system based on intra-modal fusion and accent classification

Mangayyagari, Srikanth 01 June 2007
Speaker or voice recognition is the task of automatically recognizing people from their speech signals. This technique makes it possible to use uttered speech to verify the speaker's identity and control access to secured services. Surveillance, counter-terrorism, and homeland security departments can collect voice data from telephone conversations without needing access to any other biometric dataset, and in this type of scenario a high confidence level of authentication is essential. Other applicable areas include online transactions, database access services, information services, security control for confidential information areas, and remote access to computers. Speaker recognition systems, even though they have been around for four decades, have not been widely adopted as standalone systems for biometric security because of their unacceptably low performance, i.e., high false acceptance and false rejection rates. This thesis focuses on enhancing speaker recognition through a combination of intra-modal fusion and accent modeling. Initial enhancement was achieved through intra-modal hybrid fusion (HF) of likelihood scores generated by the Arithmetic Harmonic Sphericity (AHS) and Hidden Markov Model (HMM) techniques. Owing to the contrastive nature of AHS and HMM, we observed significant performance improvements of 22%, 6%, and 23% true acceptance rate (TAR) at 5% false acceptance rate (FAR) when this fusion technique was evaluated on three different datasets: YOHO, the USF multi-modal biometric dataset, and the Speech Accent Archive (SAA), respectively. Performance was enhanced on all datasets, although performance on YOHO was higher than on the USF dataset because the USF data were collected outdoors in noisy conditions, whereas YOHO was recorded indoors. To further increase the speaker recognition rate at lower FARs, we combined accent information from an accent classification (AC) system with the earlier HF system. In homeland security applications, speaker accent plays a critical role in the evaluation of biometric systems because users are international in nature, so incorporating accent information into the speaker recognition/verification system is a key component of our study. The proposed system achieved further performance improvements of 17% and 15% TAR at an FAR of 3% when evaluated on the SAA and USF multi-modal biometric datasets. The accent incorporation method and the hybrid fusion techniques discussed in this work can also be applied to other speaker recognition systems.
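The hybrid fusion step described above is, at its core, score-level combination: each subsystem produces a likelihood score for a claimed identity, and the scores are merged before thresholding. Below is a minimal sketch of one common approach, weighted-sum fusion of min-max-normalized scores; the normalization choice, the weight alpha, the threshold, and the function names are illustrative assumptions, not the exact formulation used in the thesis.

```python
import numpy as np

def min_max_normalize(scores):
    """Map raw scores to [0, 1] so AHS and HMM outputs are comparable.
    (Illustrative choice; other score normalizations are possible.)"""
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo + 1e-12)

def fuse_scores(ahs_scores, hmm_scores, alpha=0.5):
    """Weighted-sum fusion of two per-trial score vectors.
    alpha weights the AHS subsystem; (1 - alpha) weights the HMM."""
    ahs = min_max_normalize(np.asarray(ahs_scores, dtype=float))
    hmm = min_max_normalize(np.asarray(hmm_scores, dtype=float))
    return alpha * ahs + (1.0 - alpha) * hmm

# Accept the claimed speaker if the fused score clears a threshold;
# in practice the threshold is tuned on development data to hit a
# target false acceptance rate.
fused = fuse_scores([1.2, 0.4, 0.9], [0.7, 0.1, 0.8])
decision = fused > 0.6   # threshold is an illustrative placeholder
```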
172

Intelligibility of clear speech at normal rates for older adults with hearing loss

Shaw, Billie Jo 01 June 2006
Clear speech refers to a speaking style that is more intelligible than typical, conversational speaking styles. It is usually produced at a slower rate than conversational speech. Clear speech has been shown to be more intelligible than conversational speech for a wide variety of populations, including both hearing-impaired (Schum, 1996; Picheny, Durlach, & Braida, 1985; Payton, Uchanski, & Braida, 1994) and normal-hearing individuals (e.g., Uchanski, Choi, Braida, Reed, & Durlach, 1996), under a variety of conditions, including those in which presentation level, speaker, and environment are varied. Although clear speech is typically slower than normally produced conversational speech, recent studies have shown that it can be produced at normal rates with training (Krause & Braida, 2002). If clear speech at normal rates is shown to be as effective for individuals with hearing loss as clear speech at slow rates, it would have both clinical and research implications. The purpose of this study was to determine the effectiveness of clear speech at normal rates for older individuals with hearing loss. It examined how intelligibility, measured as percent-correct keyword scores on nonsense sentences, varied as a result of speaking mode (clear versus conversational speech) and speaking rate (slow versus normal) in six adults aged 55 to 75 years with moderate, sloping hearing loss. Each listener was presented with nonsense sentences in four speech conditions, read by four different talkers: clear speech at slow rates (clear/slow), clear speech at normal rates (clear/normal), conversational speech at slow rates (conv/slow), and conversational speech at normal rates (conv/normal). Sentences were presented monaurally in quiet via headphones. Results indicated that clear/slow speech was the most intelligible condition overall. Neither conv/slow nor clear/normal provided an intelligibility benefit relative to conv/normal speech on average, suggesting that for older adults with moderate, sloping hearing loss, the combination of clear speech and a slower speaking rate benefits intelligibility more than would be expected from the additive effects of altering speaking rate or speaking mode alone. It has been suggested previously (Krause, 2001) that audiological characteristics may contribute to the lack of clear/normal benefit for certain listeners with hearing loss. Although clear/normal speech was not beneficial on average to listeners in this study, there were cases in which the clear/normal speech of a particular talker benefited a particular listener. Thus, severity and configuration of hearing loss alone cannot fully explain the degree to which listeners with hearing loss do (or do not) benefit from clear/normal speech. More studies are needed to investigate the benefits of clear/normal speech for different audiological configurations, including individuals with flat losses. In addition, the listening tasks should include more difficult conditions in order to compensate for potential ceiling effects.
173

The adoption of interactive voice response for feeding scheme programme monitoring.

Qwabe, Olwethu. January 2014
M. Tech. Business Information Systems / The Department of Education should contribute to the South African government's objective of providing a better life for all. However, the provision of education to all is hampered by the fact that a significant majority of the South African population is plagued by high levels of poverty, resulting in learners attending school without having had a nutritious meal. Consequently, the provision of food in South African schools, a lead project of the Reconstruction and Development Programme referred to as the 'feeding scheme', was introduced. This project aimed to improve both health and education by fighting malnutrition and improving the ability of learners to concentrate during lessons. The South African government funds the school feeding programme for learners from primary through secondary school, and the Department of Education spends a large amount of money on this programme nationally. However, there is no precise data showing how successful the feeding programme is. For the Department of Education to meet its objectives, it is recommended that an efficient system be developed for keeping records of all the reports. It is thus critical to explore the potential use of technologies such as interactive voice response systems, which could assist the Department of Education in monitoring and evaluating the school feeding programme in a timely, accurate, and reliable way. This research aims to evaluate how an interactive voice response system can be implemented to effectively enhance the monitoring of the feeding programme in South African schools.
174

Extracting Spatiotemporal Word and Semantic Representations from Multiscale Neurophysiological Recordings in Humans

Chan, Alexander Mark 21 June 2014
With the recent advent of neuroimaging techniques, the majority of research on the neural basis of language processing has focused on the localization of various lexical and semantic functions. Unfortunately, the limited time resolution of functional neuroimaging prevents a detailed analysis of the dynamics involved in word recognition, and the hemodynamic basis of these techniques prevents the study of the underlying neurophysiology. Compounding this problem, current techniques for the analysis of high-dimensional neural data are mainly sensitive to large effects in a small area, preventing a thorough study of the distributed processing involved in representing semantic knowledge. This thesis demonstrates the use of multivariate machine-learning techniques for studying the neural representation of semantic and speech information in electro/magneto-physiological recordings with high temporal resolution. Support vector machines (SVMs) allow for the decoding of semantic category and word-specific information from non-invasive electroencephalography (EEG) and magnetoencephalography (MEG) and demonstrate the consistent, but spatially and temporally distributed, nature of such information. Moreover, the anteroventral temporal lobe (avTL) may be important for coordinating these distributed representations, as supported by the presence of supramodal category-specific information in intracranial recordings from the avTL as early as 150 ms after auditory or visual word presentation. Finally, to study the inputs to this lexico-semantic system, recordings from a high-density microelectrode array in the anterior superior temporal gyrus (aSTG) were obtained, and the recorded spiking activity demonstrates the presence of single neurons that respond specifically to speech sounds. The successful decoding of word identity from this firing-rate information suggests that the aSTG may be involved in the population coding of acousto-phonetic speech information that is likely on the pathway for mapping speech sounds to meaning in the avTL. The feasibility of extracting semantic and phonological information from multichannel neural recordings using machine-learning techniques provides a powerful method for studying language using large datasets and has potential implications for the development of fast and intuitive communication prostheses. / Engineering and Applied Sciences
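As a rough illustration of the SVM decoding idea, a linear classifier can be trained to predict a word's semantic category from trial-by-feature matrices of sensor amplitudes (channels x time points, flattened). The sketch below uses scikit-learn on synthetic data; the array shapes, labels, and parameters are placeholders, not the thesis's actual pipeline.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for MEG/EEG data: 200 trials, 60 channels x 50
# time samples, flattened into one feature vector per trial.
n_trials, n_channels, n_times = 200, 60, 50
X = rng.standard_normal((n_trials, n_channels * n_times))
y = rng.integers(0, 2, size=n_trials)   # e.g. "animal" vs. "tool" category

# Standardizing each feature, then fitting a linear-kernel SVM, is a
# common recipe for high-dimensional neural decoding.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"decoding accuracy: {scores.mean():.2f} (chance = 0.50)")
```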
175

Blind source separation of the audio signals in a real world

Choi, Hyung Keun 08 1900
No description available.
176

Processing predictors of severity of speech sound disorders

Pera, Natalie January 2013
This study investigated whether variability in the severity of speech sound disorders is related to variability in phonological short-term memory and/or variability in the accuracy of phonological representations. The aim was to determine speech processing predictors of the severity of speech sound disorders. A total of 33 children, aged three to six years, were assessed on measures of nonword repetition, accuracy of phonological representations, accuracy of speech production, and language. The tests administered included the Clinical Evaluation of Language Fundamentals Preschool – 2 Australian, the Diagnostic Evaluation of Articulation and Phonology, the Nonword Repetition Test (modified), and the Phonological Representation Judgement Task (modified). The relationships between the results of these tests were established using a correlation analysis. The relationship between accuracy of phonological representations and the percentage of consonants correct was found to be mediated by language. There was no significant relationship between nonword repetition and percentage of consonants correct. These findings may have been the result of the small sample size, the age of the participants, or co-morbid language difficulties. They imply that variability in the severity of speech sound disorders may be related to a variable not directly assessed in this study, possibly a constraint on the stored motor programs within children's speech processing systems. Implications for future research are discussed.
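The mediation result can be pictured as a partial correlation: the raw association between phonological-representation accuracy and percent consonants correct weakens once language scores are statistically controlled. The sketch below illustrates that idea on synthetic data; the variable names and values are invented for illustration and do not reproduce the study's analysis.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, z):
    """Correlation between x and y after regressing out z from both."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)   # residuals of x given z
    ry = y - np.polyval(np.polyfit(z, y, 1), z)   # residuals of y given z
    return stats.pearsonr(rx, ry)

rng = np.random.default_rng(1)
language = rng.standard_normal(33)                  # language score (n = 33)
pr_accuracy = language + 0.5 * rng.standard_normal(33)
pcc = language + 0.5 * rng.standard_normal(33)      # % consonants correct

r_raw, _ = stats.pearsonr(pr_accuracy, pcc)
r_partial, _ = partial_corr(pr_accuracy, pcc, language)
print(f"raw r = {r_raw:.2f}, r controlling for language = {r_partial:.2f}")
```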
177

The evaluation of the stability of acoustic features in affective conveyance across multiple emotional databases

Sun, Rui 20 September 2013
The objective of the research presented in this thesis was to systematically investigate a computational structure for cross-database emotion recognition. The research consisted of evaluating the stability of acoustic features, particularly glottal and Teager Energy based features, and investigating three normalization methods and two data fusion techniques. One challenge of cross-database training and testing is accounting for potential variation in the types of emotions expressed as well as in the recording conditions. To alleviate the impact of these variations, three normalization methods for the acoustic data were studied. The lack of a sufficiently large and diverse emotional database for training the classifier motivated the use of multiple databases, which posed another challenge: data fusion. This thesis proposes two data fusion techniques, pre-classification SDS and post-classification ROVER, to address this issue. Using the glottal, TEO, and TECC features, whose emotion-distinguishing ability proved stable across multiple databases, the proposed computational structure improved the performance of cross-database binary emotion recognition by up to 23% for neutral vs. emotional and 10% for positive vs. negative classification.
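Two of the ingredients above can be made concrete: per-database normalization of acoustic features, and ROVER-style post-classification fusion by voting across classifiers trained on different databases. The following sketch is a generic illustration of these ideas under stated assumptions, not the thesis's actual SDS/ROVER implementation.

```python
import numpy as np

def zscore_per_database(features, db_ids):
    """Normalize each feature using the mean/std of its own database,
    reducing recording-condition mismatch across corpora."""
    out = np.empty_like(features, dtype=float)
    for db in np.unique(db_ids):
        mask = db_ids == db
        mu = features[mask].mean(axis=0)
        sd = features[mask].std(axis=0) + 1e-12
        out[mask] = (features[mask] - mu) / sd
    return out

def rover_vote(label_sets):
    """Post-classification fusion: majority vote over the binary labels
    produced by classifiers trained on different databases."""
    labels = np.asarray(label_sets)            # (n_classifiers, n_samples)
    return (labels.sum(axis=0) > labels.shape[0] / 2).astype(int)

X = np.random.default_rng(2).standard_normal((6, 3))
dbs = np.array([0, 0, 0, 1, 1, 1])             # two source databases
Xn = zscore_per_database(X, dbs)
fused = rover_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])   # -> [1, 0, 1]
```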
178

Investigation of Speech Emotion Features

Žukas, Gediminas 17 June 2014
This Master's thesis examined the task of automatic speech emotion recognition. Although the popularity of this area has grown considerably in recent years, there is still a lack of literature describing the effectiveness of specific acoustic features (or feature sets) for recognizing emotions in speech. This gap shaped the aim of the thesis: to investigate the application of acoustic features to speech emotion recognition. The work includes an analysis of emotion feature systems and the development of a testing system for emotion feature sets, which was used to carry out the feature-set experiments. The results obtained are very similar to, or slightly ahead of, recently published emotion recognition results on the Berlin emotional speech database. Based on the experimental results, recommendations for forming effective speech emotion feature sets were drawn up. Master's thesis for the degree in informatics engineering. Vilnius Gediminas Technical University. Vilnius, 2014.
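The heart of such a feature-set testing system is a loop that scores each candidate subset by cross-validated recognition accuracy. A minimal sketch follows; the data, feature groupings, and classifier are invented placeholders rather than the thesis's system or the Berlin database.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.standard_normal((300, 8))     # 300 utterances, 8 acoustic features
y = rng.integers(0, 4, size=300)      # 4 emotion classes (placeholder)

feature_sets = {
    "pitch+energy": [0, 1],           # hypothetical column groupings
    "mfcc-like":    [2, 3, 4, 5],
    "all":          list(range(8)),
}
for name, cols in feature_sets.items():
    acc = cross_val_score(KNeighborsClassifier(), X[:, cols], y, cv=5).mean()
    print(f"{name:12s} cross-validated accuracy: {acc:.2f}")
```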
179

Speech Endpoint Detection: An Image Segmentation Approach

Faris, Nesma January 2013
Speech endpoint detection, also known as speech segmentation, is an unsolved problem in speech processing that affects numerous applications, including robust speech recognition. The task is not as trivial as it appears, and most existing algorithms degrade at low signal-to-noise ratios (SNRs). Most previous research has focused on developing robust algorithms, with special attention paid to the derivation and study of noise-robust features and decision rules. This research tackles the endpoint detection problem in a different way and proposes a novel speech endpoint detection algorithm derived from the Chan-Vese algorithm for image segmentation. The proposed algorithm can fuse multiple features extracted from the speech signal to enhance detection accuracy. Its performance has been evaluated and compared to two widely used speech detection algorithms in various noise environments with SNR levels ranging from 0 dB to 30 dB. Furthermore, the proposed algorithm has also been applied to different types of American English phonemes. The experiments show that, even under severe noise contamination, the proposed algorithm is more effective than the reference algorithms.
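To see how an image segmentation model transfers to endpoint detection: the Chan-Vese functional partitions data into two regions whose means best fit the observations. Applied to a one-dimensional frame-level feature such as log energy, this reduces to choosing the threshold that minimizes the within-region squared error, as in the toy sketch below. It omits the curve evolution and multi-feature fusion of the actual algorithm, and all parameters are illustrative.

```python
import numpy as np

def two_region_fit(feature):
    """Pick the threshold on a 1-D frame feature that minimizes a
    Chan-Vese-style fitting energy: the sum of squared deviations from
    each region's mean (speech vs. non-speech)."""
    best_t, best_cost = None, np.inf
    for t in np.unique(feature)[1:-1]:    # keep both regions non-empty
        hi, lo = feature[feature >= t], feature[feature < t]
        cost = ((hi - hi.mean()) ** 2).sum() + ((lo - lo.mean()) ** 2).sum()
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

def endpoints(signal, frame=256):
    """Frame log energy, fit two regions, and return the first and last
    samples labeled as speech (a crude stand-in for endpoint detection)."""
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    log_e = np.log((frames ** 2).sum(axis=1) + 1e-12)
    speech = np.flatnonzero(log_e >= two_region_fit(log_e))
    return speech[0] * frame, (speech[-1] + 1) * frame

rng = np.random.default_rng(4)
sig = np.concatenate([0.01 * rng.standard_normal(2048),   # leading silence
                      rng.standard_normal(4096),          # "speech"
                      0.01 * rng.standard_normal(2048)])  # trailing silence
print(endpoints(sig))   # roughly (2048, 6144)
```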
180

Source-channel coding for CELP speech coders / J.A. Asenstorfer.

Asenstorfer, John A. (John Anthony) January 1994
Bibliography: leaves 197-205. / xiv, 205 leaves : ill. ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / This thesis is concerned with methods for protecting speech coding parameters transmitted over noisy channels. A linear prediction (LP) coder is employed to remove the short term correlations of speech. Protection of two sets of parameters are investigated. / Thesis (Ph.D.)--University of Adelaide, Dept. of Electrical and Electronic Engineering, 1995?
