  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Confidence Measures for Speech/Speaker Recognition and Applications on Turkish LVCSR

Mengusoglu, Erhan 24 May 2004 (has links)
Confidence measures for the results of speech/speaker recognition make such systems more useful in real-time applications. A confidence measure provides a test statistic for accepting or rejecting the recognition hypothesis of a speech/speaker recognition system. Speech/speaker recognition systems are usually based on statistical modeling techniques, and in this thesis we define confidence measures for the statistical modeling techniques used in these systems. For speech recognition, we tested the available confidence measures and a newly defined confidence measure based on acoustic prior information under two error-inducing conditions: out-of-vocabulary words and the presence of additive noise. We showed that the newly defined confidence measure performs better in both tests. A review of speech recognition and speaker recognition techniques, together with related statistical methods, is given throughout the thesis. We also defined a new interpretation technique for confidence measures, based on the Fisher transformation of the likelihood ratios obtained in speaker verification. The transformation yields a linearly interpretable confidence level that can be used directly in real-time applications such as dialog management. We also tested the confidence measures for speaker verification systems and evaluated their efficiency for the adaptation of speaker models, showing that using confidence measures to select adaptation data improves the accuracy of the speaker model adaptation process. Another contribution of this thesis is the preparation of a phonetically rich continuous speech database for the Turkish language, which is used to develop an HMM/MLP hybrid speech recognition system for Turkish.
Experiments on the test sets of the database showed that the speech recognition system achieves good accuracy on long speech sequences, while performance is lower for short words, as is the case for current speech recognition systems in other languages. A new language modeling technique for Turkish, also applicable to other agglutinative languages, is introduced in this thesis. Performance evaluations showed that it outperforms the classical n-gram language modeling technique.
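The verification step the abstract describes — comparing likelihood ratios against a threshold and mapping them to a linearly interpretable confidence level — can be sketched as follows. This is an illustrative sketch, not the thesis's exact formulation: the `scale` parameter and the use of `tanh` (the inverse of the Fisher z-transform) to squash the log-likelihood ratio into a bounded score are assumptions made here for illustration.

```python
import math

def log_likelihood_ratio(ll_target, ll_background):
    """Log-likelihood ratio between the claimed-speaker model and a
    background (impostor) model, given their log-likelihoods."""
    return ll_target - ll_background

def confidence(llr, scale=1.0):
    # Squash the unbounded LLR into (-1, 1) so the score is linearly
    # interpretable around 0; tanh is the inverse of the Fisher z-transform.
    return math.tanh(scale * llr)

def verify(llr, threshold=0.0):
    """Accept the claimed identity when the LLR exceeds the threshold."""
    return llr > threshold
```

A dialog manager could then branch on `confidence(llr)` directly, e.g. re-prompting the user when the score falls in an uncertain band around 0 instead of hard-accepting or hard-rejecting.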
2

CENSREC-3: An Evaluation Framework for Japanese Speech Recognition in Real Car-Driving Environments

NAKAMURA, Satoshi, TAKEDA, Kazuya, FUJIMOTO, Masakiyo 01 November 2006 (has links)
No description available.
3

The automatic recognition of emotions in speech

Manamela, Phuti John January 2020 (has links)
Thesis (M.Sc. (Computer Science)) -- University of Limpopo, 2020 / Speech emotion recognition (SER) refers to technology that enables machines to detect and recognise human emotions from spoken phrases. Numerous attempts have been made in the literature to develop systems that can recognise human emotions from the voice; however, little work has been done in the context of South African indigenous languages. The aim of this study was to develop an SER system that can classify and recognise six basic human emotions (sadness, fear, anger, disgust, happiness, and neutral) from speech spoken in Sepedi (one of South Africa's official languages). One of the major challenges encountered in this study was the lack of a proper corpus of emotional speech. Therefore, three different Sepedi emotional speech corpora consisting of acted speech data were developed: a Recorded-Sepedi corpus collected from recruited native speakers (9 participants), a TV-broadcast corpus collected from professional Sepedi actors, and an Extended-Sepedi corpus combining the Recorded-Sepedi and TV-broadcast corpora. Features were extracted from the speech corpora and assembled into a data file, which was used to train four machine learning (ML) algorithms (SVM, KNN, MLP and Auto-WEKA) using 10-fold cross-validation. Three experiments were then performed on the developed corpora and the performance of the algorithms was compared; the best results were achieved with Auto-WEKA in all experiments. Good results might have been expected for the TV-broadcast corpus, since it was collected from professional actors, but the results showed otherwise. From the findings of this study, one can conclude that there is no single exact technique for developing SER systems; it is a matter of experimenting and finding the best technique for the study at hand.
The study also highlighted the scarcity of SER resources for South African indigenous languages, and that the quality of the dataset plays a vital role in the performance of SER systems. / National Research Foundation (NRF) and Telkom Centre of Excellence (CoE)
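The evaluation protocol described above — training a classifier on extracted features and scoring it with 10-fold cross-validation — can be sketched in a few lines. The study used WEKA implementations; the pure-Python k-nearest-neighbour classifier below is a minimal stand-in, and the feature vectors and emotion labels in the usage example are synthetic assumptions, not data from the Sepedi corpora.

```python
import random
from collections import Counter

def knn_predict(train, x, k=3):
    """Classify x by majority vote among its k nearest training points
    (squared Euclidean distance over plain feature lists)."""
    neigh = sorted(train,
                   key=lambda fx: sum((a - b) ** 2 for a, b in zip(fx[0], x)))[:k]
    return Counter(label for _, label in neigh).most_common(1)[0][0]

def cross_validate(data, folds=10, k=3):
    """Mean accuracy over `folds` disjoint test splits, as in the
    10-fold protocol described above. `data` is a list of
    (feature_vector, label) pairs."""
    rng = random.Random(0)          # fixed seed for a reproducible shuffle
    data = data[:]
    rng.shuffle(data)
    accs = []
    for i in range(folds):
        test = data[i::folds]
        train = [d for j, d in enumerate(data) if j % folds != i]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        accs.append(hits / len(test))
    return sum(accs) / folds
```

With two well-separated synthetic clusters (e.g. features near (1, 1) labelled "anger" and near (-1, -1) labelled "neutral"), `cross_validate` returns an accuracy of 1.0, which is only a sanity check of the harness, not a claim about real emotional speech.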
4

Modeling Phoneme Durations And Fundamental Frequency Contours In Turkish Speech

Ozturk, Ozlem 01 October 2005 (has links) (PDF)
The term prosody refers to characteristics of speech such as intonation, timing, loudness, and other acoustic properties imposed by the physical, intentional and emotional state of the speaker. Phone durations and fundamental frequency contours are considered two of the most prominent aspects of prosody, and modeling them in Turkish speech is the subject of this thesis. Various methods exist for building prosody models; the state of the art is dominated by corpus-based methods. This study introduces corpus-based approaches that use classification and regression trees to discover the relationships between prosodic attributes and phone durations or fundamental frequency contours. In this context, a speech corpus designed to have specific phonetic and prosodic content was recorded and annotated, and a set of prosodic attributes was compiled, its elements determined from linguistic studies and literature surveys. The relevance of the prosodic attributes is investigated with statistical measures such as mutual information and information gain. Fundamental frequency contour and phone duration modeling are handled as independent problems. Phone durations are predicted using regression trees, with the set of prosodic attributes formed by forward selection. Quantization of phone durations is studied to improve prediction quality, a two-stage duration prediction process is proposed for handling specific ranges of phone duration values, and scaling and shifting of the predicted durations are proposed to minimize mean squared error. Fundamental frequency contour modeling is studied under two frameworks. The first generates a codebook of syllable fundamental frequency contours by vector quantization; the codewords are used to predict sentence fundamental frequency contours. Pitch accent prediction, via two different clusterings of the codewords into accented and non-accented subsets, is also considered in this framework.
Based on this experience, the second approach was developed. An algorithm identifies syllables having perceptual prominence or pitch accents; the slopes of the fundamental frequency contours are then predicted for the syllables identified as accented, and the pitch contours of sentences are predicted using the duration information and the estimated slope values. The performance of the phone duration and fundamental frequency contour models is evaluated quantitatively using statistical measures such as mean absolute error, root mean squared error and correlation, and by kappa coefficients and correct classification rate in the case of discrete symbol prediction.
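The attribute-relevance step mentioned above — scoring prosodic attributes by information gain before feeding them to classification and regression trees — can be sketched as follows. This is a generic sketch of information gain over discrete attributes, not the thesis's code; the attribute names and duration classes in the test data are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Reduction in label entropy obtained by splitting on one attribute.
    Each row is (attribute_tuple, label); higher gain means the
    attribute is more relevant to the predicted quantity."""
    labels = [y for _, y in rows]
    base = entropy(labels)
    groups = {}
    for x, y in rows:
        groups.setdefault(x[attr_index], []).append(y)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder
```

An attribute that perfectly separates the duration classes scores a gain equal to the base entropy, while an uninformative attribute scores 0, which is exactly the ranking a forward-selection procedure can exploit.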
5

Vliv alkoholu na řečový signál / Effect of alcohol on speech signal

Kandus, Filip January 2011 (has links)
The main theme of the thesis is to examine the influence of alcohol on the speech apparatus and the speech signal. The first part focuses on the symptoms and detection of alcohol concentration in the human body. The following part describes scientific publications and projects that dealt with a similar theme. Czech documentation for the German ALC database was also created. Based on phonetic knowledge, a Czech text was compiled; different speakers read this text, yielding our own database of alcoholic and sober speech. Samples from the individual speakers are processed using linear prediction, formant and cepstral analysis in MATLAB, and the effect of alcohol on selected parameters of the speech signal is evaluated.
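The linear-prediction analysis mentioned above fits an all-pole model to each speech frame, classically via the autocorrelation method and the Levinson-Durbin recursion. The thesis performed this in MATLAB; the following is a minimal pure-Python sketch of the same computation, with a synthetic exponentially decaying signal standing in for a real speech frame.

```python
def autocorr(x, order):
    """Biased autocorrelation r[0..order] of a signal frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for the LPC predictor
    coefficients a[1..order] (convention: x_hat[n] = sum a_j * x[n-j]).
    Returns (coefficients, final prediction-error power)."""
    a = [0.0] * (order + 1)
    e = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for this order.
        k = (r[m] - sum(a[j] * r[m - j] for j in range(1, m))) / e
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a = new_a
        e *= (1.0 - k * k)
    return a[1:], e
```

For the impulse response of a first-order all-pole filter, `x[n] = 0.9**n`, the recursion recovers a first coefficient close to 0.9 and a second close to 0, as expected for an AR(1) signal.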
6

The CIAIR In-Car Speech Database Collected in Real Driving Environments

ITAKURA, Fumitada, TAKEDA, Kazuya, YAMAGUCHI, Yukiko, MATSUBARA, Shigeki, KAWAGUCHI, Nobuo 18 December 2003 (has links)
IPSJ SIG Technical Report, SLP (Spoken Language Processing), 2003-SLP-49-24; 5th Spoken Language Symposium
7

Construction of an In-Car Speech Database under Real Driving Conditions

Itakura, Fumitada, Takeda, Kazuya, Kajita, Shoji, Iwa, Hiroyuki, Matsubara, Shigeki, Kawaguchi, Nobuo 04 February 2000 (has links)
IPSJ SIG Technical Report, SLP (Spoken Language Processing), 2000-SLP-30-12
