Global ETD Search

1	Confidence Measures for Speech/Speaker Recognition and Applications on Turkish LVCSR
Mengusoglu, Erhan, 24 May 2004
Confidence measures for the results of speech/speaker recognition make these systems more useful in real-time applications. A confidence measure provides a test statistic for accepting or rejecting the recognition hypothesis of a speech/speaker recognition system. Speech/speaker recognition systems are usually based on statistical modeling techniques. In this thesis we define confidence measures for the statistical modeling techniques used in such systems. For speech recognition, we tested existing confidence measures and a newly defined confidence measure based on acoustic prior information under two error-inducing conditions: out-of-vocabulary words and additive noise. The newly defined confidence measure performs better in both tests. A review of speech recognition and speaker recognition techniques, and of some related statistical methods, is given throughout the thesis. We also define a new interpretation technique for confidence measures, based on the Fisher transformation of the likelihood ratios obtained in speaker verification. The transformation yields a linearly interpretable confidence level that can be used directly in real-time applications such as dialog management. We also tested the confidence measures in speaker verification systems and evaluated how effective they are for the adaptation of speaker models. Using confidence measures to select adaptation data improves the accuracy of the speaker model adaptation process. Another contribution of this thesis is a phonetically rich continuous speech database for Turkish. The database is used to develop an HMM/MLP hybrid speech recognition system for Turkish. Experiments on the test sets of the database showed that the system is accurate on long speech sequences but less accurate on short words, as is the case for current speech recognition systems for other languages. The thesis also introduces a new language modeling technique for Turkish that can be applied to other agglutinative languages. Performance evaluations showed that the new technique outperforms classical n-gram language modeling.
Keywords: speech recognition; speaker recognition; Turkish speech recognition; confidence measure; speaker adaptation; Turkish speech database
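The accept/reject role of a confidence measure described in this abstract can be illustrated with a generic likelihood ratio test. The sketch below is not the measure defined in the thesis: it assumes per-frame log-likelihoods from a hypothesis model and a competing background model are already available, and the function names and the threshold value are hypothetical.

```python
from typing import Sequence


def mean_log_likelihood_ratio(
    target_logps: Sequence[float], background_logps: Sequence[float]
) -> float:
    """Average per-frame log-likelihood ratio between two models.

    target_logps: log-likelihood of each frame under the hypothesis model.
    background_logps: log-likelihood of each frame under a competing model.
    Both inputs are assumed to be precomputed; this is an illustration only.
    """
    if not target_logps or len(target_logps) != len(background_logps):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(t - b for t, b in zip(target_logps, background_logps))
    return total / len(target_logps)


def accept_hypothesis(score: float, threshold: float = 0.0) -> bool:
    """Accept the recognition hypothesis when the score reaches the threshold.

    The default threshold of 0.0 is an arbitrary placeholder; in practice it
    would be tuned on held-out data.
    """
    return score >= threshold


score = mean_log_likelihood_ratio([-1.0, -2.0], [-2.0, -4.0])
print(score, accept_hypothesis(score))
```

Averaging over frames keeps the score comparable across utterances of different lengths, which is one common design choice among several.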
2	A Design of Turkish Speech Recognition System
Chen, Guan-lun, 22 August 2011
The Republic of Turkey, founded in 1923, is a well-known ancient country with an abundant cultural heritage, located at the junction of the Asian and European continents. Istanbul, formerly known as Constantinople or Byzantium, is its largest city. It was established by Constantinus I Magnus in A.D. 330, during the era of the Roman Empire, to serve as a well-fortified castle like Rome. Its many attractions include historical architecture, ancient music, gourmet cuisine, and art collections. Our objective is to build a language system that helps us learn Turkish, savor the beauty of Turkish culture, and widen our vision of travel and living. This thesis investigates design and implementation strategies for a Turkish speech recognition system. The system is trained on, and recognizes, the speech features of 395 common Turkish mono-syllables. A training database of 12 utterances per mono-syllable is established by applying Turkish pronunciation rules. The 12 utterances are collected by reading the same mono-syllables in 6 rounds, twice per round with different tones: the first pronunciation has the high pitch of tone 1, while the second has the falling pitch of tone 4. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On a personal computer with a 2.8 GHz AMD Athlon X2 2400 processor running Ubuntu 9.04, a correct phrase recognition rate of 87.29% can be reached using phonotactical rules for a Turkish phrase database with a vocabulary of 3,644. The average computation time for each system is less than 1.5 seconds, and the training time for the systems is about two hours.
Keywords: Turkish speech recognition system; hidden Markov model; linear predicted cepstral coefficients; mel-frequency cepstral coefficients; phonotactics
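Mel-frequency cepstral coefficients, named in this abstract as one of the feature models, rest on a warping of the frequency axis to the mel scale. The abstract does not say which mel formula the thesis uses, so the sketch below assumes the widely used 2595 * log10(1 + f / 700) variant; the function names are hypothetical, and the code covers only this one step of MFCC extraction.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (assumed 2595 * log10(1 + f/700) form)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(low_hz: float, high_hz: float, count: int) -> list:
    """Return `count` frequencies in Hz, equally spaced on the mel scale.

    Points like these are typically used as edges or centers of the
    triangular filters in a mel filter bank.
    """
    if count < 2:
        raise ValueError("count must be at least 2")
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count - 1)
    return [mel_to_hz(low_mel + i * step) for i in range(count)]


print(mel_spaced_frequencies(0.0, 4000.0, 5))
```

Because the spacing is uniform in mels, the resulting frequencies are packed more densely at low frequencies and spread out at high frequencies.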
