Confidence Measures for Speech/Speaker Recognition and Applications on Turkish LVCSR

Confidence measures for the results of speech/speaker recognition make these systems more useful in real-time applications. A confidence measure provides a test statistic for accepting or rejecting the recognition hypothesis of the speech/speaker recognition system.
Speech/speaker recognition systems are usually based on statistical modeling techniques. In this thesis we define confidence measures for the statistical modeling techniques used in speech/speaker recognition systems.
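As an illustration of how a confidence measure can serve as a test statistic, the following minimal sketch uses a frame-normalised log-likelihood ratio and a fixed threshold. This is a common textbook form and not necessarily one of the measures defined in the thesis; the function names, the alternative-model score, and the threshold value are assumptions made for the example.

```python
def confidence_score(hyp_loglik: float, alt_loglik: float, n_frames: int) -> float:
    """Frame-normalised log-likelihood ratio (an assumed, generic confidence measure).

    hyp_loglik: log-likelihood of the recognition hypothesis
    alt_loglik: log-likelihood under an alternative (e.g. background) model
    n_frames:   number of acoustic frames, used to normalise for duration
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (hyp_loglik - alt_loglik) / n_frames


def accept_hypothesis(score: float, threshold: float) -> bool:
    """Accept the hypothesis when the score reaches the threshold."""
    return score >= threshold


# Hypothetical scores for a 100-frame utterance.
score = confidence_score(hyp_loglik=-120.0, alt_loglik=-150.0, n_frames=100)
decision = accept_hypothesis(score, threshold=0.2)
```

In practice the threshold would be tuned on held-out data to balance false acceptances against false rejections.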
For speech recognition, we tested the available confidence measures and a newly defined confidence measure based on acoustic prior information under two conditions that cause errors: out-of-vocabulary words and the presence of additive noise. We showed that the newly defined confidence measure performs better in both tests.
A review of speech recognition and speaker recognition techniques, and of some related statistical methods, is given throughout the thesis.
We also defined a new interpretation technique for confidence measures, based on the Fisher transformation of the likelihood ratios obtained in speaker verification. The transformation yields a linearly interpretable confidence level that can be used directly in real-time applications such as dialog management.
We also tested the confidence measures for speaker verification systems and evaluated their efficiency for the adaptation of speaker models. We showed that using confidence measures to select adaptation data improves the accuracy of the speaker model adaptation process.
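The idea of selecting adaptation data by confidence can be sketched as a simple filter. The data layout (pairs of utterance identifier and confidence score), the function name, and the threshold are assumptions for illustration; the thesis abstract does not specify the actual selection rule.

```python
from typing import List, Tuple


def select_adaptation_data(
    scored_utterances: List[Tuple[str, float]], threshold: float
) -> List[str]:
    """Return the IDs of utterances whose confidence reaches the threshold.

    Only these utterances would be passed on to speaker model adaptation;
    low-confidence utterances are left out so that likely misattributed
    speech does not contaminate the model.
    """
    return [utt_id for utt_id, conf in scored_utterances if conf >= threshold]


# Hypothetical verification results: (utterance ID, confidence score).
results = [("utt1", 0.92), ("utt2", 0.41), ("utt3", 0.78)]
selected = select_adaptation_data(results, threshold=0.75)
```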
Another contribution of this thesis is the preparation of a phonetically rich continuous speech database for the Turkish language. The database is used to develop an HMM/MLP hybrid speech recognition system for Turkish. Experiments on the test sets of the database showed that the system has good accuracy for long speech sequences, while performance is lower for short words, as is the case for current speech recognition systems for other languages.
A new language modeling technique for the Turkish language, which can also be used for other agglutinative languages, is introduced in this thesis. Performance evaluations showed that the new technique outperforms the classical n-gram language modeling technique.
Date: 24 May 2004
Creators: Mengusoglu, Erhan
Contributors: Trecat, J., Leich, H., Hanton, J., Gosselin, B., Froidure, J-C., Macq, B., Grenez, F.
Publisher: Faculte Polytechnique de Mons
Source Sets: Bibliothèque interuniversitaire de la Communauté française de Belgique
Detected Language: English
unrestricted

