Return to search

Language Modeling For Turkish Continuous Speech Recognition

This study aims to build a new language model for Turkish continuous speech recognition. Turkish is very productive language in terms of word forms because of its agglutinative nature. For such languages like Turkish, the vocabulary size is far from being acceptable from only one simple stem, thousands of new words can be generated using inflectional and derivational suffixes. In this work, word are parsed into their stem and endings. First of all, we consider endings as words and we obtained bigram probabilities using stem and endings. Then, bigram probabilities are obtained using only the stems. Single pass recognition was performed by using bigram probabilities. As a second job, two pass recognition was performed. Firstly, previous bigram probabilities were used to create word lattices. Secondly, trigram probabilities were obtained from a larger text. Finally, one-best results were obtained by using word lattices and trigram probabilities. All work is done in Hidden Markov Model Toolkit (HTK) environment, except parsing and network transforming.

Identiferoai:union.ndltd.org:METU/oai:etd.lib.metu.edu.tr:http://etd.lib.metu.edu.tr/upload/2/1223254/index.pdf
Date01 December 2003
CreatorsSahin, Serkan
ContributorsCiloglu, Tolga
PublisherMETU
Source SetsMiddle East Technical Univ.
LanguageEnglish
Detected LanguageEnglish
TypeM.S. Thesis
Formattext/pdf
RightsTo liberate the content for public access

Page generated in 0.0019 seconds