Language Modeling For Turkish Continuous Speech Recognition

This study aims to build a new language model for Turkish continuous speech recognition. Because of its agglutinative nature, Turkish is a very productive language in terms of word forms. For languages like Turkish, a word-based vocabulary quickly grows to an unacceptable size: from a single stem, thousands of new words can be generated using inflectional and derivational suffixes. In this work, words were parsed into their stems and endings. First, endings were treated as words, and bigram probabilities were obtained over both stems and endings. Then, bigram probabilities were obtained using only the stems. Single-pass recognition was performed using these bigram probabilities. Next, two-pass recognition was performed: the bigram probabilities were first used to create word lattices, trigram probabilities were then obtained from a larger text corpus, and the one-best results were finally obtained by applying the trigram probabilities to the word lattices. All work was done in the Hidden Markov Model Toolkit (HTK) environment, except for parsing and network transformation.
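The stem-and-ending tokenization and bigram estimation described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the ending list `ENDINGS`, the longest-match splitter `split_word`, the `+` prefix that marks an ending token, and the example sentences are all hypothetical, and a real system would use a Turkish morphological parser instead of string matching. The probabilities are unsmoothed maximum-likelihood estimates, P(w2 | w1) = c(w1, w2) / c(w1).

```python
from collections import Counter

# Hypothetical toy inventory of endings (whole suffix sequences, e.g. "lerde" = ler + de).
# A real system would use a Turkish morphological parser instead.
ENDINGS = ["lerde", "larda", "ler", "lar", "de", "da"]


def split_word(word: str) -> list[str]:
    """Split a word into [stem, "+ending"] using the longest matching ending (toy heuristic)."""
    for ending in sorted(ENDINGS, key=len, reverse=True):
        # Keep at least two characters for the stem so short words stay whole.
        if word.endswith(ending) and len(word) - len(ending) >= 2:
            return [word[: -len(ending)], "+" + ending]
    return [word]


def tokenize(sentence: str, stems_only: bool = False) -> list[str]:
    """Map a sentence to tokens: stems and endings, or stems alone."""
    tokens = []
    for word in sentence.split():
        parts = split_word(word)
        tokens.extend(parts[:1] if stems_only else parts)
    return ["<s>"] + tokens + ["</s>"]


def bigram_probs(sentences, stems_only: bool = False) -> dict:
    """Unsmoothed maximum-likelihood bigrams: P(w2 | w1) = c(w1, w2) / c(w1)."""
    history_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        toks = tokenize(sentence, stems_only)
        history_counts.update(toks[:-1])  # every token that has a successor
        pair_counts.update(zip(toks[:-1], toks[1:]))
    return {(w1, w2): c / history_counts[w1] for (w1, w2), c in pair_counts.items()}


if __name__ == "__main__":
    corpus = ["evlerde kitaplar var", "evler yeni"]  # hypothetical toy sentences
    print(tokenize(corpus[0]))  # ['<s>', 'ev', '+lerde', 'kitap', '+lar', 'var', '</s>']
    print(bigram_probs(corpus)[("ev", "+lerde")])  # 0.5
    print(bigram_probs(corpus, stems_only=True)[("ev", "kitap")])  # 0.5
```

Treating each ending as its own token keeps the vocabulary close to the number of distinct stems plus the number of distinct endings, rather than one entry per inflected form; passing `stems_only=True` gives the stem-only variant.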
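The two-pass idea, rescoring a bigram-generated word lattice with trigram probabilities to obtain the one-best result, can be sketched as a Viterbi search over the lattice. This is a hypothetical toy, not HTK's lattice tools or the thesis's code: the lattice is assumed to be a directed acyclic graph whose integer node IDs are in topological order and whose edges carry a word (here, hypothetical stem and `+ending` tokens) and an acoustic log-likelihood; the trigram model is a plain dictionary of log probabilities with a fixed `floor` for unseen trigrams instead of proper back-off; and `lm_weight` is an illustrative language-model scale factor. Because a trigram depends on the two previous words, the search state is the pair (lattice node, last two words) rather than the node alone.

```python
import math
from collections import defaultdict


def rescore_lattice(edges, start, end, trigram_logprob, lm_weight=10.0, floor=-10.0):
    """Return (best word sequence, score) for a word lattice rescored with a trigram LM.

    edges: (src, dst, word, acoustic_logprob) tuples; assumes src < dst (topological IDs).
    trigram_logprob: dict mapping (w1, w2, w3) -> log P(w3 | w1, w2).
    Unseen trigrams get `floor`, a crude stand-in for back-off smoothing.
    """
    outgoing = defaultdict(list)
    for src, dst, word, ac in edges:
        outgoing[src].append((dst, word, ac))

    # State = (node, (w_prev2, w_prev1)); value = (score, previous state, word on arc).
    best = {(start, ("<s>", "<s>")): (0.0, None, None)}
    nodes = sorted({start, end} | {e[0] for e in edges} | {e[1] for e in edges})
    for node in nodes:
        # Snapshot the states at this node before extending them along outgoing arcs.
        for state in [s for s in best if s[0] == node]:
            score = best[state][0]
            h2, h1 = state[1]
            for dst, word, ac in outgoing[node]:
                lm = trigram_logprob.get((h2, h1, word), floor)
                new_state = (dst, (h1, word))
                new_score = score + ac + lm_weight * lm
                if new_state not in best or new_score > best[new_state][0]:
                    best[new_state] = (new_score, state, word)

    # Close each complete hypothesis with the sentence-end trigram and keep the best.
    finals = [
        (best[s][0] + lm_weight * trigram_logprob.get((*s[1], "</s>"), floor), s)
        for s in best
        if s[0] == end
    ]
    if not finals:
        raise ValueError("end node is unreachable from start node")
    total, state = max(finals)

    # Follow back-pointers to recover the one-best word sequence.
    words = []
    while best[state][1] is not None:
        words.append(best[state][2])
        state = best[state][1]
    return words[::-1], total


# Hypothetical toy lattice: two competing endings between nodes 1 and 2.
LATTICE = [
    (0, 1, "ev", -1.0),
    (1, 2, "+lerde", -2.0),
    (1, 2, "+ler", -1.5),  # acoustically better
    (2, 3, "kitap", -1.0),
]
# Hypothetical trigram probabilities, stored as natural-log values.
TRIGRAMS = {
    ngram: math.log(p)
    for ngram, p in {
        ("<s>", "<s>", "ev"): 0.5,
        ("<s>", "ev", "+lerde"): 0.6,
        ("<s>", "ev", "+ler"): 0.1,
        ("ev", "+lerde", "kitap"): 0.5,
        ("ev", "+ler", "kitap"): 0.5,
        ("+lerde", "kitap", "</s>"): 0.5,
        ("+ler", "kitap", "</s>"): 0.5,
    }.items()
}

if __name__ == "__main__":
    print(rescore_lattice(LATTICE, 0, 3, TRIGRAMS, lm_weight=0.0)[0])   # ['ev', '+ler', 'kitap']
    print(rescore_lattice(LATTICE, 0, 3, TRIGRAMS, lm_weight=10.0)[0])  # ['ev', '+lerde', 'kitap']
```

With `lm_weight=0.0` only the acoustic scores count and the acoustically cheaper `+ler` path wins; once the trigram scores are weighted in, the `+lerde` path is chosen instead.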

http://etd.lib.metu.edu.tr/upload/2/1223254/index.pdf

TK Electronics 7800-8360

Identifier	oai:union.ndltd.org:METU/oai:etd.lib.metu.edu.tr:http://etd.lib.metu.edu.tr/upload/2/1223254/index.pdf
Date	01 December 2003
Creators	Sahin, Serkan
Contributors	Ciloglu, Tolga
Publisher	METU
Source Sets	Middle East Technical Univ.
Language	English
Detected Language	English
Type	M.S. Thesis
Format	text/pdf
Rights	To liberate the content for public access
