Voice query-by-example for resource-limited languages using an ergodic hidden Markov model of speech

An ergodic hidden Markov model (EHMM) can be useful in extracting underlying structure embedded in connected speech without the need for a time-aligned transcribed corpus. In this research, we present a query-by-example (QbE) spoken term detection system based on an ergodic hidden Markov model of speech.
Because the training is unsupervised, an EHMM-based representation of speech is not invariant to speaker-dependent variation. Consequently, a single phoneme may be mapped to a number of EHMM states. The effects of speaker-dependent and context-induced variation in speech on its EHMM-based representation have been studied and used to devise schemes to minimize these variations.
Speaker-invariance can be introduced into the system by identifying states with similar perceptual characteristics. In this research, two unsupervised clustering schemes have been proposed to identify perceptually similar states in an EHMM.
A search framework, consisting of a graphical keyword modeling scheme and a modified Viterbi algorithm, has also been implemented. The EHMM-based QbE system has been compared with state-of-the-art systems and has been shown to achieve higher precision than systems based on static clustering schemes.
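As a rough illustration of how an ergodic HMM maps a sequence of speech frames to a state sequence, the sketch below runs the standard Viterbi algorithm over a fully connected two-state model. It is not the modified Viterbi algorithm or the keyword model of the dissertation, whose details are not given in this abstract. The function name `viterbi`, its parameters, and all probabilities are hypothetical toy values chosen for illustration.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Most likely state path through a fully connected (ergodic) HMM.

    log_init[s]     : log P(first state = s)
    log_trans[r][s] : log P(next state = s | current state = r)
    log_emit[t][s]  : log P(frame t | state s)
    All names and shapes are illustrative assumptions, not the thesis's API.
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        prev, score, ptr = score, [], []
        for s in range(n_states):
            # Ergodic topology: every state may follow every other state.
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[t][s])
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy data: two states, "sticky" transitions, six frames.
# The first three frames favor state 0, the last three favor state 1.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_emit = [[math.log(0.8), math.log(0.2)]] * 3 + \
           [[math.log(0.2), math.log(0.8)]] * 3

path = viterbi(log_init, log_trans, log_emit)
print(path)
```

In a query-by-example setting, state sequences of this kind could be computed for both the spoken query and the search audio and then compared; how the dissertation performs that comparison is not specified here.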

Identifier: oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/50363
Date: 13 January 2014
Creators: Ali, Asif
Contributors: Clements, Mark A.
Publisher: Georgia Institute of Technology
Source Sets: Georgia Tech Electronic Thesis and Dissertation Archive
Language: en_US
Detected Language: English
Type: Dissertation
Format: application/pdf
