
Voice query-by-example for resource-limited languages using an ergodic hidden Markov model of speech

An ergodic hidden Markov model (EHMM) can be useful in extracting underlying structure embedded in connected speech without the need for a time-aligned transcribed corpus. In this research, we present a query-by-example (QbE) spoken term detection system based on an ergodic hidden Markov model of speech.
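
As a rough illustration of the idea (not the dissertation's implementation), the sketch below treats an HMM as ergodic when its transition matrix is fully connected, which is a sufficient condition for every state to be reachable from every other. It then uses standard Viterbi decoding to turn a frame sequence into a sequence of state labels without any transcription. The 1-D Gaussian emissions, the three-state model and all parameter values are toy assumptions standing in for real acoustic features.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (toy stand-in for an acoustic emission model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def is_ergodic(trans):
    """Fully connected transitions (all entries > 0) imply every state reaches every other."""
    return all(p > 0 for row in trans for p in row)

def viterbi(obs, init, trans, means, variances):
    """Return the most likely state sequence for a 1-D observation sequence."""
    n = len(init)
    log_trans = [[math.log(p) for p in row] for row in trans]
    # delta[j]: best log score of any path ending in state j at the current frame
    delta = [math.log(init[j]) + log_gauss(obs[0], means[j], variances[j]) for j in range(n)]
    backptrs = []
    for x in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + log_gauss(x, means[j], variances[j]))
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best-scoring final state
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

if __name__ == "__main__":
    trans = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]  # hypothetical values
    obs = [0.1, -0.2, 5.1, 4.9, 10.2, 9.8, 0.0]
    print(is_ergodic(trans))  # prints True
    print(viterbi(obs, [1 / 3] * 3, trans, [0.0, 5.0, 10.0], [1.0] * 3))  # prints [0, 0, 1, 1, 2, 2, 0]
```

Because every transition is allowed, the decoder can return to state 0 at the last frame. A left-to-right phone model could not do this, and it is one reason an ergodic topology suits unsegmented, connected speech.
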
An EHMM-based representation of speech is not invariant to speaker-dependent variations because of the unsupervised nature of the training. Consequently, a single phoneme may be mapped to several EHMM states. The effects of speaker-dependent and context-induced variation in speech on its EHMM-based representation have been studied, and the findings have been used to devise schemes that minimize these variations.
Speaker-invariance can be introduced into the system by identifying states with similar perceptual characteristics. In this research, two unsupervised clustering schemes have been proposed to identify perceptually similar states in an EHMM.
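
The abstract does not specify the two clustering schemes. The sketch below is only one plausible way to group "perceptually similar" states. It merges any two states whose 1-D Gaussian emissions lie within a threshold under symmetric Kullback-Leibler divergence, using single-linkage (union-find), and then relabels each state with its cluster index. The divergence measure, the threshold and the toy parameters are all assumptions for illustration.

```python
def sym_kl(m1, v1, m2, v2):
    """Symmetric KL divergence between N(m1, v1) and N(m2, v2); the log terms cancel."""
    d2 = (m1 - m2) ** 2
    return 0.5 * ((v1 + d2) / v2 + (v2 + d2) / v1 - 2.0)

def cluster_states(means, variances, threshold):
    """Single-linkage grouping of states; returns one cluster label per state."""
    n = len(means)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sym_kl(means[i], variances[i], means[j], variances[j]) < threshold:
                parent[find(j)] = find(i)
    # Relabel roots as 0..k-1 in order of first appearance
    mapping, labels = {}, []
    for i in range(n):
        root = find(i)
        if root not in mapping:
            mapping[root] = len(mapping)
        labels.append(mapping[root])
    return labels

if __name__ == "__main__":
    labels = cluster_states([0.0, 0.3, 5.0, 5.2, 10.0], [1.0] * 5, threshold=1.0)
    print(labels)  # prints [0, 0, 1, 1, 2]
    decoded = [0, 1, 1, 2, 3, 4]                # hypothetical raw EHMM state sequence
    print([labels[s] for s in decoded])         # prints [0, 0, 0, 1, 1, 2]
```

Under these assumptions, two speakers whose renditions of the same sound decode to different raw states (0 vs. 1, or 2 vs. 3) end up with the same cluster label. This is the kind of speaker invariance the paragraph describes.
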
A search framework, consisting of a graphical keyword modeling scheme and a modified Viterbi algorithm, has also been implemented. The EHMM-based QbE system has been compared with state-of-the-art systems and shown to achieve higher precision than systems based on static clustering schemes.
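
The details of the keyword graph and the Viterbi modification are not given here. As a hedged sketch of the general idea, the code below builds a left-to-right graph from the query's (clustered) state labels, with one node per label run and self-loops on each node. It then runs a Viterbi-style dynamic program over the test utterance in which the match may start and end at any frame. The 0/1 mismatch cost, the function name `keyword_search` and the example label sequences are assumptions, not the dissertation's scoring.

```python
import math

def keyword_search(query, utterance, mismatch_cost=1.0):
    """Best-matching segment of `utterance` for a left-to-right keyword graph from `query`.

    Returns (cost, start_frame, end_frame); lower cost means a better match.
    """
    # Collapse repeated labels: each graph node has a self-loop, so runs are redundant
    nodes = [lab for i, lab in enumerate(query) if i == 0 or lab != query[i - 1]]
    K = len(nodes)
    prev_cost, prev_start = [math.inf] * K, [0] * K
    best = (math.inf, -1, -1)
    for t, lab in enumerate(utterance):
        cost, start = [math.inf] * K, [0] * K
        for k in range(K):
            local = 0.0 if lab == nodes[k] else mismatch_cost
            # Candidates: stay in node k (self-loop), or advance from node k-1;
            # node 0 may also be entered fresh at any frame (free start)
            cands = [(prev_cost[k], prev_start[k])]
            cands.append((prev_cost[k - 1], prev_start[k - 1]) if k > 0 else (0.0, t))
            c, s = min(cands)
            cost[k], start[k] = c + local, s
        # Free end: any frame that completes the last node is a candidate detection
        if cost[K - 1] < best[0]:
            best = (cost[K - 1], start[K - 1], t)
        prev_cost, prev_start = cost, start
    return best

if __name__ == "__main__":
    query = [2, 2, 5, 5, 5, 7]                    # hypothetical query labels
    utterance = [0, 0, 2, 2, 5, 5, 7, 7, 1, 1]    # hypothetical test-utterance labels
    print(keyword_search(query, utterance))       # prints (0.0, 2, 6)
```

The free start at node 0 and the free end at the last node are what distinguish this from ordinary whole-utterance Viterbi decoding. Without them, the query could only match an utterance that contained nothing but the keyword.
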

http://hdl.handle.net/1853/50363

Automatic speech recognition

Identifier	oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/50363
Date	13 January 2014
Creators	Ali, Asif
Contributors	Clements, Mark A.
Publisher	Georgia Institute of Technology
Source Sets	Georgia Tech Electronic Thesis and Dissertation Archive
Language	en_US
Detected Language	English
Type	Dissertation
Format	application/pdf