
Improving continuous speech recognition with automatic multiple pronunciation support

Conventional computer speech recognition systems use models of speech acoustics and the language of the recognition task in order to perform recognition. For all but trivial recognition tasks, sub-word units are modeled, typically phonemes. Recognizing words then requires a pronunciation dictionary (PD) to specify how each word is pronounced in terms of the units modeled. Even if the acoustic modeling component is perfect, the recognizer will still be prone to misrecognition, most often because the speaker can use a pronunciation other than that in the PD. This different pronunciation may be due to the speaker being a non-native speaker of the language being recognized, having 'mispronounced' the word, coarticulatory effects, recognizer errors in phoneme hypothesization, or any combination of these. One way to overcome these misrecognitions is to use a dynamic PD, able to acquire new pronunciations for words as they are encountered and misrecognized. The thesis examines the following questions: can automated methods be found that produce reliable alternate pronunciations? If so, does augmenting a PD (which originally contains only canonical pronunciations) with these alternate pronunciations lead to improved recognizer performance? It shows that using even simple methods, average reductions in word error rate of at least 45% are possible, even with speakers who are not native speakers of the recognition task language.

Identifier: oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:QMM.35621
Date: January 1997
Creators: Snow, Charles.
Contributors: De Mori, Renato (advisor)
Publisher: McGill University
Source Sets: Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language: English
Detected Language: English
Type: Electronic Thesis or Dissertation
Format: application/pdf
Coverage: Doctor of Philosophy (School of Computer Science.)
Rights: All items in eScholarship@McGill are protected by copyright with all rights reserved unless otherwise indicated.
Relation: alephsysno: 001641762, proquestno: NQ44592, Theses scanned by UMI/ProQuest.
