
Integration of multiple feature sets for reducing ambiguity in automatic speech recognition

This thesis presents a method for investigating the extent to which articulatory-based acoustic features can be exploited to reduce ambiguity in automatic speech recognition search. The proposed method uses a lattice re-scoring paradigm to integrate articulatory-based features into automatic speech recognition systems. Time delay neural networks are trained as feature detectors to generate feature streams over which hidden Markov models (HMMs) are defined. These articulatory-based HMMs are combined with HMMs defined over spectral-energy-based Mel-frequency cepstral coefficient (MFCC) acoustic features through a sequential lattice re-scoring procedure. The optimum phone strings are found by maximizing the log-linear combination of acoustic and language model likelihoods during recognition. The associated log-linear weights are estimated using a discriminative model combination approach. All experiments are performed on the DARPA TIMIT speech database, and the results are presented in terms of phone accuracy.
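
The log-linear combination described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a short N-best list stands in for a lattice, and the model names (`mfcc`, `artic`, `lm`), the scores, and the weights are made-up values chosen only for illustration. In the thesis the weights are estimated by discriminative model combination, whereas here they are fixed by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a phone string plus one log-likelihood per model.
Hypothesis = Tuple[List[str], Dict[str, float]]


def combined_score(log_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear combination: weighted sum of per-model log-likelihoods."""
    return sum(weights[name] * log_scores[name] for name in weights)


def rescore(hypotheses: List[Hypothesis], weights: Dict[str, float]) -> Hypothesis:
    """Return the hypothesis with the highest combined score."""
    return max(hypotheses, key=lambda h: combined_score(h[1], weights))


# Illustrative values only (assumed, not taken from the thesis).
hypotheses: List[Hypothesis] = [
    (["sil", "k", "ae", "t", "sil"], {"mfcc": -120.0, "artic": -95.0, "lm": -8.0}),
    (["sil", "k", "ah", "t", "sil"], {"mfcc": -118.0, "artic": -101.0, "lm": -9.5}),
]
weights = {"mfcc": 1.0, "artic": 0.5, "lm": 4.0}

best_phones, best_scores = rescore(hypotheses, weights)
```

With these assumed numbers the second hypothesis has the better MFCC score on its own, but the articulatory and language model terms tip the combined score toward the first, which is the kind of disambiguation the re-scoring step is meant to provide.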

Identifier: oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:QMM.112579
Date: January 2008
Creators: MomayyezSiahkal, Parya.
Publisher: McGill University
Source Sets: Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language: English
Detected Language: English
Type: Electronic Thesis or Dissertation
Format: application/pdf
Coverage: Master of Engineering (Department of Electrical and Computer Engineering.)
Rights: All items in eScholarship@McGill are protected by copyright with all rights reserved unless otherwise indicated.
Relation: alephsysno: 002713669, proquestno: AAIMR51470, Theses scanned by UMI/ProQuest.
