This thesis presents a method for investigating the extent to which articulatory-based acoustic features can be exploited to reduce ambiguity in automatic speech recognition search. The proposed method follows a lattice re-scoring paradigm that integrates articulatory-based features into automatic speech recognition systems. Time-delay neural networks are trained as feature detectors to generate feature streams over which hidden Markov models (HMMs) are defined. These articulatory-based HMMs are combined with HMMs defined over spectral-energy-based Mel-frequency cepstral coefficient (MFCC) acoustic features through a sequential lattice re-scoring procedure. During recognition, the optimum phone strings are found by maximizing a log-linear combination of acoustic and language model likelihoods. The associated log-linear weights are estimated using a discriminative model combination approach. All experiments are performed on the DARPA TIMIT speech database, and results are reported as phone accuracies.
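The log-linear combination described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: a short N-best list stands in for the lattice, the weights are set by hand rather than estimated by discriminative model combination, and every name and number (`mfcc`, `artic`, `lm`, the phone strings, the scores) is a hypothetical placeholder.

```python
from typing import Dict, List, Tuple

# A hypothesis is a phone string plus per-model log-likelihoods.
# All names and values here are illustrative assumptions, not thesis data.
Hypothesis = Tuple[str, Dict[str, float]]


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear combination: weighted sum of per-model log-likelihoods."""
    return sum(weights[name] * scores[name] for name in weights)


def rescore(hypotheses: List[Hypothesis], weights: Dict[str, float]) -> str:
    """Return the phone string with the highest combined score."""
    best, _ = max(hypotheses, key=lambda h: combined_score(h[1], weights))
    return best


# Two competing phone strings with hypothetical log-likelihoods from an
# MFCC-based HMM, an articulatory-feature HMM, and a language model.
hypotheses = [
    ("sil k ae t sil", {"mfcc": -120.0, "artic": -95.0, "lm": -12.0}),
    ("sil k ae p sil", {"mfcc": -118.0, "artic": -101.0, "lm": -14.0}),
]

# Hand-picked weights; the thesis estimates them discriminatively.
weights = {"mfcc": 1.0, "artic": 0.5, "lm": 2.0}
best = rescore(hypotheses, weights)
```

In this toy example the MFCC score alone favors the second string, while adding the articulatory and language model scores switches the choice to the first. That is the kind of ambiguity reduction the re-scoring step is meant to probe.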
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:QMM.112579 |
Date | January 2008 |
Creators | MomayyezSiahkal, Parya. |
Publisher | McGill University |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | English |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Format | application/pdf |
Coverage | Master of Engineering (Department of Electrical and Computer Engineering) |
Rights | All items in eScholarship@McGill are protected by copyright with all rights reserved unless otherwise indicated. |
Relation | alephsysno: 002713669, proquestno: AAIMR51470, Theses scanned by UMI/ProQuest. |