
Nonparametric Bayesian Approaches for Acoustic Modeling

The goal of Bayesian analysis is to reduce the uncertainty about unobserved variables by combining prior knowledge with observations. A fundamental limitation of parametric statistical models, including parametric Bayesian approaches, is their inability to learn new structures. The goal of the learning process is to estimate the correct values for the parameters. The accuracy of these parameters improves with more data, but the model's structure remains fixed. Therefore, new observations do not affect the overall complexity (e.g., the number of parameters in the model). Recently, nonparametric Bayesian methods have become a popular alternative to parametric Bayesian approaches because the model structure is learned simultaneously with the parameter distributions in a data-driven manner. The goal of this dissertation is to apply nonparametric Bayesian approaches to the acoustic modeling problem in continuous speech recognition. Three important problems are addressed: (1) statistical modeling of sub-word acoustic units; (2) semi-supervised training algorithms for nonparametric acoustic models; and (3) automatic discovery of sub-word acoustic units. We have developed a Doubly Hierarchical Dirichlet Process Hidden Markov Model (DHDPHMM) with a non-ergodic structure that can be applied to problems involving sequential modeling. DHDPHMM shares mixture components between states using two Hierarchical Dirichlet Processes (HDP). An inference algorithm for this model has been developed that enables DHDPHMM to outperform both its hidden Markov model (HMM) and HDP HMM (HDPHMM) counterparts. This inference algorithm is also shown to be computationally less expensive than a comparable algorithm for HDPHMM. In addition to sharing data, the proposed model can learn non-ergodic structures and non-emitting states, something that HDPHMM does not support. This extension to the model is used to model finite-length sequences.
We have also developed a generative model for semi-supervised training of DHDPHMMs. Semi-supervised learning is an important practical requirement for many machine learning applications, including acoustic modeling in speech recognition. The relative improvement in error rates on classification and recognition tasks is shown to be 22% and 7%, respectively. Semi-supervised training results are slightly better than supervised training (29.02% vs. 29.71%). Context modeling was also investigated, and results show a modest improvement of 1.5% relative over the baseline system. We also introduce a nonparametric Bayesian transducer based on an ergodic HDPHMM/DHDPHMM that automatically segments and clusters the speech signal using an unsupervised approach. This transducer was used in several applications, including speech segmentation, acoustic unit discovery, spoken term detection and automatic generation of a pronunciation lexicon. For the segmentation problem, an F-score of 76.62% was achieved, which represents a 9% relative improvement over the baseline system. On the spoken term detection tasks, an average precision of 64.91% was achieved, which represents a 20% improvement over the baseline system. Lexicon generation experiments also show that automatically discovered units (ADU) generalize to new datasets. In this dissertation, we have established the foundation for applications of nonparametric Bayesian modeling to problems such as speech recognition that involve sequential modeling. These models enable a new generation of machine learning systems that adapt their overall complexity in a data-driven manner and yet preserve meaningful modalities in the data. As a result, these models improve generalization and offer higher performance at lower complexity. / Electrical and Computer Engineering
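The data-driven growth in model complexity described in the abstract can be illustrated with a minimal sketch of a Chinese restaurant process, the clustering scheme induced by a Dirichlet process. This is not the dissertation's DHDPHMM or its inference algorithm; the function name `crp_assignments`, the concentration parameter `alpha`, and the fixed seed are assumptions made only for this illustration.

```python
import random


def crp_assignments(n_points, alpha, seed=0):
    """Sample cluster labels for n_points items from a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values open
    new clusters more readily. Names here are illustrative only.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items currently in cluster k
    labels = []  # labels[i] = cluster index assigned to item i
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    # The number of clusters is not fixed in advance; it tends to grow
    # slowly as more items are observed.
    for n in (10, 100, 1000):
        _, counts = crp_assignments(n, alpha=1.0)
        print(n, "items ->", len(counts), "clusters")
```

In a parametric mixture model the number of clusters would be a fixed argument; here it is an outcome of the sampling, which is the property the abstract attributes to nonparametric Bayesian models.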

Identifier oai:union.ndltd.org:TEMPLE/oai:scholarshare.temple.edu:20.500.12613/2975
Date January 2015
Creators Harati Nejad Torbati, Amir Hossein
Contributors Picone, Joseph, Picone, Joseph, Sobel, Marc J., Obeid, Iyad, 1975-, Vucetic, Slobodan, Won, Chang-Hee, 1967-, Buckley, Kevin M.
Publisher Temple University. Libraries
Source Sets Temple University
Language English
Detected Language English
Type Thesis/Dissertation, Text
Format 155 pages
Rights IN COPYRIGHT - This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available. http://rightsstatements.org/vocab/InC/1.0/
Relation http://dx.doi.org/10.34944/dspace/2957, Theses and Dissertations
