An acoustic model for speech recognition with an articulatory layer and non-linear articulatory-to-acoustic mapping

This thesis presents an extended hidden Markov model (HMM), namely the linear/non-linear multi-level segmental hidden Markov model (linear/non-linear MSHMM). In the MSHMM framework, the relationship between symbolic and acoustic representations of a speech signal is regulated by an intermediate, articulatory-based layer. Such an approach has many potential advantages for speech pattern processing. By modelling speech dynamics directly in an articulatory domain, it may be possible to characterise the articulatory phenomena which give rise to variability in speech. The intermediate representations are based on the first three formant frequencies. The speech dynamics in the formant representation of each segment are modelled as fixed linear trajectories which characterise the distribution of formant frequencies. These trajectories are mapped into the acoustic feature space by a set of one or more non-linear mappings, hence the name linear/non-linear MSHMM. This thesis describes work developing a non-linear transformation approach using a Radial Basis Function (RBF) network for the articulatory-to-acoustic mapping. An RBF network consists of a number of hidden units and mapping weights for the linear transform component of the network. Each hidden unit is associated with a 'Gaussian-like' distribution. The thesis presents the training and optimisation processes for the parameters of the RBF network. The linear/non-linear MSHMMs, which form the basis for the thesis, are incorporated into an automatic speech recognition system. A gradient descent process is used to find the optimal parameters of the linear trajectory models during the Viterbi training process. Phone classification experiments are presented for monophone MSHMMs using the TIMIT database. The linear/non-linear MSHMM is compared with the linear/linear MSHMM, where both the model of dynamics and the articulatory-to-acoustic mappings are linear. The comparison shows no statistically significant difference in performance between these two models.
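
As a rough illustration of the mapping described in the abstract, the following minimal sketch generates a linear formant trajectory for one segment and passes each frame through an RBF network with Gaussian hidden units and a linear output layer. It is not the thesis's implementation. The midpoint/slope parameterisation of the trajectory, the isotropic hidden-unit widths, the bias term, the function names, and all numeric values are assumptions made for illustration only.

```python
import math

def linear_trajectory(midpoint, slope, duration):
    """Return one formant vector per frame: f(t) = midpoint + slope * (t - centre).

    The midpoint/slope parameterisation is an assumption, not taken from the thesis.
    """
    centre = (duration - 1) / 2.0
    return [[m + s * (t - centre) for m, s in zip(midpoint, slope)]
            for t in range(duration)]

def rbf_map(x, centres, widths, weights, bias):
    """Map one articulatory (formant) vector x to an acoustic feature vector.

    Each hidden unit is an isotropic Gaussian (assumed form); the output is a
    linear combination of the hidden activations plus an assumed bias term.
    """
    hidden = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * w ** 2))
        for c, w in zip(centres, widths)
    ]
    return [b + sum(wj * h for wj, h in zip(row, hidden))
            for row, b in zip(weights, bias)]

# Hypothetical values: three formants (Hz), two hidden units, two acoustic features.
midpoint = [500.0, 1500.0, 2500.0]
slope = [10.0, -20.0, 5.0]                      # Hz per frame
centres = [[500.0, 1500.0, 2500.0], [700.0, 1200.0, 2600.0]]
widths = [200.0, 200.0]
weights = [[1.0, 0.0], [0.0, 1.0]]              # one row per acoustic feature
bias = [0.0, 0.0]

trajectory = linear_trajectory(midpoint, slope, duration=5)
acoustic = [rbf_map(frame, centres, widths, weights, bias) for frame in trajectory]
```

In a linear/linear MSHMM, by contrast, `rbf_map` would be replaced by a single linear transform of the formant vector.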

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:633225
Date: January 2004
Creators: Lo, Boon Hooi
Publisher: University of Birmingham
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
