Fast accurate diphone-based phoneme recognition

Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2009.

Statistical speech recognition systems typically utilise a set of statistical models of subword
units based on the set of phonemes in a target language. However, in continuous
speech it is important to consider co-articulation effects and the interactions between
neighbouring sounds, as over-generalisation of the phonetic models can negatively affect
system accuracy. Traditionally co-articulation in continuous speech is handled by incorporating
contextual information into the subword model by means of context-dependent
models, which exponentially increase the number of subword models. In contrast, transitional
models aim to handle co-articulation by modelling the interphone dynamics found
in the transitions between phonemes.
This research aimed to perform an objective analysis of diphones as subword units for
use in hidden Markov model-based continuous-speech recognition systems, with special
emphasis on a direct comparison to a context-dependent biphone-based system in terms
of complexity, accuracy and computational efficiency in similar parametric conditions. To
simulate practical conditions, the experiments were designed to evaluate these systems
in a low-resource environment (limited supply of training data, computing power and
system memory) while still attempting fast, accurate phoneme recognition.
Adaptation techniques designed to exploit characteristics inherent in diphones, as
well as techniques used for effective parameter estimation and state-level tying were used
to reduce resource requirements while simultaneously increasing parameter reliability.
These techniques include diphthong splitting, utilisation of a basic diphone grammar,
diphone set completion, maximum a posteriori estimation and decision-tree based state
clustering algorithms. The experiments were designed to evaluate the contribution of each
adaptation technique individually and subsequently compare the optimised diphone-based
recognition system to a biphone-based recognition system that received similar treatment.
Results showed that diphone-based recognition systems perform better than both traditional
phoneme-based systems and context-dependent biphone-based systems when evaluated
in similar parametric conditions. Therefore, diphones are effective subword units,
which carry suprasegmental knowledge of speech signals and provide an excellent compromise
between detailed co-articulation modelling and acceptable system performance.

http://hdl.handle.net/10019.1/1779

Phoneme recognition

Diphones

Acoustic speech modelling

Automatic speech recognition

Electrical and Electronic Engineering

Identifier	oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:sun/oai:scholar.sun.ac.za:10019.1/1779
Date	03 1900
Creators	Du Preez, Marianne
Contributors	Du Preez, J. A., Engelbrecht, H. A., University of Stellenbosch. Faculty of Engineering. Dept. of Electrical and Electronic Engineering.
Publisher	Stellenbosch : University of Stellenbosch
Source Sets	South African National ETD Portal
Language	English
Detected Language	English
Type	Thesis
Rights	University of Stellenbosch
