
Automatic phoneme recognition of South African English

Thesis (MEng)--University of Stellenbosch, 2004. / ENGLISH ABSTRACT: Automatic speech recognition applications have been developed for many languages in
other countries but not much research has been conducted on developing Human Language
Technology (HLT) for S.A. languages. Research has been performed on informally
gathered speech data but until now a speech corpus that could be used to develop HLT
for S.A. languages did not exist. With the development of the African Speech Technology
Speech Corpora, it has now become possible to develop commercial applications of HLT.
The two main objectives of this work are the accurate modelling of phonemes, suitable
for the purposes of LVCSR, and the evaluation of the untried S.A. English speech corpus.
Three different aspects of phoneme modelling were investigated by performing isolated
phoneme recognition on the NTIMIT speech corpus. The three aspects were signal
processing, statistical modelling of HMM state distributions and context-dependent
phoneme modelling. Research has shown that the use of phonetic context when modelling
phonemes forms an integral part of most modern LVCSR systems. To facilitate
the context-dependent phoneme modelling, a method of constructing robust and accurate
models using decision tree-based state clustering techniques is described. The strength
of this method is the ability to construct accurate models of contexts that did not occur
in the training data. The method incorporates linguistic knowledge about the phonetic
context, in conjunction with the training data, to decide which phoneme contexts are
similar and should share model parameters.
As LVCSR typically consists of continuous recognition of spoken words, the context-dependent
and context-independent phoneme models that were created for the isolated
recognition experiments are evaluated by performing continuous phoneme recognition.
The phoneme recognition experiments are performed, without the aid of a grammar or
language model, on the S.A. English corpus. As the S.A. English corpus is newly created,
no previous research exists to which the continuous recognition results can be compared.
Therefore, it was necessary to create comparable baseline results, by performing continuous
phoneme recognition on the NTIMIT corpus. It was found that acceptable recognition
accuracy was obtained on both the NTIMIT and S.A. English corpora. Furthermore, the
results on S.A. English were 2 - 6% better than the results on NTIMIT, indicating that the
S.A. English corpus is of a high enough quality that it can be used for the development
of HLT. / AFRIKAANS ABSTRACT (English translation): Automatic speech recognition has already been developed for other languages in other countries, but
not much research has yet been done on developing human language technology (HLT) for South
African languages. Research has been done on informally collected speech, but until
now there was no speech database that could be used to develop HLT for S.A. languages.
With the development of the African Speech Technology Speech Corpora, it has become possible
to develop HLT that is suitable for commercial purposes.
The two main objectives of this thesis are the accurate modelling of phonemes, suitable
for large-vocabulary continuous speech recognition (LVCSR), and the evaluation of the
S.A. English speech database.
Three aspects of phoneme modelling are investigated by performing isolated phoneme recognition
on the NTIMIT speech database. The three aspects investigated are
signal processing, statistical modelling of the HMM state distributions, and context-dependent
phoneme modelling. Research has shown that the use of phonetic context
forms an integral part of most modern LVCSR systems. It is therefore necessary to be able to build
robust and accurate context-dependent models. For this purpose, a decision tree-based
clustering technique is described. The technique is also able to build accurate
models of contexts that did not occur in the training data. To decide
which phonetic contexts are similar and should therefore share model parameters, the
technique uses the training data and incorporates linguistic knowledge about the phonetic contexts.
Because LVCSR typically involves the continuous recognition of words, the context-dependent
and context-independent models built for the isolated phoneme recognition experiments
are evaluated by means of continuous phoneme recognition. The continuous phoneme recognition
experiments are performed on the S.A. English database, without the aid of
a language model or grammar. Because the S.A. English database is new, there is as yet no
other research against which the results can be compared. It is therefore necessary to generate
continuous phoneme recognition results on the NTIMIT database, against which the
S.A. English results can be compared. The results indicate acceptable phoneme recognition
on both the NTIMIT and S.A. English databases. The results on S.A. English
are even 2 - 6% better than the results on NTIMIT, which indicates that the S.A. English
speech database is suitable for the development of HLT.

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:sun/oai:scholar.sun.ac.za:10019.1/49867
Date: 03 1900
Creators: Engelbrecht, Herman Arnold
Contributors: Du Preez, J. A., Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering.
Publisher: Stellenbosch : Stellenbosch University
Source Sets: South African National ETD Portal
Language: en_ZA
Detected Language: English
Type: Thesis
Format: 166 leaves : ill.
Rights: Stellenbosch University
