Thesis (MScEng)--Stellenbosch University, 2005. / ENGLISH ABSTRACT: The syllable has been proposed as a unit of automatic speech recognition due to its
strong links with human speech production and perception. Recently, it has been shown
that incorporating information from syllable-length time-scales into automatic speech
recognition improves results in large vocabulary recognition tasks. It was also shown to
aid in various language recognition tasks and in foreign accent identification. Therefore,
the ability to automatically segment speech into syllables is an important research tool.
Where most previous studies employed knowledge-based methods, this study presents a
purely statistical method for the automatic syllabification of speech.
We introduce the concept of hierarchical hidden Markov model structures and show
how these can be used to implement a purely acoustical syllable segmenter based on
general sonority theory, combined with some of the phonotactic constraints found in the
English language.
The accurate reporting of syllabification results is a problem in the existing literature.
We present a well-defined dynamic time warping (DTW) distance measure used for
reporting syllabification results.
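As an illustration of how a DTW-based measure can align hypothesized syllable boundaries with reference boundaries, the following is a minimal sketch of textbook DTW. It is not the measure defined in the thesis: the function names, the absolute-time-difference local cost, and the three-way step pattern are all assumptions made for the example.

```python
def dtw_align(ref, hyp):
    """Textbook DTW between two non-empty, sorted lists of boundary times (seconds).

    Assumption: the local cost is the absolute time difference, and the
    allowed steps are diagonal, vertical, and horizontal.
    Returns (total_cost, path), where path is a list of (ref_idx, hyp_idx).
    """
    n, m = len(ref), len(hyp)
    inf = float("inf")
    # D[i][j] = cheapest cost of aligning ref[:i] with hyp[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(ref[i - 1] - hyp[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        )
    path.reverse()
    return D[n][m], path


def mean_boundary_error(ref, hyp):
    """Average absolute time difference over all aligned boundary pairs."""
    total, path = dtw_align(ref, hyp)
    return total / len(path)


# Hypothetical boundary times in seconds, for illustration only.
reference = [0.10, 0.35, 0.60]
hypothesis = [0.12, 0.33, 0.47, 0.61]
total_cost, alignment = dtw_align(reference, hypothesis)
```

In this sketch, a reference boundary that is paired with more than one hypothesized boundary indicates an inserted boundary, which is the kind of event a token error rate would count.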
We achieve a token error rate of 20.3% with a 42 ms average boundary error on a
relatively large set of data. This compares well with previous knowledge-based and
statistically based methods. / AFRIKAANS ABSTRACT (translated): The syllable has previously been proposed as a basic unit for automatic speech recognition
because of its strong relationship with speech production and perception. Recently,
it has been shown that using information from syllable-length time-scales improves
results in large vocabulary recognition tasks. It has also been shown that the use of
syllables facilitates automatic language recognition and foreign accent recognition. It
is therefore important, for research purposes, to be able to segment syllables automatically.
Previous studies used knowledge-based methods to achieve this segmentation.
This study uses a purely statistical method for the automatic syllabification
of speech.
We use the concept of hierarchical hidden Markov model structures and show how
these can be used to implement a purely acoustic syllable segmenter. The
model is built on the theory of sonority as well as the phonotactic
constraints present in the English language.
The accurate presentation of syllabification results is problematic in the existing
literature. We fully define a DTW (Dynamic Time Warping) distance function
with which we report our syllabification results.
We achieve a TER (Token Error Rate) of 20.3% with a 42 ms average boundary
error on a relatively large set of data. This compares well with previous knowledge-based and
statistically based methods.
Identifier | oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:sun/oai:scholar.sun.ac.za:10019.1/50285 |
Date | 03 1900 |
Creators | Nel, Pieter Willem |
Contributors | Du Preez, J. A., Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. |
Publisher | Stellenbosch : Stellenbosch University |
Source Sets | South African National ETD Portal |
Language | en_ZA |
Detected Language | Unknown |
Type | Thesis |
Format | 76 p. : ill. |
Rights | Stellenbosch University |