Thesis (MEng)--Stellenbosch University, 2014. / ENGLISH ABSTRACT: We develop and evaluate several algorithms that segment a speech signal into subword units without
using phone or orthographic transcripts. These segmentation algorithms rely on a scoring function,
termed the local score, that is applied at the feature level and indicates where the characteristics of the
audio signal change. The predominant segmentation approach in the literature applies a threshold
to the local score: each local maximum (peak) above the threshold is hypothesised to be a
segment boundary. The scoring mechanisms of a selection of such algorithms are investigated, and it
is found that these local scores frequently exhibit clusters of peaks near phoneme transitions that cause
spurious segment boundaries. As a consequence, very short segments are sometimes postulated by the
algorithms. To counteract this, ad-hoc remedies are proposed in the literature. We propose a dynamic
programming (DP) framework for speech segmentation that employs a probabilistic segment length
model in conjunction with the local scores. DP offers an elegant way to deal with peak clusters by
choosing only the most probable segment length and local score combinations as boundary positions.
It is shown to offer a clear performance improvement over selected methods from the literature serving
as benchmarks.
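The contrast between threshold-based peak picking and DP with a segment length model can be illustrated with a minimal sketch. It is not the thesis's implementation: the Gaussian-shaped length prior, its parameters (`mean`, `std`), the `weight` factor and all function names are assumptions made purely for illustration.

```python
import math

def threshold_peaks(score, threshold):
    """Baseline: every local maximum above the threshold becomes a boundary."""
    return [t for t in range(1, len(score) - 1)
            if score[t] > threshold and score[t - 1] < score[t] >= score[t + 1]]

def log_length_prior(d, mean=8.0, std=3.0):
    """Hypothetical (unnormalised) Gaussian log prior over segment length d, in frames."""
    return -0.5 * ((d - mean) / std) ** 2

def dp_segment(score, max_len=20, weight=1.0, log_prior=log_length_prior):
    """Pick boundaries that maximise local scores plus segment length log priors.

    Frames 0 and len(score) are forced boundaries. Returns the sorted boundary list.
    """
    n = len(score)
    best = [-math.inf] * (n + 1)   # best[t]: best total for a boundary at frame t
    back = [0] * (n + 1)           # back[t]: previous boundary on that best path
    best[0] = 0.0
    for t in range(1, n + 1):
        # The end of the signal is a forced boundary and earns no local score.
        reward = weight * score[t] if t < n else 0.0
        for d in range(1, min(max_len, t) + 1):
            candidate = best[t - d] + log_prior(d) + reward
            if candidate > best[t]:
                best[t] = candidate
                back[t] = t - d
    # Trace the chosen boundaries back from the end of the signal.
    bounds = [n]
    while bounds[-1] > 0:
        bounds.append(back[bounds[-1]])
    return bounds[::-1]

# Toy local score: a cluster of two peaks (frames 7 and 9) and a lone peak (frame 16).
toy_score = [0.0] * 24
toy_score[7], toy_score[9], toy_score[16] = 0.9, 0.8, 0.9

baseline = threshold_peaks(toy_score, threshold=0.5)
dp_bounds = dp_segment(toy_score)
```

On this toy input the baseline keeps both peaks of the cluster and so postulates a two-frame segment. The DP keeps only the stronger peak of the cluster, because the length prior penalises the very short segment more than the second peak rewards it.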
Multilayer perceptrons (MLPs) can be trained to generate local scores by using groups of feature
vectors centred around phoneme boundaries and midway between phoneme boundaries in suitable
training data. The MLPs are trained to produce a high output value at a boundary, and a low value
at continuity. It was found that the more accurate local scores generated by the MLP, which rarely
exhibit clusters of peaks, made the additional application of DP less effective than before. However, a
hybrid approach in which DP is used only to resolve smaller, more ambiguous peaks in the local score
was found to offer a substantial improvement on all prior methods.
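The construction of MLP training examples described above can be sketched as follows. This is an illustrative assumption rather than the thesis's procedure: the window size (`half_width`), the flattening of the window into one input vector, the 1.0/0.0 targets and the function names are all hypothetical, and the MLP itself is left out.

```python
def context_window(features, centre, half_width):
    """Flatten the 2 * half_width feature vectors around frame index `centre`.

    Returns None when the window would run past either end of the utterance.
    """
    start, end = centre - half_width, centre + half_width
    if start < 0 or end > len(features):
        return None
    return [value for frame in features[start:end] for value in frame]

def make_training_examples(features, boundaries, half_width=2):
    """Build (input, target) pairs: 1.0 at boundaries, 0.0 midway between them."""
    examples = []
    # Positive examples: windows centred on annotated phoneme boundaries.
    for b in boundaries:
        window = context_window(features, b, half_width)
        if window is not None:
            examples.append((window, 1.0))
    # Negative examples: windows centred midway between consecutive boundaries.
    for left, right in zip(boundaries, boundaries[1:]):
        mid = (left + right) // 2
        if mid == left:
            continue  # adjacent boundaries leave no room for a midpoint
        window = context_window(features, mid, half_width)
        if window is not None:
            examples.append((window, 0.0))
    return examples

# Toy utterance: 12 frames of 2-dimensional features, boundaries at frames 3 and 9.
toy_features = [[float(t), 0.5 * t] for t in range(12)]
toy_examples = make_training_examples(toy_features, boundaries=[3, 9])
```

Each pair could then be passed to any MLP trainer. The trained network's output, evaluated on a window slid over an unseen utterance, would play the role of the local score.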
Finally, restricted Boltzmann machines (RBMs) were applied as feature detectors. This provided a
means of building multi-layer networks that are capable of detecting highly abstract features. It is
found that when local scores are estimated by such deep networks, additional performance gains are
achieved. / AFRIKAANSE OPSOMMING: Ons ontwikkel en evalueer verskeie algoritmes wat 'n spraaksein in sub-woord eenhede segmenteer
sonder om gebruik te maak van ortografiese of fonetiese transkripsies. Dié algoritmes maak gebruik van
'n funksie, genaamd die lokale tellingsfunksie, wat 'n waarde produseer omtrent die lokale verandering in
'n spraaksein. In die literatuur is daar gevind dat die hoofbenadering tot segmentasie gebaseer is op 'n
grenswaarde, waarbo alle lokale maksima (pieke) in die lokale telling lei tot 'n skeiding tussen segmente.
'n Selektiewe groep segmentasie algoritmes is ondersoek en dit is gevind dat lokale tellings geneig is
om groeperings van pieke te hê naby aan die skeidings tussen foneme. As gevolg hiervan, word baie
kort segmente geselekteer deur die algoritmes. Om dit teen te werk, word ad-hoc metodes voorgestel
in die literatuur. Ons stel 'n alternatiewe metode voor wat gebaseer is op dinamiese programmering
(DP), wat 'n statistiese verspreiding van lengtes van segmente inkorporeer by segmentasie. DP bied 'n
elegante manier om groeperings van pieke te hanteer, deurdat net kombinasies van hoë lokale tellings en
segmentwaarskynlikheid, met betrekking tot die lengte van die segment, tot 'n skeiding lei. Daar word
gewys dat DP 'n duidelike verbetering in segmentasie akkuraatheid toon bo 'n paar gekose algoritmes
uit die literatuur.
Meervoudige lae perseptrone (MLPe) kan opgelei word om 'n lokale telling te genereer deur gebruik te
maak van groepe eienskapsvektore gesentreerd rondom en tussen foneem skeidings in geskikte opleidingsdata.
Die MLPe word opgelei om 'n groot waarde te genereer as 'n foneem skeiding voorkom
en 'n klein waarde andersins. Dit is gevind dat die meer akkurate lokale tellings wat deur die MLPe
gegenereer word minder groeperings van pieke het, wat dan die addisionele toepassing van die DP
minder effektief maak. 'n Hibriede toepassing, waar DP net tussen kleiner en minder duidelike pieke
in die lokale telling kies, lei egter tot 'n groot verbetering bo-op alle vorige metodes.
As 'n finale stap het ons beperkte Boltzmann masjiene (BBMe) gebruik om patrone in data te
identifiseer. Sodoende, verskaf BBMe 'n manier om meervoudige lae netwerke op te bou waar die boonste
lae baie komplekse patrone in die data identifiseer. Die toepassing van dié dieper netwerke tot die
generasie van 'n lokale telling het tot verdere verbeteringe in segmentasie-akkuraatheid gelei. / National Research Foundation (NRF)
Identifier | oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:sun/oai:scholar.sun.ac.za:10019.1/86725 |
Date | 03 1900 |
Creators | Van Vuuren, Van Zyl |
Contributors | Niesler, T. R., Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. |
Publisher | Stellenbosch : Stellenbosch University |
Source Sets | South African National ETD Portal |
Language | en_ZA |
Detected Language | Unknown |
Type | Thesis |
Format | 108 pages : illustrations |
Rights | Stellenbosch University |