Automatic speech segmentation with limited data / by D.R. van Niekerk

The rapid development of corpus-based speech systems such as concatenative synthesis systems for under-resourced languages requires an efficient, consistent and accurate solution with regard to phonetic speech segmentation. Manual development of phonetically annotated corpora is a time-consuming and expensive process which suffers from challenges regarding consistency and reproducibility, while automation of this process has only been satisfactorily demonstrated on large corpora of a select few languages by employing techniques requiring extensive and specialised resources.

In this work we considered the problem of phonetic segmentation in the context of developing small prototypical speech synthesis corpora for new under-resourced languages. This was done through an empirical evaluation of existing segmentation techniques on typical speech corpora in three South African languages. In this process, the performance of these techniques was characterised under different data conditions and the efficient application of these techniques was investigated in order to improve the accuracy of resulting phonetic alignments.

We found that the application of baseline speaker-specific Hidden Markov Models results in relatively robust and accurate alignments even under extremely limited data conditions and demonstrated how such models can be developed and applied efficiently in this context. The result is segmentation of sufficient quality for synthesis applications, with the quality of alignments comparable to manual segmentation efforts in this context. Finally, possibilities for further automated refinement of phonetic alignments were investigated and an efficient corpus development strategy was proposed with suggestions for further work in this direction.

Thesis (M.Ing. (Computer Engineering))--North-West University, Potchefstroom Campus, 2009.

http://hdl.handle.net/10394/3978

Phonetic speech segmentation

Phonetic alignment

Speech synthesis

Text-to-speech

Speech corpus development

Resource scarce languages

Hidden Markov models

Dynamic time warping

Identifier	oai:union.ndltd.org:NWUBOLOKA1/oai:dspace.nwu.ac.za:10394/3978
Date	January 2009
Creators	Van Niekerk, Daniel Rudolph
Publisher	North-West University
Source Sets	North-West University
Detected Language	English
Type	Thesis
