Speech generation in a spoken dialogue system

Thesis (MScIng)--University of Stellenbosch, 2004.

ENGLISH ABSTRACT:
Spoken dialogue systems accessed over the telephone network are rapidly becoming more
popular as a means to reduce call-centre costs and improve customer experience. It is
now technologically feasible to delegate the repetitive and relatively simple tasks handled
in most telephone calls to automatic systems. Such a system uses speech recognition to
take input from users. This work focuses on the speech generation component that a
specific prototype system uses to convey audible speech output back to the user.
Many commercial systems contain general-purpose text-to-speech synthesisers. Text-to-speech
synthesis, which aims to build machines that read text aloud, is a very active branch of
speech processing. In some languages this has been a reality for almost two decades. While
these synthesisers are usually highly intelligible, they almost never sound natural. The
output quality of synthetic speech is considered a very important factor in the user’s
perception of the quality and usability of spoken dialogue systems.
The largely static nature of the system's dialogue is exploited to produce a custom
speech synthesis component that provides very high-quality output speech for the particular
application. To this end, the current state of the art in speech synthesis is surveyed
and summarised. A unit-selection synthesiser is produced that functions in Afrikaans,
English and Xhosa.
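Unit selection is commonly framed as a shortest-path search: each target position has several candidate units from the corpus, a target cost scores how well a unit matches its specification, and a join cost scores how smoothly two adjacent units concatenate. The following is a minimal Python sketch of that standard dynamic-programming (Viterbi) search. The abstract does not specify the thesis's cost functions, so `target_cost` and `join_cost` are hypothetical placeholders supplied by the caller.

```python
from typing import Callable, Sequence, TypeVar

U = TypeVar("U")


def select_units(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> list[U]:
    """Pick one candidate per target position, minimising total target + join cost.

    target_cost and join_cost are hypothetical, caller-supplied cost functions.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate unit")
    # best[j] = lowest cost of any path ending in candidate j at the current position
    best = [target_cost(0, u) for u in candidates[0]]
    back: list[list[int]] = []  # back[i - 1][j] = best predecessor of candidate j at position i
    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        new_best, pointers = [], []
        for u in candidates[i]:
            # cheapest predecessor once the cost of joining it to u is included
            k = min(range(len(prev)), key=lambda j: best[j] + join_cost(prev[j], u))
            new_best.append(best[k] + join_cost(prev[k], u) + target_cost(i, u))
            pointers.append(k)
        best = new_best
        back.append(pointers)
    # trace back from the cheapest candidate at the final position
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]
```

For example, with candidates represented as hypothetical `(unit_id, pitch_hz)` tuples, `target_cost` could penalise distance from a target pitch and `join_cost` the pitch jump between neighbours. Keeping only the best predecessor per candidate makes the work grow with utterance length times the square of the candidates per position, instead of enumerating every combination.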
The unit-selection synthesiser selects short waveforms from a recorded speech corpus
and concatenates them to produce the required utterances. Techniques are developed for
designing a compact corpus and processing it to produce a unit-selection database. Speech
modification methods are investigated to build a framework for natural-sounding speech
concatenation. This framework also provides pitch and duration modification capabilities
that will enable research in languages such as Afrikaans and Xhosa, where text-to-speech
capabilities are relatively immature.

AFRIKAANS SUMMARY:
Telephone-based spoken dialogue systems are becoming increasingly common and are an effective
way to reduce call-centre costs. It is currently technologically possible to handle
a large number of simple transactions with automatic systems. Such systems
use speech recognition to receive input from the user. This work focuses on
the speech generation component that a specific prototype system uses to play output
back to the user.
Many commercial systems use generic text-to-speech synthesisers. Text-to-speech
synthesis is still a very active field of speech research. In general, this
research aims to read text and render it as intelligible speech. Such systems
have now existed for at least two decades. Although entirely intelligible,
these systems sound unnatural. In telephone-based spoken dialogue systems, the quality
of the synthetic speech is important for the user's perception of the system's quality
and usability.
The dialogue is mostly static in nature, and this property is exploited to synthesise high-quality
speech for a particular application. To achieve this, the current state of the art
in this field was studied and summarised. A cut-and-paste synthesiser was built that
works in Afrikaans, English and Xhosa.
The synthesiser selects short segments of speech waveform from a speech corpus and
joins them together to produce the required speech. Automatic techniques were developed
to design a compact corpus that still contains everything the synthesiser will need
to perform its task. Further techniques process the corpus into a form usable for
synthesis.
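One common way to design a compact corpus is greedy set cover: repeatedly choose the candidate sentence that adds the most still-missing units until every required unit is present. The abstract does not say which technique the thesis uses. This Python sketch assumes diphones (adjacent phone pairs) as the unit and takes phonetic transcriptions as given; the names `diphones` and `greedy_corpus` are hypothetical.

```python
def diphones(phones: list[str]) -> set[tuple[str, str]]:
    """Adjacent phone pairs in a transcription (assumed unit inventory: diphones)."""
    return set(zip(phones, phones[1:]))


def greedy_corpus(sentences: dict[str, list[str]], required: set[tuple[str, str]]) -> list[str]:
    """Greedily pick sentences until every required diphone is covered or no sentence helps.

    sentences maps a hypothetical sentence id to its phonetic transcription.
    """
    uncovered = set(required)
    chosen: list[str] = []
    remaining = dict(sentences)
    while uncovered and remaining:
        # sentence that contributes the most still-uncovered diphones (first one wins ties)
        best = max(remaining, key=lambda s: len(diphones(remaining[s]) & uncovered))
        gain = diphones(remaining[best]) & uncovered
        if not gain:
            break  # no remaining sentence covers any of the missing units
        chosen.append(best)
        uncovered -= gain
        del remaining[best]
    return chosen
```

For a dialogue system with fixed prompts, `required` could be the union of the diphones in every prompt the system may speak. Greedy selection does not guarantee the smallest possible corpus, but it is a simple, widely used approximation to the set-cover problem.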
Methods of speech modification were investigated in order to make the concatenated segments
of speech sound more natural and to correct their intonation and tempo. This
provides infrastructure for research in languages such as Afrikaans and Xhosa, where text-to-speech
capabilities are still immature.
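The abstract does not detail the modification methods. The simplest step toward smoother joins is to overlap the end of one unit with the start of the next and crossfade between them; this Python sketch does that with complementary raised-cosine weights over a hypothetical `overlap` length in samples. It does not change pitch or duration, which would need a separate technique, for example pitch-synchronous overlap-add (PSOLA).

```python
import math


def crossfade_concat(units: list[list[float]], overlap: int) -> list[float]:
    """Join waveform segments, overlap-adding up to `overlap` samples at each boundary.

    `overlap` is a hypothetical parameter; it is clipped to the lengths of the segments.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    out: list[float] = list(units[0]) if units else []
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n  # first sample of the tail that overlaps the next unit
        for k in range(n):
            # fade-in weight rises from near 0 to near 1; the fade-out weight is 1 - w
            w = 0.5 - 0.5 * math.cos(math.pi * (k + 0.5) / n)
            out[start + k] = out[start + k] * (1.0 - w) + unit[k] * w
        out.extend(unit[n:])
    return out
```

Because the fade-in and fade-out weights sum to one at every sample, a constant signal passes through unchanged; each join shortens the output by the overlap length.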

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:sun/oai:scholar.sun.ac.za:10019.1/16460
Date: 12 1900
Creators: Visagie, Albertus Sybrand
Contributors: Du Preez, J. A., University of Stellenbosch. Faculty of Engineering. Dept. of Electrical and Electronic Engineering.
Publisher: Stellenbosch : University of Stellenbosch
Source Sets: South African National ETD Portal
Language: en_ZA
Type: Thesis
Format: xv, 144 leaves : ill.
Rights: University of Stellenbosch
