
Intonation in a text-to-speech conversion system

This thesis presents the development and implementation of a set of rules to generate intonational specifications for unrestricted text. The theoretical assumptions which motivate this work are outlined, and the performance of the rules is discussed with reference to various test corpora and formal evaluation experiments. The development of our rules is seen as a cycle involving the implementation of theoretical ideas about intonation in a text-to-speech conversion system, the testing of that implementation against some relevant body of data, and the refinement of the theory on the basis of the results. The first chapter introduces the problem of intonation in text-to-speech conversion, discusses previous practical and theoretical approaches to the problem, and sets out the general approach which is followed in subsequent chapters. We restrict the scope of our rules to generating <i>acceptable neutral</i> intonation, an approximation to <i>broad focus</i> (Ladd 1980), and we present a rule-development strategy based on the idea of a <i>default specification</i> (which can be successively refined) and on the principle of making maximum use of all the information available from text. The second chapter presents a framework for deriving an intonational specification in terms of <i>accents</i> and <i>boundaries</i> from a crude syntactic representation of any text sentence. This framework involves three stages: the division of text into intonational domains of various hierarchic levels; the assignment of accents to lexical items on the basis of stress information and grammatical class; and the modification of these accents and boundaries in accordance with phonological principles of prominence and rhythm. Chapter 3 discusses the problem of evaluating synthetic intonation, introduces an original evaluation procedure, and presents two formal evaluations of the output of the rules described in Chapter 2.
Further sections present our attempts to improve our treatment of the three major causes of errors in the evaluated output: prepositional phrases, non-words or <i>anomalies</i> (e.g. numbers, dates and abbreviations), and anaphora of various kinds. The final chapter presents a summary of the main points of Chapters 1-3. We draw various conclusions regarding the nature of intonation, the development of text-to-speech conversion systems, and the generation of intonation in such systems.

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:657927
Date: January 1991
Creators: Monaghan, Alexander Ian Campbell
Publisher: University of Edinburgh
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://hdl.handle.net/1842/20023
