In human-to-human dialogue, formulaic sequences are used to minimise the effort of both speech production and perception. In production, the speaker apparently retrieves such sequences whole from memory, without the cognitive effort required to generate them from a lexicon and grammar. In perception, context determines a set of similar phrases that the listener expects to hear, which also reduces cognitive load. This thesis describes techniques for automatically acquiring formulaic phrases from transcriptions of speech; the phrases are then used to define variable-length units of speech and language. These units are well suited to a template-based speech recogniser, which can easily adjust its modelling units to the examples that are found, with the aim of improving Automatic Speech Recognition (ASR) accuracy. Language modelling techniques are described, such as the Word Phrase Link Bigram (WPLB) language model, which combines words and phrases, and the Hybrid Syntactic Formulaic (HSF) model, which clusters semantically similar phrases using syntax. The language models are then combined with speech in both Hidden Markov Model and template-based speech recognisers. Techniques to reduce the complexity of the search space for the template-based recogniser are introduced, such as the hierarchical LDA filter. As expected, the techniques gave significant gains when the language used was highly formulaic, and were less successful on a “standard” speech database which consisted of highly artificial utterances.
Watkins, Christopher James
University of East Anglia
Electronic Thesis or Dissertation