This thesis explores the treatment of temporal information in Automated Speech Recognition. It reviews the study of time in speech perception and concludes that, while some temporal information in the speech signal is crucial to the speech decoding process, not all temporal information is relevant to decoding. We then review the representation of temporal information in the main automated recognition techniques: Hidden Markov Models and Artificial Neural Networks. We find that both techniques have difficulty representing the type of temporal information that is phonetically or phonologically significant in the speech signal.
In an attempt to improve this situation, we explore how temporal information is represented in the acoustic vectors commonly used to encode the acoustic speech signal in the front-ends of speech recognition systems. We attempt, where possible, to let the signal provide the temporal structure rather than imposing a fixed, clock-based timing framework. We develop a novel acoustic temporal parameter, the Parameter Similarity Length, which is a measure of temporal stability, and test it against the time derivatives of acoustic parameters conventionally used in acoustic vectors.
Identifier | oai:union.ndltd.org:ADTP/216754 |
Date | January 2003 |
Creators | Davies, David Richard Llewellyn, dave.davies@canberra.edu.au |
Publisher | The Australian National University. Research School of Information Sciences and Engineering |
Source Sets | Australasian Digital Theses Program |
Language | English |
Detected Language | English |
Rights | http://www.anu.edu.au/legal/copyrit.html, Copyright David Richard Llewellyn Davies |