
Modelling asynchrony in the articulation of speech for automatic speech recognition

Current automatic speech recognition systems assume that all the articulators in the vocal tract move in synchrony with one another to produce speech. This thesis describes the development of a more realistic model that allows some asynchrony between the articulators, with the aim of improving speech recognition accuracy. Experiments on the TIMIT database demonstrate that higher phone recognition accuracy is obtained by modelling the voiced and voiceless components of speech separately, splitting the speech spectrum into high and low frequency bands. Modelling further articulator asynchrony in speech production requires a representation of speech that is closer to the actual production process. Formant frequency parameters are therefore integrated into a typical Mel-frequency cepstral coefficient representation and their effect on recognition accuracy is observed. Formant frequency estimates can be made accurately only when the formants are visible in the spectrum, so a technique is developed to discard estimates generated when the formants are not visible. The formant data also allows a unique method of vocal tract normalization, which improves recognition accuracy. Finally, a classification experiment examines the potential improvement in speech recognition accuracy from modelling asynchrony between the articulators by allowing asynchrony between all the formants.
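The abstract gives no implementation detail, but the band-splitting idea can be sketched roughly as follows. This is a minimal illustration in Python: the 1 kHz crossover frequency, the filter order, and the function name split_bands are assumptions made for the example, not values taken from the thesis.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def split_bands(signal, sr, crossover_hz=1000.0):
    """Split a speech signal into low and high frequency bands.

    The crossover frequency (1 kHz here) is purely illustrative;
    the abstract does not specify where the spectrum is split.
    """
    nyq = sr / 2.0
    b_lo, a_lo = butter(4, crossover_hz / nyq, btype="low")
    b_hi, a_hi = butter(4, crossover_hz / nyq, btype="high")
    low_band = filtfilt(b_lo, a_lo, signal)   # dominated by voiced energy
    high_band = filtfilt(b_hi, a_hi, signal)  # dominated by voiceless energy
    return low_band, high_band

# Example: one second of white noise standing in for speech.
sr = 16000
x = np.random.randn(sr)
low, high = split_bands(x, sr)
# Each band would then be parameterised (e.g. as cepstral features) and
# modelled as its own stream, with some asynchrony allowed between streams.
```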

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:399032
Date: January 2003
Creators: Wilkinson, Nicholas
Publisher: University of Birmingham
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
