Continuous, speaker-independent, speech recognition for a speech to viseme translator

The work presented in this thesis forms part of a research project which attempts to generate a visualisation of a speaker's mouth from purely acoustic speech signals. The aim is to provide an aid for partially hearing-impaired people in which visual information is presented alongside limited acoustic signals, facilitating easier use of the telephone. The system is essentially a low-level speech recogniser in which phonemic information is extracted from the speech waveform and mapped onto visemes generated on a synthetic facial image. This thesis describes a major part of this project: the development of an accurate phoneme discriminator which is capable of speaker-independent operation on continuous speech. The recognition process is realised in three stages: a pre-processor to convert the speech into a suitable parametric form; a pattern recogniser to identify the possible phoneme classes; and a post-processor to produce the viseme information. The pattern recognition stage uses a self-organising Kohonen network, followed by a Learning Vector Quantiser (LVQ) to further improve the recognition accuracy. The performance of this stage is highly dependent on the choice of pre-processor used at the input to the network, and the design of the pre-processor stage forms a significant part of this work. A novel technique known as the pseudo-cepstrum forms the basis of this pre-processor. Extensive investigations have been conducted into the dependence of performance on a range of parameters, both at the pre-processor stage and within the Kohonen classifier. In particular, a performance comparison of several pre-processor techniques, including the pseudo-cepstrum, has been carried out. Factors affecting both the training and operation of the classifier are also described, with the sensitivity of recognition performance to the input data being a major issue. Overall recognition accuracies of 80% have been achieved.
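As a rough illustration of the pattern recognition stage described above, the following sketch trains a small Kohonen map without labels, labels its units by majority vote, and then fine-tunes them with LVQ1. It is not the thesis's implementation: the one-dimensional map, the learning-rate and neighbourhood schedules, the function names, and the two-class toy data (2-D points standing in for pre-processed speech frames) are all assumptions made for the example, and the pseudo-cepstrum pre-processor is not reproduced.

```python
import random

def nearest(codebook, x):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(codebook[i], x)))

def train_som(data, n_units, epochs=20, lr0=0.5, seed=0):
    """Unsupervised 1-D Kohonen map: the winner and its chain neighbours move toward x."""
    rng = random.Random(seed)
    codebook = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs                  # decays linearly from 1 toward 0
        lr = lr0 * frac                              # shrinking learning rate (assumed schedule)
        radius = int(round((n_units // 2) * frac))   # shrinking neighbourhood (assumed schedule)
        for x in rng.sample(data, len(data)):        # visit the data in random order
            win = nearest(codebook, x)
            for j in range(max(0, win - radius), min(n_units, win + radius + 1)):
                codebook[j] = [w + lr * (v - w) for w, v in zip(codebook[j], x)]
    return codebook

def label_units(codebook, data, labels):
    """Label each unit with the majority class of the vectors it wins; drop unused units."""
    votes = [{} for _ in codebook]
    for x, y in zip(data, labels):
        v = votes[nearest(codebook, x)]
        v[y] = v.get(y, 0) + 1
    kept = [(w, max(v, key=v.get)) for w, v in zip(codebook, votes) if v]
    return [w for w, _ in kept], [lab for _, lab in kept]

def lvq1(codebook, unit_labels, data, labels, epochs=20, lr0=0.1, seed=0):
    """Supervised LVQ1: pull the winner toward x if its label matches, push it away if not."""
    rng = random.Random(seed)
    codebook = [list(w) for w in codebook]
    order = list(range(len(data)))
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)
        rng.shuffle(order)
        for i in order:
            x, y = data[i], labels[i]
            win = nearest(codebook, x)
            sign = 1.0 if unit_labels[win] == y else -1.0
            codebook[win] = [w + sign * lr * (v - w) for w, v in zip(codebook[win], x)]
    return codebook

def classify(codebook, unit_labels, x):
    """Assign x the label of its nearest codebook vector."""
    return unit_labels[nearest(codebook, x)]

def demo(seed=1):
    """Toy run on two synthetic classes "A" and "B" (hypothetical stand-ins for phoneme classes)."""
    rng = random.Random(seed)
    data, labels = [], []
    for label, centre in (("A", (0.0, 0.0)), ("B", (5.0, 5.0))):
        for _ in range(50):
            data.append([rng.gauss(c, 0.5) for c in centre])
            labels.append(label)
    som = train_som(data, n_units=4)
    codebook, unit_labels = label_units(som, data, labels)
    codebook = lvq1(codebook, unit_labels, data, labels)
    hits = sum(classify(codebook, unit_labels, x) == y for x, y in zip(data, labels))
    return hits / len(data)
```

Calling `demo()` returns the fraction of the toy training points that the fine-tuned codebook classifies correctly. The figure says nothing about real speech, where the abstract notes that performance depends heavily on the pre-processor and on the input data.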

https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.298086

621.3994

Acoustic

Identifier	oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:298086
Date	January 1999
Creators	Kelleher, Holly
Publisher	University of Surrey
Source Sets	Ethos UK
Detected Language	English
Type	Electronic Thesis or Dissertation
Source	http://epubs.surrey.ac.uk/844558/
