
Spoken letter recognition with neural networks

Neural networks have recently been applied to real-world speech recognition problems with a great deal of success. This thesis develops a strategy for optimising a neural network known as the Radial Basis Function (RBF) classifier on a large spoken letter recognition problem designed by British Telecom Research Laboratories. The strategy developed can be viewed as a compromise between a fully adaptive approach involving prohibitively large amounts of computation, and a heuristic approach resulting in poor generalisation. A value for the optimal number of kernel functions is suggested, and methods for determining the positions of the centres and the values of the width parameters are provided. During the evolution of the optimisation strategy it was demonstrated that spatial organisation of the centres does not adversely affect the ability of the classifier to generalise. An RBF employing the optimisation strategy achieved a lower error rate than a multilayer perceptron and two traditional static pattern classifiers on the same problem. The error rate of the RBF was very close to the theoretical minimum error rate obtainable with an optimal Bayes classifier. In addition to error rate, the performance of the classifiers was assessed in terms of the computational requirements of training and classification, illustrating the significant trade-off between computational investment in training and level of generalisation achieved. The error rate of the RBF was compared with that of a well-established method of dynamic classification to examine whether non-linear time normalisation of word patterns was advantageous to generalisation. It was demonstrated that the dynamic classifier was better suited to small-scale speech recognition problems, and the RBF to speaker-independent speech recognition problems. The dynamic classifier was then combined with a neural network algorithm, greatly reducing its computational requirement without significantly increasing its error rate.
This combined system was then extended into a novel system for visual feedback therapy, in which speech is visualised as a moving trajectory on a computer screen.
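
As a rough illustration of the kind of classifier the abstract describes, the sketch below builds a small RBF classifier with Gaussian kernels, fixed centres, and a single shared width. It is not the thesis's optimisation strategy, which the abstract does not detail. The toy XOR data, the choice of the training points as centres, the width of 0.5, and the delta-rule training of the output layer are all assumptions made for the example, and the function names are hypothetical.

```python
import math

def rbf_activations(x, centres, width):
    """Gaussian kernel response of every centre to the input vector x."""
    return [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * width ** 2))
        for c in centres
    ]

def train_output_weights(inputs, labels, centres, width, n_classes, lr=0.5, epochs=200):
    """Fit the linear output layer with the delta (LMS) rule.

    Centres and width stay fixed; only the output weights are adapted
    (an assumption of this sketch, not a claim about the thesis).
    """
    weights = [[0.0] * len(centres) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in zip(inputs, labels):
            phi = rbf_activations(x, centres, width)
            for k in range(n_classes):
                target = 1.0 if k == label else 0.0
                output = sum(w * p for w, p in zip(weights[k], phi))
                error = target - output
                for j, p in enumerate(phi):
                    weights[k][j] += lr * error * p
    return weights

def classify(x, centres, width, weights):
    """Return the index of the class whose output unit responds most strongly."""
    phi = rbf_activations(x, centres, width)
    scores = [sum(w * p for w, p in zip(row, phi)) for row in weights]
    return scores.index(max(scores))

# Toy XOR data (hypothetical, not from the thesis): not linearly separable,
# but separable once mapped through the Gaussian kernels.
inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [0, 1, 1, 0]
centres = list(inputs)   # assumption: one centre per training point
width = 0.5              # assumption: one shared width parameter

weights = train_output_weights(inputs, labels, centres, width, n_classes=2)
predictions = [classify(x, centres, width, weights) for x in inputs]
print(predictions)
```

In this sketch the number of centres, their positions, and the width are fixed by hand. Those are the quantities the thesis's strategy is concerned with choosing.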

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:302812
Date: January 1991
Creators: Reynolds, James H.
Contributors: Tarassenko, Lionel
Publisher: University of Oxford
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://ora.ox.ac.uk/objects/uuid:b30872a7-7bd8-437f-bd3a-649de981d352
