
Machine learning for neural coding of sound envelopes: slithering from sinusoids to speech

Specific locations within the brain contain neurons that respond to a sound played in the ear of a person or animal by firing action potentials (spikes). The number and timing of these spikes encode information about the sound; this code is the basis of our perception and understanding of the acoustic world around us. To understand how the brain processes sound, we must understand this code, and the difficulty lies in evaluating a code that is unknown. This thesis applies machine learning (ML) to evaluate the auditory coding of dynamic sounds by spike trains, using datasets of varying complexity.

In the first part, a battery of ML algorithms is used to evaluate modulation-frequency coding from cat cochlear nucleus spike-train responses to amplitude-modulated sinusoids. On this recognition task, while absolute performance levels depend on the type of algorithm, the algorithms' performance relative to each other is the same across different types of neurons. A single powerful classification algorithm is therefore sufficient for evaluating neural codes. Similarly, different performance measures are useful for understanding differences between ML algorithms, but they shed little light on differing neural coding strategies. In contrast, the features used for classification are crucial: vector strength, for example, does not provide an accurate measure of the information contained in spike timing. Overall, different types of neurons do not encode the same amount of amplitude-modulation information, which emphasises the value of applying powerful ML methods to raw spike-timing information.

In the second part, a more ecological and heterogeneous set of sounds, speech, is used. The application of hidden Markov model based automatic speech recognition (ASR) is tested within the constraints of an electrophysiological experiment.
The findings suggest that a continuous digit recognition task is amenable to a physiology experiment: using only 10 minutes of simulated recording to train statistical models of phonemes, an accuracy of 70% could be achieved, rising to about 85% with 200 minutes of simulated data. A digit recognition framework is sufficient to examine how performance is influenced by the size and nature of the neural population and by the role of spike timing. Previous results suggest, however, that this accuracy would be reduced if experimental inferior colliculus data were used instead of a guinea-pig cochlear model. On the other hand, a fully fledged continuous ASR task on a large vocabulary with many speakers may yield insufficient phoneme accuracy (∼40%) to support an investigation of auditory coding. Overall, this suggests that complex ML algorithms such as ASR can be used practically to assess neural coding of speech, given careful selection of features.
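Vector strength, the phase-locking feature the abstract argues is an inadequate summary of spike timing, has a standard definition: the length of the mean resultant vector of spike phases taken relative to the modulation cycle. A minimal sketch (the function name and NumPy implementation are illustrative, not taken from the thesis):

```python
import numpy as np

def vector_strength(spike_times, mod_freq):
    """Vector strength of spikes (times in seconds) relative to an
    amplitude-modulation frequency (Hz).

    Each spike is mapped to a phase on the modulation cycle; the
    result is the length of the mean resultant vector: 1 for perfect
    phase locking, near 0 for uniformly spread phases.
    """
    phases = 2.0 * np.pi * mod_freq * np.asarray(spike_times, dtype=float)
    return float(np.abs(np.mean(np.exp(1j * phases))))

# Spikes locked once per 10 ms cycle at 100 Hz give a value near 1;
# two spikes half a cycle apart cancel and give a value near 0.
print(vector_strength([0.00, 0.01, 0.02, 0.03], 100.0))  # ~1.0
print(vector_strength([0.000, 0.005], 100.0))            # ~0.0
```

Because this statistic collapses a whole spike train to a single number per modulation frequency, it discards temporal detail that classifiers trained on raw spike timing can exploit, which is consistent with the abstract's conclusion about feature choice.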

Identifier oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:748514
Date January 2018
Creators Levy, Alban Hugo
Publisher University of Nottingham
Source Sets Ethos UK
Detected Language English
Type Electronic Thesis or Dissertation
Source http://eprints.nottingham.ac.uk/52224/
