
Extracting room acoustic parameters from received speech signals using artificial neural networks

Over a century, quantitative room acoustics has accumulated a knowledge base centred on objective acoustic parameters. Realistic and accurate measurements are essential in room acoustics. Occupied measurements are difficult to undertake with current technology, yet it is well established that occupancy changes the acoustics of a space. For this reason, new measurement techniques are sought. This thesis concerns a new, machine-learning-based approach for measuring room acoustic parameters, which is particularly useful for occupied in-situ measurements. A set of artificial neural networks, associated pre-processors and machine learning regimes are developed to extract Reverberation Time (RT), Early Decay Time (EDT) and Speech Transmission Index (STI) from received speech signals. By using naturalistic sounds (speech) as excitation, the developed methods avoid unpleasant, noisy test signals, so measurements can be made in occupied spaces in a non-invasive fashion. Given their non-invasive nature and achievable accuracy, the new methods can facilitate occupied measurements, providing an alternative to traditional methods for quantifying the acoustics of spaces where speech communication is important. Much of the development work on the neural network methods focuses on the pre-processors, which produce data-reduced and pre-conditioned signals for the neural networks. Two different speech scenarios, separate utterances and continuous running speech, are considered, leading to the development of four major neural network methods:
1. Time domain method to extract RT/EDT from separate utterances.
2. Straightforward FFT method to extract STI from short-time speech.
3. Frequency domain method to extract STI from long-time running speech.
4. Frequency domain method to extract RT/EDT from long-time running speech.
These methods are all based on supervised learning.
Unsupervised models, representing another important class of neural networks, are also investigated in the context of this study and are found useful as pre-processors. Model development and validation are carried out through computer simulations. Results show that resolutions better than 0.1 s in reverberation time and 0.02 in STI are achievable under a "one-net-one-speech" machine learning regime, in which a neural network is trained on a particular anechoic speech signal to extract a designated objective parameter under excitation by that speech. Neural network systems that extract acoustic parameters from arbitrary received speech signals without prior knowledge of the speech stimuli, termed source-independent measurements, are also explored. Although the accuracy achieved is not as good as that of the standard methods or of the neural network methods developed on the one-net-one-speech basis, source-independent extraction is potentially more useful in practical systems. Improving the accuracy of the source-independent measurements and extending the developed methods to music signals appear to be the most significant directions for further work.
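The abstract does not describe how the simulations were set up. As a purely illustrative sketch, the following shows one hypothetical way to build a labelled example for supervised training: a dry signal is convolved with a synthetic impulse response whose reverberation time is known, giving a (received signal, RT) pair. The exponentially decaying noise model, the function names and all parameter values are assumptions and are not taken from the thesis. The Schroeder backward integration at the end is the conventional impulse-response method for estimating RT and is used here only to check the label.

```python
import math
import random


def synthetic_impulse_response(rt60, fs, seed=0):
    """Gaussian noise shaped by an exponential decay that falls 60 dB in rt60 seconds.

    Assumed room model, for illustration only.
    """
    rng = random.Random(seed)
    n = int(fs * 1.5 * rt60)  # long enough to decay by about 90 dB
    # A 60 dB amplitude drop is a factor of 1000, so the rate is ln(1000) / rt60.
    rate = math.log(1000.0) / rt60
    return [rng.gauss(0.0, 1.0) * math.exp(-rate * i / fs) for i in range(n)]


def convolve(x, h):
    """Direct-form linear convolution (slow but dependency-free)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def make_training_pair(dry_signal, rt60, fs, seed=0):
    """Return (received signal, target RT) for one hypothetical training example."""
    h = synthetic_impulse_response(rt60, fs, seed=seed)
    return convolve(dry_signal, h), rt60


def estimate_rt60(h, fs, upper_db=-5.0, lower_db=-25.0):
    """Estimate RT from an impulse response by Schroeder backward integration.

    Fits a least-squares line to the decay curve between upper_db and lower_db
    and extrapolates the slope to a 60 dB decay.
    """
    energy = 0.0
    edc = [0.0] * len(h)
    for i in range(len(h) - 1, -1, -1):  # integrate energy from the tail backwards
        energy += h[i] * h[i]
        edc[i] = energy
    total = edc[0]
    times, levels = [], []
    for i, e in enumerate(edc):
        if e <= 0.0:
            continue
        level = 10.0 * math.log10(e / total)
        if lower_db <= level <= upper_db:
            times.append(i / fs)
            levels.append(level)
    mean_t = sum(times) / len(times)
    mean_l = sum(levels) / len(levels)
    slope = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, levels)) / sum(
        (t - mean_t) ** 2 for t in times
    )  # decay slope in dB per second (negative)
    return -60.0 / slope
```

In a sketch like this, the dry signal would be the anechoic speech recording and the pair would be repeated over many RT values. A real pipeline would use an FFT-based convolution, since the direct form above is only practical for short signals.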

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:272617
Date: January 2002
Creators: Li, Francis Feng
Publisher: University of Salford
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://usir.salford.ac.uk/42990/
