This thesis addresses the problem of classifying an audio stream as either speech or music, a problem that is receiving increasing attention because of its wide range of applications. Various techniques have been presented in the last decade to discriminate between speech and music. However, their accuracy is still not sufficient, since music covers a very broad class of signals owing to the large number of musical instruments found in audio data. Performance can be further compromised in noisy conditions, which are unavoidable in some practical situations. This thesis presents an analysis of the feature extraction techniques and classifiers currently in use, followed by the proposal and evaluation of new features for improved classification. These include two novel cepstral features, delta cepstral energy and power spectrum deviation, along with amplitude and frequency modulation features. The modified group delay feature, initially proposed for speech recognition, is also investigated for speech and music discrimination. Experiments were performed using different sets of features, which were compared among themselves and with conventional MFCCs using error rate criteria and Detection Error Trade-off curves. It is shown that the proposed cepstral and modulation features increase the accuracy of the conventional MFCC-based system. However, the modified group delay feature, which has been shown to improve accuracy for speech classification problems, does not contribute much to speech and music discrimination. Among the configurations presented here, the optimum one, both modulation features combined with MFCC, resulted in an overall error rate of 6.57%, compared to 7.43% for MFCC alone.
Identifier | oai:union.ndltd.org:ADTP/186977 |
Date | January 2006 |
Creators | Mubarak, Omer Mohsin, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW |
Publisher | Awarded by: University of New South Wales. Electrical Engineering & Telecommunications |
Source Sets | Australasian Digital Theses Program |
Language | English |
Detected Language | English |
Rights | Copyright Omer Mohsin Mubarak, http://unsworks.unsw.edu.au/copyright |