
Speech and music discrimination using short-time features

Mubarak, Omer Mohsin. Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW, January 2006.
This thesis addresses the problem of classifying an audio stream as either speech or music, an issue that is receiving increasing attention because of its wide range of applications. Various techniques for discriminating between speech and music have been presented in the last decade. However, their accuracy is still insufficient, because "music" covers a very broad class of signals owing to the large number of musical instruments found in audio data. Performance can be further compromised in noisy conditions, which are unavoidable in some practical situations. This thesis analyses the feature extraction techniques and classifiers currently in use, then proposes and evaluates new features for improved classification. These include two novel cepstral features, delta cepstral energy and power spectrum deviation, along with amplitude and frequency modulation features. The modified group delay feature, originally proposed for speech recognition, is also investigated for speech and music discrimination. Experiments were performed with different feature sets, which were compared with one another and with conventional MFCCs using error rates and Detection Error Trade-off (DET) curves. The results show that the proposed cepstral and modulation features improve the accuracy of the conventional MFCC-based system. However, the modified group delay feature, which has been shown to improve accuracy in speech classification problems, contributes little to speech and music discrimination. Among the configurations presented here, the best one, combining both modulation features with MFCCs, achieved an overall error rate of 6.57%, compared with 7.43% for MFCCs alone.
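The abstract does not define the proposed features, so the sketch below does not attempt to reproduce them. It is a minimal illustration, under assumptions of our own, of the generic short-time pipeline the abstract builds on (framing, a per-frame cepstrum, and delta coefficients), together with the two evaluation measures it names (overall error rate and DET operating points). Assumptions: 16 kHz audio with 25 ms frames and a 10 ms hop; a plain real cepstrum as a simplified stand-in for MFCCs (no mel filterbank or DCT); music treated as the "target" class for the DET points. All function names are hypothetical.

```python
import numpy as np

# Assumed front end (not taken from the thesis): 16 kHz audio, 25 ms frames, 10 ms hop.
FRAME_LEN = 400  # samples per frame at 16 kHz
HOP = 160        # samples between successive frame starts


def frame_signal(x, frame_len=FRAME_LEN, hop=HOP):
    """Split a 1-D signal into overlapping short-time frames, one frame per row."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def real_cepstrum(frames, n_ceps=13):
    """Low-order real cepstrum per frame: inverse FFT of the log magnitude spectrum.

    A simplified stand-in for MFCCs (no mel filterbank, no DCT)."""
    windowed = frames * np.hamming(frames.shape[1])
    log_mag = np.log(np.abs(np.fft.rfft(windowed, axis=1)) + 1e-10)
    return np.fft.irfft(log_mag, n=frames.shape[1], axis=1)[:, :n_ceps]


def delta(features, width=2):
    """Regression delta over +/- width frames; edge frames are repeated as padding."""
    n = len(features)
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    num = sum(k * (padded[width + k:n + width + k] - padded[width - k:n + width - k])
              for k in range(1, width + 1))
    return num / (2 * sum(k * k for k in range(1, width + 1)))


def det_points(scores, labels):
    """Miss and false-alarm rates at every distinct score threshold.

    Assumes label 1 = music (target), 0 = speech, and that an item is
    classified as music when its score is >= the threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.unique(scores)
    target, nontarget = labels == 1, labels == 0
    miss = np.array([np.mean(scores[target] < t) for t in thresholds])
    false_alarm = np.array([np.mean(scores[nontarget] >= t) for t in thresholds])
    return thresholds, miss, false_alarm


def error_rate(predicted, labels):
    """Overall error rate: fraction of decisions that disagree with the labels."""
    return float(np.mean(np.asarray(predicted) != np.asarray(labels)))
```

A DET curve plots the miss rate against the false-alarm rate on normal-deviate axes (for example via `scipy.stats.norm.ppf`), which spreads out the low-error region where systems like these differ. For the figures quoted in the abstract, the best configuration reduces the overall error rate by (7.43 - 6.57) / 7.43, or about 11.6% relative to MFCCs alone.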
