The past decade has seen extensive research on audio classification algorithms, which play a key role in multimedia applications such as the retrieval of audio information from an audio or audiovisual database. However, the effect of background noise on classification performance has not been widely investigated. Motivated by the noise-suppression property of the early auditory (EA) model presented by Wang and Shamma, we seek in this thesis to further investigate this property and to develop improved algorithms for audio classification in the presence of background noise.

To address the limitations of the original analysis, a better yet still mathematically tractable approximation is first proposed, in which the Gaussian cumulative distribution function is used to derive a new closed-form expression for the auditory spectrum at the output of the EA model and to conduct the related analysis. Because the original EA model is computationally complex, a simplified auditory spectrum is then proposed; its underlying analysis naturally leads to a frequency-domain approximation that further reduces the computational complexity. Based on this frequency-domain approximation, a simplified FFT-based spectrum is proposed in which a local spectral self-normalization is implemented. An improved implementation of this spectrum is further proposed to calculate a so-called FFT-based auditory spectrum, which allows more flexibility in the extraction of noise-robust audio features.

To evaluate the performance of these FFT-based spectra, speech/music/noise and noise/non-noise classification experiments are conducted using a support vector machine algorithm (SVMstruct) and a decision tree learning algorithm (C4.5) as the classifiers. Several features are used for the classification, including conventional mel-frequency cepstral coefficients (MFCCs) as well as DCT-based and spectral features derived from the proposed FFT-based spectra.
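The abstract does not detail how the FFT-based spectrum, its local self-normalization, or the DCT-based features are computed. As a rough illustration of the general idea only, the NumPy sketch below takes one frame, forms a power spectrum with an FFT, sums it into bands, divides each band by the mean energy of its neighbouring bands, and applies a DCT to the log of the result. Every specific here is a hypothetical choice for illustration and not the thesis's algorithm: the Hann window, equal-width bands, the ±2-band normalization neighbourhood, the 32-band/12-coefficient sizes, and the function names `band_energies`, `self_normalize`, `dct_features` and `frame_features`.

```python
import numpy as np


def band_energies(frame, n_bands):
    """Power spectrum of one Hann-windowed frame, summed into n_bands bands.

    Hypothetical: equal-width bands are a simplification; an auditory-motivated
    (e.g. log-spaced) band layout would be more realistic.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    # Drop the DC bin, then split the remaining bins into contiguous bands.
    bands = np.array_split(power[1:], n_bands)
    return np.array([b.sum() for b in bands])


def self_normalize(energies, half_width=2, eps=1e-12):
    """Hypothetical local self-normalization: divide each band energy by the
    mean energy of the bands within +/- half_width of it."""
    n = len(energies)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out[i] = energies[i] / (energies[lo:hi].mean() + eps)
    return out


def dct_features(spectrum, n_coeffs):
    """Orthonormal DCT-II of the log spectrum, keeping the first n_coeffs
    (assumes n_coeffs <= len(spectrum))."""
    n = len(spectrum)
    log_spec = np.log(spectrum + 1e-12)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_coeffs, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ log_spec)


def frame_features(frame, n_bands=32, n_coeffs=12):
    """Band energies -> local self-normalization -> DCT-based features."""
    return dct_features(self_normalize(band_energies(frame, n_bands)), n_coeffs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(512)
    f_quiet = frame_features(frame)
    f_loud = frame_features(10.0 * frame)
    # Self-normalization divides out a global gain, so both feature vectors match.
    print(np.allclose(f_quiet, f_loud))
```

One property the usage lines illustrate: because each band is divided by the energy of its own local neighbourhood, scaling the input by a constant gain leaves the normalized spectrum, and hence the DCT-based features, unchanged. Whether and how the thesis's self-normalization achieves its noise robustness is not described in the abstract, so this sketch should not be read as reproducing it.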
Compared to the conventional features, the auditory-related features show more robust performance in mismatched test cases. Test results also indicate that the performance of the proposed FFT-based auditory spectrum is slightly better than that of the original auditory spectrum, while its computational complexity is reduced by an order of magnitude.

Finally, to further explore the proposed FFT-based auditory spectrum from a practical audio classification perspective, a floating-point DSP implementation is developed and optimized on the TMS320C6713 DSP Starter Kit (DSK) from Texas Instruments.
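On the analytical side, the abstract states that the Gaussian cumulative distribution function is used to obtain a closed-form expression for the auditory spectrum, without giving the derivation. One well-known reason a Gaussian CDF makes such analysis tractable is the identity E[Φ(aX)] = Φ(aμ / √(1 + a²σ²)) for X ~ N(μ, σ²), where Φ is the standard normal CDF. The sketch below checks this identity numerically against a Monte Carlo estimate. Treating a sigmoid-type compression as a scaled Gaussian CDF with gain `a`, and its input as Gaussian, are assumptions made for illustration; the function names are hypothetical and this is not presented as the thesis's derivation.

```python
import math
import random


def gaussian_cdf(x):
    """Standard normal CDF Phi(x), computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_cdf_closed_form(a, mu, sigma):
    """Closed form of E[Phi(a * X)] for X ~ N(mu, sigma^2).

    Follows from writing X = mu + sigma * Z and noting that
    P(W <= a*mu + a*sigma*Z) with W, Z independent N(0, 1) equals
    Phi(a*mu / sqrt(1 + a^2 sigma^2)).
    """
    return gaussian_cdf(a * mu / math.sqrt(1.0 + (a * sigma) ** 2))


def expected_cdf_monte_carlo(a, mu, sigma, n=200_000, seed=0):
    """Monte Carlo estimate of E[Phi(a * X)] for X ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += gaussian_cdf(a * rng.gauss(mu, sigma))
    return total / n


if __name__ == "__main__":
    a, mu, sigma = 2.0, 0.3, 0.8  # hypothetical gain, input mean and input std
    print("closed form:", expected_cdf_closed_form(a, mu, sigma))
    print("Monte Carlo:", expected_cdf_monte_carlo(a, mu, sigma))
```

The design point is that the expectation of a Gaussian CDF over a Gaussian input stays a Gaussian CDF, so averages through the nonlinearity can be written in closed form rather than integrated numerically.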
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:QMM.115863 |
Date | January 2008 |
Creators | Chu, Wei, 1966- |
Publisher | McGill University |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | English |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Format | application/pdf |
Coverage | Doctor of Philosophy (Department of Electrical and Computer Engineering) |
Rights | All items in eScholarship@McGill are protected by copyright with all rights reserved unless otherwise indicated. |
Relation | alephsysno: 002840627, proquestno: AAINR66629, Theses scanned by UMI/ProQuest. |