
Audio compression and speech enhancement using temporal masking models

Of the few existing models of temporal masking applicable to problems such as compression and enhancement, none are based on empirical data from the psychoacoustic literature, presumably because the multidimensional nature of the data makes the derivation of tractable functional models difficult. This thesis presents two new functional models of the temporal masking effect of the human auditory system, and their exploitation in audio compression and speech enhancement applications. Traditional audio compression algorithms do not fully exploit the temporal masking properties of the human auditory system, relying solely on simultaneous masking models. A perceptual wavelet packet-based audio coder has been devised that incorporates the first of the two temporal masking models and combines it with simultaneous masking models in a novel manner. An evaluation of the coder using both objective tests (PEAQ, ITU-R BS.1387) and extensive subjective tests (ITU-R BS.1116) revealed a bitrate reduction of more than 17% compared with existing simultaneous masking-based audio coders, while preserving transparent quality. In addition, the oversampled wavelet packet transform (ODWT) has been newly applied to obtain alias-free coefficients for more accurate masking threshold calculation. Finally, a low-complexity scalable audio coding algorithm using the ODWT-based thresholds and temporal masking has been investigated. There is currently a strong need for innovative speech enhancement algorithms that exploit the masking effects of the human auditory system and perform well at very low signal-to-noise ratios (SNR). Existing competitive noise suppression algorithms and those that incorporate simultaneous masking were examined and evaluated for their suitability as baseline algorithms.
Objective measures using PESQ (ITU-T P.862) and subjective measures (ITU-T P.835) demonstrate that the proposed enhancement scheme, based on the second new masking model, outperformed the seven baseline speech enhancement methods by at least 6-20%, depending on the SNR. Hence, the proposed speech enhancement scheme exploiting temporal masking effects has good potential across many types and intensities of environmental noise.

Keywords: human auditory system; temporal masking; simultaneous masking; audio compression; speech enhancement; subjective test; objective test.

Identifier: oai:union.ndltd.org:ADTP/187605
Date: January 2007
Creators: Gunawan, Teddy Surya, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW
Source Sets: Australasian Digital Theses Program
Language: English
Detected Language: English
Rights: http://unsworks.unsw.edu.au/copyright
