Digital audio is increasingly a part of our daily lives. Unfortunately, the high bitrate of the raw digital signal makes it an extremely expensive representation. Applications such as digital audio broadcasting, high definition television, and internet audio require high quality audio at low bitrates. The field of audio coding addresses this issue by reducing the bitrate of digital audio while maintaining high perceptual quality.

Developing an efficient audio coder requires a detailed analysis of the audio signals themselves. It is important to find a representation that can concisely model any general audio signal. In this thesis, we propose two new high quality audio coders based on two different audio representations: the sinusoidal-wavelet representation and the warped linear predictive coding (WLPC)-wavelet representation. In addition to high quality coding, it is also important for audio coders to be flexible in their application. With the increasing popularity of internet audio, it is advantageous for audio coders to address issues related to real-time audio delivery. Bitstream scalability is one such issue targeted in this thesis, and a third audio coder capable of bitstream scalability is therefore also proposed. The performance of each proposed coder was evaluated by comparison with the MPEG layer III coder.

The first coder proposed is based on a hybrid sinusoidal-wavelet representation. This representation assumes that each frame of audio can be modelled as a sum of sinusoids plus a noisy residual. The discrete wavelet transform (DWT) is used to decompose the residual into subbands that approximate the critical bands of human hearing. A perceptually derived bit allocation algorithm is then used to minimise the audible distortions introduced by quantising the DWT coefficients. Listening tests showed that the coder delivers near-transparent quality for a range of critical audio signals at 64 kbps.
It also outperforms the MPEG layer III coder operating at the same bitrate. This coder, however, is only useful for high quality coding and is difficult to scale to lower rates.

The second coder proposed is based on a hybrid WLPC-wavelet representation. In this approach, the spectrum of the audio signal is estimated by an all-pole filter using warped linear prediction (WLP). WLP operates in a warped frequency domain, where the resolution can be adjusted to approximate that of the human auditory system. This makes the inherent noise shaping of the synthesis filter even better suited to audio coding. The excitation to this filter is transformed using the DWT and perceptually encoded. Listening tests showed that near-transparent coding is achieved at 64 kbps. The coder was also found to be slightly superior to the MPEG layer III coder operating at the same bitrate.

The third proposed coder is similar to the WLPC-wavelet coder, but modified to achieve bitstream scalability. A noise model for high frequency components is included to keep the overall bitrate low, and a two-stage quantisation scheme for the DWT coefficients is implemented. The first stage uses fixed-rate scalar and vector quantisation to provide a coarse approximation of the coefficients. This allows low bitrate, low quality versions of the input signal to be embedded in the overall bitstream. The second stage of quantisation adds detail to the coefficients and hence enhances the quality of the output signal. Listening tests showed that signal quality improves gracefully as the bitrate increases from 16 kbps to 80 kbps. The performance of this coder is comparable to that of the MPEG layer III coder operating at a similar (but fixed) bitrate.
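
The idea behind the two-stage quantisation of the scalable coder can be illustrated with a minimal Python sketch. It is not the thesis implementation: it uses only uniform scalar quantisers (the thesis combines scalar and vector quantisation), and the function names, step sizes, and coefficient values are hypothetical, chosen purely for illustration. A coarse first stage yields a base layer that can be decoded on its own, and a finer second stage quantises what the first stage left over.

```python
def quantise(values, step):
    """Uniform scalar quantiser: map each value to the nearest integer index."""
    return [round(v / step) for v in values]


def dequantise(indices, step):
    """Map integer indices back to reconstruction levels."""
    return [i * step for i in indices]


def encode_two_stage(coeffs, coarse_step=1.0, fine_step=0.125):
    """Return (base, enhancement) index lists for the given coefficients.

    The step sizes are illustrative assumptions, not values from the thesis.
    """
    base = quantise(coeffs, coarse_step)
    coarse = dequantise(base, coarse_step)
    # Stage 2 quantises the error left by stage 1 with a finer step.
    residual = [c - q for c, q in zip(coeffs, coarse)]
    enhancement = quantise(residual, fine_step)
    return base, enhancement


def decode(base, enhancement=None, coarse_step=1.0, fine_step=0.125):
    """Decode the base layer alone, or base plus enhancement if it is given."""
    out = dequantise(base, coarse_step)
    if enhancement is not None:
        detail = dequantise(enhancement, fine_step)
        out = [o + d for o, d in zip(out, detail)]
    return out


def max_error(a, b):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    coeffs = [0.3, -1.7, 2.45, 0.0, -0.6]  # made-up DWT coefficients
    base, enhancement = encode_two_stage(coeffs)
    low = decode(base)                # base layer only: lower rate, coarser
    high = decode(base, enhancement)  # both layers: higher rate, finer
    print("base-only max error:", max_error(coeffs, low))
    print("two-layer max error:", max_error(coeffs, high))
```

A decoder that receives only the base indices still reconstructs a coarse signal, while one that also receives the enhancement indices reduces the error to at most half the fine step, which is the sense in which a lower-rate version is embedded in the full bitstream.
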
Identifier | oai:union.ndltd.org:ADTP/264807 |
Date | January 2003 |
Creators | Ning, Daryl |
Publisher | Queensland University of Technology |
Source Sets | Australasian Digital Theses Program |
Detected Language | English |
Rights | Copyright Daryl Ning |