This thesis presents a stereo coding architecture for the ITU-T G.719 fullband mono codec. G.719 is suited to teleconferencing applications and offers competitive quality for speech and audio signals encoded at 32, 48 and 64 kbps. The proposed architecture uses parametric stereo coding: the spatial properties of the stereo channels are modeled by parameters, which are encoded and transmitted to the decoder together with an encoded downmix of the stereo channels. The architecture has been implemented in MATLAB, with external mono coding provided by a floating-point ANSI C implementation of the ITU-T G.719 codec. Two parametric stereo models have been implemented in a framework operating in the complex-valued Modified Discrete Fourier Transform (MDFT) domain. The first model is based on inter-channel cues that represent the level differences, time differences and coherences between the stereo channels. These cues approximate the corresponding interaural cues that characterize human localization of sound in space. The second model is based on the Karhunen-Loève Transform (KLT) with its associated rotation angles, the inter-channel time differences and residual scaling parameters. An improved MDFT-domain extraction of the inter-channel time difference has been used for both stereo models. The extracted stereo parameters have been non-uniformly quantized according to the spatial accuracy and frequency dependency of the human auditory system. The data rate of the stereo parameters has been estimated at around 4 kbps for each model. G.719 was therefore used as the core codec at 44 and 60 kbps so that the fullband stereo codec could be evaluated subjectively at total bitrates of 48 and 64 kbps. Compared with G.719 dual mono coding, i.e. independent mono coding of the stereo channels, the proposed stereo models performed better for complex clean and reverberant speech signals. However, parametric stereo coding showed no consistent gain for noisy speech, mixed content and music signals. In addition, the first stereo model consistently scored slightly higher than the second model in the subjective evaluation, although the difference was not significant. The results indicate a high potential for parametric stereo coding using the ITU-T G.719 codec. In comparison with the existing stereo codecs 3GPP AMR-WB+ and 3GPP eAAC+, the average performance was better at the same bitrate of 48 kbps.
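To make the two parameter sets more concrete, the following is a minimal sketch of how a level difference, a coherence and a KLT rotation angle could be computed for one frequency band from complex spectral coefficients. It is an illustration built on common textbook definitions, not the thesis's implementation: the function name `band_cues`, the `eps` guard and the band layout are assumptions, and the inter-channel time difference extraction and the quantization are not covered.

```python
import math


def band_cues(left, right, eps=1e-12):
    """Return (level difference in dB, coherence, KLT rotation angle in rad).

    left, right: equal-length sequences of complex spectral coefficients
    for one frequency band (hypothetical layout, assumed for illustration).
    """
    e_l = sum(abs(x) ** 2 for x in left)       # band energy, left channel
    e_r = sum(abs(x) ** 2 for x in right)      # band energy, right channel
    cross = sum(l * r.conjugate() for l, r in zip(left, right))

    # Inter-channel level difference in dB; eps avoids log of zero.
    level_db = 10.0 * math.log10((e_l + eps) / (e_r + eps))

    # Inter-channel coherence: normalized cross-spectrum magnitude in [0, 1].
    coherence = abs(cross) / math.sqrt((e_l + eps) * (e_r + eps))

    # Rotation angle of the principal axis of the 2x2 covariance matrix
    # [[e_l, c], [c, e_r]], with c the real part of the cross term.
    angle = 0.5 * math.atan2(2.0 * cross.real, e_l - e_r)

    return level_db, coherence, angle


if __name__ == "__main__":
    band = [1 + 1j, 2 - 1j, 0.5j]
    attenuated = [0.5 * x for x in band]
    print(band_cues(band, band))        # identical channels
    print(band_cues(band, attenuated))  # right channel at half amplitude
```

With identical channels the level difference is zero and the rotation angle is 45 degrees, which corresponds to an equal-weight downmix; attenuating one channel moves the angle toward the louder channel.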
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-153636 |
Date | January 2011 |
Creators | Jansson, Tomas |
Publisher | Uppsala universitet, Signaler och System |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | UPTEC F, 1401-5757 ; 11 034 |