The use of audio and video communication has increased exponentially over the last decade and has gone from speech over GSM to HD resolution video conference between continents on mobile devices. As the use becomes more widespread the interest in delivering high quality media increases even on devices with limited resources. This includes both development and enhancement of the communication chain but also the topic of objective measurements of the perceived quality. The focus of this thesis work has been to perform enhancement within speech encoding and video decoding, to measure influence factors of audio and video performance, and to build methods to predict the perceived video quality. The audio enhancement part of this thesis addresses the well known problem in the GSM system with an interfering signal generated by the switching nature of TDMA cellular telephony. Two different solutions are given to suppress such interference internally in the mobile handset. The first method involves the use of subtractive noise cancellation employing correlators, the second uses a structure of IIR notch filters. Both solutions use control algorithms based on the state of the communication between the mobile handset and the base station. The video enhancement part presents two post-filters. These two filters are designed to improve visual quality of highly compressed video streams from standard, block-based video codecs by combating both blocking and ringing artifacts. The second post-filter also performs sharpening. The third part addresses the problem of measuring audio and video delay as well as skewness between these, also known as synchronization. This method is a black box technique which enables it to be applied on any audiovisual application, proprietary as well as open standards, and can be run on any platform and over any network connectivity. The last part addresses no-reference (NR) bitstream video quality prediction using features extracted from the coded video stream. Several methods have been used and evaluated: Multiple Linear Regression (MLR), Artificial Neural Network (ANN), and Least Square Support Vector Machines (LS-SVM), showing high correlation with both MOS and objective video assessment methods as PSNR and PEVQ. The impact from temporal, spatial and quantization variations on perceptual video quality has also been addressed, together with the trade off between these, and for this purpose a set of locally conducted subjective experiments were performed.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:bth-00604 |
Date | January 2014 |
Creators | Rossholm, Andreas |
Publisher | Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling, Karlskrona : Blekinge Institute of Technology |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Doctoral thesis, comprehensive summary, info:eu-repo/semantics/doctoralThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | Blekinge Institute of Technology Doctoral Dissertation Series, 1653-2090 ; 16 |
Page generated in 0.0025 seconds