1

Error control for scalable image and video coding

Kuang, Tianbo 24 November 2003
Scalable image and video coding has been proposed for transmitting image and video signals over lossy networks, such as the Internet and wireless networks. However, scalability alone is not a complete solution, since there is a conflict between the unequal importance of the parts of a scalable bit stream and the indiscriminate nature of packet losses in the network. This thesis investigates three methods for combating the detrimental effects of random packet losses on scalable images and video, namely the error-resilient method, the error-concealment method, and the unequal error protection method, all within the joint source-channel coding framework. For the error-resilient method, an optimal bit allocation algorithm is first proposed without considering the distortion caused by packet losses; the allocation algorithm is then extended to account for packet losses. For the error-concealment method, a simple temporal error-concealment mechanism is designed for video signals. For the unequal error protection method, the optimal protection allocation problem is formulated and solved. These methods are tested on the wavelet-based Set Partitioning in Hierarchical Trees (SPIHT) scalable image coder. Performance gains and losses in lossy and lossless environments are studied for both the original coder and the error-controlled coders. The results show performance advantages of all three methods over the original SPIHT coder. In particular, the unequal error protection and error-concealment methods are promising for future Internet/wireless image and video transmission, because the former performs well even under heavy packet loss (a PSNR of 22.00 dB was observed at nearly 60% packet loss) and the latter introduces no extra overhead.
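The unequal error protection idea can be illustrated with a small sketch: spend a fixed budget of parity (FEC) packets across the layers of an embedded bit stream, giving more protection to the layers that reduce distortion the most. The function names, the simple (n, k) erasure-code survival model, and the numbers below are illustrative assumptions, not the thesis's actual optimization.

```python
import math

def layer_recovery_prob(parity, loss_rate, packets_per_layer=1):
    """Probability that a layer survives when protected with `parity`
    extra FEC packets (simple (n, k) erasure-code model, k = packets_per_layer)."""
    n = packets_per_layer + parity
    k = packets_per_layer
    # The layer is recoverable if at least k of its n packets arrive.
    return sum(
        math.comb(n, i) * (1 - loss_rate) ** i * loss_rate ** (n - i)
        for i in range(k, n + 1)
    )

def allocate_fec(gains, budget, loss_rate):
    """Greedy unequal error protection: spend one parity packet at a time
    where it most increases the expected distortion reduction.

    gains[i] -- distortion reduction contributed by layer i of the embedded
                bit stream (earlier layers matter more); illustrative values.
    budget   -- total number of parity packets available.
    """
    parity = [0] * len(gains)

    def expected_gain(p):
        # An embedded stream is decodable only up to the first lost layer.
        total, prefix_ok = 0.0, 1.0
        for g, pi in zip(gains, p):
            prefix_ok *= layer_recovery_prob(pi, loss_rate)
            total += g * prefix_ok
        return total

    for _ in range(budget):
        base = expected_gain(parity)
        best_i, best_delta = None, 0.0
        for i in range(len(gains)):
            trial = parity.copy()
            trial[i] += 1
            delta = expected_gain(trial) - base
            if delta > best_delta:
                best_i, best_delta = i, delta
        if best_i is None:
            break
        parity[best_i] += 1
    return parity

# Earlier layers of a SPIHT stream reduce distortion the most,
# so they end up with more parity packets.
print(allocate_fec(gains=[10.0, 5.0, 2.5, 1.2, 0.6], budget=8, loss_rate=0.3))
```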
2

DCT-based Image/Video Compression: New Design Perspectives

Sun, Chang January 2014 (has links)
To push the envelope of DCT-based lossy image/video compression, this thesis revisits the design of several fundamental blocks in image/video coding, ranging from source modelling and quantization tables to quantizers and entropy coding. Firstly, to better handle the heavy-tail phenomenon commonly seen in DCT coefficients, a new model dubbed the transparent composite model (TCM) is developed and justified. Given a sequence of DCT coefficients, the TCM first separates the tail from the main body of the sequence, then uses a uniform distribution to model the DCT coefficients in the heavy tail and a parametric distribution to model those in the main body. The separation boundary and the other distribution parameters are estimated online via maximum likelihood (ML) estimation. Efficient online algorithms are proposed for parameter estimation, and their convergence is proved. When the parametric distribution is a truncated Laplacian, the resulting TCM, dubbed the Laplacian TCM (LPTCM), not only achieves superior modelling accuracy with low estimation complexity, but also provides good nonlinear data reduction by identifying and separating DCT coefficients in the heavy tail (referred to as outliers) from those in the main body (referred to as inliers). This in turn opens up opportunities for its use in DCT-based image compression. Secondly, quantization table design is revisited for image/video coding with soft-decision quantization (SDQ). Unlike conventional approaches, where quantization table design is bundled with a specific encoding method, we assume optimal SDQ encoding and design the quantization table for the purpose of reconstruction. Under this assumption, we model transform coefficients across different frequencies as independently distributed random sources and apply the Shannon lower bound to approximate the rate-distortion function of each source. We then show that a quantization table can be optimized so that the resulting distortion follows a certain profile, yielding the so-called optimal distortion profile scheme (OptD). Guided by this theoretical result, we present an efficient statistical-model-based algorithm that uses the Laplacian model to design quantization tables for DCT-based image compression. When applied to standard JPEG encoding, it provides a gain of more than 1.5 dB in PSNR with almost no extra complexity. Compared with the state-of-the-art JPEG quantization table optimizer, the proposed algorithm offers an average 0.5 dB gain while reducing computational complexity by a factor of more than 2000 when SDQ is off, and a gain of 0.1 dB or more with an 85% reduction in complexity when SDQ is on. Thirdly, based on the LPTCM and OptD, we further propose an efficient non-predictive DCT-based image compression system in which the quantizers and entropy coding are completely redesigned and the corresponding SDQ algorithm is developed. In terms of rate versus visual quality, the proposed system achieves overall coding results that are among the best and comparable to those of H.264 or HEVC intra (predictive) coding. In terms of rate versus objective quality, it outperforms baseline JPEG by more than 4.3 dB on average with a moderate increase in complexity, and outperforms ECEB, the state-of-the-art non-predictive image coder, by 0.75 dB when SDQ is off at the same level of computational complexity and by 1 dB when SDQ is on at the cost of extra complexity.
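To make the LPTCM idea concrete, here is a minimal sketch under assumed details (it is not the thesis's online ML algorithm): the magnitudes of a block of DCT coefficients are split at a candidate boundary, the main body is fitted with a truncated Laplacian and the tail with a uniform density, and the boundary is chosen by maximizing the total log-likelihood over a grid of candidates. The grid search and the approximate scale estimate stand in for the exact online estimator derived in the thesis.

```python
import numpy as np

def fit_lptcm(coeffs, num_candidates=64):
    """Illustrative fit of a Laplacian transparent composite model (LPTCM):
    a truncated Laplacian for the main body ("inliers", |x| <= yc) and a
    uniform density for the heavy tail ("outliers", yc < |x| <= a).
    The boundary yc is picked by a grid search maximizing the log-likelihood."""
    x = np.abs(np.asarray(coeffs, dtype=float))
    a = x.max() + 1e-12                      # overall support bound
    candidates = np.linspace(a / num_candidates, a, num_candidates)

    best = (-np.inf, None, None, None)
    n = len(x)
    for yc in candidates:
        inliers = x[x <= yc]
        outliers = x[x > yc]
        p = len(inliers) / n                 # probability mass of the main body
        lam = inliers.mean() + 1e-12 if len(inliers) else 1.0  # approx. ML scale
        ll = 0.0
        if len(inliers):
            # truncated Laplacian density on [0, yc]
            norm = 1.0 - np.exp(-yc / lam)
            ll += np.sum(np.log(p) + np.log(1.0 / lam)
                         - inliers / lam - np.log(norm + 1e-300))
        if len(outliers):
            # uniform density on (yc, a]
            ll += len(outliers) * (np.log(1.0 - p + 1e-300)
                                   - np.log(a - yc + 1e-300))
        if ll > best[0]:
            best = (ll, yc, lam, p)
    _, yc, lam, p = best
    return {"boundary": yc, "laplacian_scale": lam, "inlier_mass": p}

# Example: mostly small coefficients plus a few large "outliers".
rng = np.random.default_rng(0)
coeffs = np.concatenate([rng.laplace(0, 2.0, 2000), rng.uniform(-60, 60, 20)])
print(fit_lptcm(coeffs))
```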
In comparison with H.264 intra coding, our system provides an overall gain of about 0.4 dB with dramatically reduced computational complexity. It offers coding performance comparable to or better than that of HEVC intra coding in the high-rate region or for complex images, with less than 5% of the latter's encoding complexity. In addition, the proposed DCT-based image compression system offers a multiresolution capability, which, together with its comparatively high coding efficiency and low complexity, makes it a good alternative for real-time image processing applications.
3

Multimedia Forensics Using Metadata

Ziyue Xiang (17989381) 21 February 2024 (has links)
<p dir="ltr">The rapid development of machine learning techniques makes it possible to manipulate or synthesize video and audio information while introducing nearly indetectable artifacts. Most media forensics methods analyze the high-level data (e.g., pixels from videos, temporal signals from audios) decoded from compressed media data. Since media manipulation or synthesis methods usually aim to improve the quality of such high-level data directly, acquiring forensic evidence from these data has become increasingly challenging. In this work, we focus on media forensics techniques using the metadata in media formats, which includes container metadata and coding parameters in the encoded bitstream. Since many media manipulation and synthesis methods do not attempt to hide metadata traces, it is possible to use them for forensics tasks. First, we present a video forensics technique using metadata embedded in MP4/MOV video containers. Our proposed method achieved high performance in video manipulation detection, source device attribution, social media attribution, and manipulation tool identification on publicly available datasets. Second, we present a transformer neural network based MP3 audio forensics technique using low-level codec information. Our proposed method can localize multiple compressed segments in MP3 files. The localization accuracy of our proposed method is higher compared to other methods. Third, we present an H.264-based video device matching method. This method can determine if the two video sequences are captured by the same device even if the method has never encountered the device. Our proposed method achieved good performance in a three-fold cross validation scheme on a publicly available video forensics dataset containing 35 devices. Fourth, we present a Graph Neural Network (GNN) based approach for the analysis of MP4/MOV metadata trees. The proposed method is trained using Self-Supervised Learning (SSL), which increased the robustness of the proposed method and makes it capable of handling missing/unseen data. Fifth, we present an efficient approach to compute the spectrogram feature with MP3 compressed audio signals. The proposed approach decreases the complexity of speech feature computation by ~77.6% and saves ~37.87% of MP3 decoding time. The resulting spectrogram features lead to higher synthetic speech detection performance.</p>
