221 |
TOUCH EVENT DETECTION AND TEXTURE ANALYSIS FOR VIDEO COMPRESSION / Qingshuang Chen (11198871), 29 July 2021
Touch event detection investigates the interaction between two people from video recordings. We are interested in a particular type of interaction that occurs between a caregiver and an infant, as touch is a key social and emotional signal used by caregivers when interacting with their children. We propose an automatic touch event detection and recognition method that determines when the caregiver is likely touching the infant and classifies each event into one of six touch types based on which body part of the infant is touched. We leverage deep-learning-based human pose estimation and person segmentation to analyze the spatial relationship between the caregiver’s hands and the infant. We demonstrate promising performance on touch event detection and classification, showing great potential for reducing the human effort needed to generate ground-truth annotations.
Recently, techniques powered by artificial intelligence have shown great potential to increase the efficiency of video compression. In this thesis, we describe a texture analysis pre-processing method that leverages deep-learning-based scene understanding to extract semantic areas that improve the subsequent video coder. Our proposed method generates a pixel-level texture mask by combining semantic segmentation with a simple post-processing strategy. Our approach is integrated into a switchable texture-based video coding method. We demonstrate that, for many standard and user-generated test sequences, the proposed method achieves significant data rate reduction without noticeable visual artifacts.
|
222 |
Real-Time Wind Estimation and Video Compression Onboard Miniature Aerial Vehicles / Rodriguez Perez, Andres Felipe, 02 March 2009
Autonomous miniature air vehicles (MAVs) are becoming increasingly popular platforms for collecting data about an area of interest in military and commercial applications. Two challenges often arise when collecting this data. First, winds can be a significant percentage of the MAV's airspeed and can affect the analysis of collected data if ignored. Second, most MAV video is transmitted using RF analog transmitters instead of the more desirable digital video because of the computationally intensive compression that digital video requires. This two-part thesis addresses these two challenges. First, this thesis presents an innovative method for estimating the wind velocity using an optical flow sensor mounted on a MAV. Using the flow of features measured by the optical flow sensor in the longitudinal and lateral directions, the MAV's crab angle is estimated. By combining the crab angle with measurements of ground track from GPS and the MAV's airspeed, the wind velocity is computed. Unlike other methods, this approach does not require the use of a “varying” path (flying at multiple headings) or the use of magnetometers. Second, this thesis presents an efficient and effective method for video compression that drastically reduces the computational cost of motion estimation. When compressing video, motion estimation usually accounts for more than 80% of the required computation. Therefore, we propose to estimate the motion and reduce computation by using (1) knowledge of camera locations (from available MAV IMU sensor data) and (2) the projective geometry of the camera. Both of these methods run onboard a MAV in real time, and their effectiveness is demonstrated through simulated and experimental results.
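The wind computation described above can be illustrated with the standard wind triangle: wind velocity is the vector difference between ground velocity and air velocity. The sketch below is not the thesis's estimator. It assumes that GPS also supplies ground speed, that angles are measured in degrees clockwise from north, and that heading equals ground track minus crab angle; the function name `estimate_wind` and its parameters are hypothetical.

```python
import math

def estimate_wind(airspeed, ground_speed, ground_track_deg, crab_angle_deg):
    """Return (wind_north, wind_east) from the wind triangle.

    Assumed convention (not from the thesis): heading = ground track - crab angle,
    with all angles in degrees clockwise from north.
    """
    heading = math.radians(ground_track_deg - crab_angle_deg)
    track = math.radians(ground_track_deg)
    # Ground velocity, as would be reported by GPS.
    ground_n = ground_speed * math.cos(track)
    ground_e = ground_speed * math.sin(track)
    # Air velocity: airspeed directed along the heading.
    air_n = airspeed * math.cos(heading)
    air_e = airspeed * math.sin(heading)
    # Wind is ground velocity minus air velocity.
    return ground_n - air_n, ground_e - air_e
```

For instance, an aircraft pointed north at 10 m/s that is observed tracking 45 degrees at about 14.14 m/s yields a wind of roughly 10 m/s toward the east under these conventions.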
|
223 |
A General Framework for Model Adaptation to Meet Practical Constraints in Computer Vision / Huang, Shiyuan, January 2024
Recent advances in deep learning models have shown impressive capabilities in various computer vision tasks, which encourages the integration of these models into real-world vision systems such as smart devices. This integration presents new challenges as models need to meet complex real-world requirements. This thesis is dedicated to building practical deep learning models, where we focus on two main challenges in vision systems: data efficiency and variability. We address these issues by providing a general model adaptation framework that extends models with practical capabilities.
In the first part of the thesis, we explore model adaptation approaches for efficient representation. We illustrate the benefits of different types of efficient data representations, including compressed video modalities from video codecs, low-bit features, and sparsified frames and texts. Using such efficient representations can greatly reduce system costs such as data storage, processing and computation. We systematically study various methods to extract, learn and utilize these representations, presenting new methods to adapt machine learning models to them. The proposed methods include a compressed-domain video recognition model with a coarse-to-fine distillation training strategy, a task-specific feature compression framework for low-bit video-and-language understanding, and a learnable token sparsification approach for sparsifying human-interpretable video inputs. We demonstrate new perspectives on representing vision data in a more practical and efficient way in various applications.
The second part of the thesis focuses on open environment challenges, where we explore model adaptation for new, unseen classes and domains. We examine the practical limitations in current recognition models, and introduce various methods to empower models in addressing open recognition scenarios. This includes a negative envisioning framework for managing new classes and outliers, and a multi-domain translation approach for dealing with unseen domain data. Our study shows a promising trajectory towards models exhibiting the capability to navigate through diverse data environments in real-world applications.
|
224 |
Efficient Software and Hardware Implementations of the H.264 Entropy Encoders / Hoffman, Marc, January 2011
No description available.
|
225 |
Evaluation and implementation of a networked video streaming solution for academic use / Molin, Per R., 01 April 2000
No description available.
|
226 |
Selecting stimuli parameters for video quality studies based on perceptual similarity distances / Kumcu, A., Platisa, L., Chen, H., Gislason-Lee, Amber J., Davies, A.G., Schelkens, P., Taeymans, Y., Philips, W., 16 March 2015
This work presents a methodology to optimize the selection of multiple parameter levels of an image acquisition, degradation, or post-processing process applied to stimuli intended to be used in a subjective image or video quality assessment (QA) study. It is known that processing parameters (e.g. compression bit-rate) or technical quality measures (e.g. peak signal-to-noise ratio, PSNR) are often non-linearly related to human quality judgment, and the model of either relationship may not be known in advance. Using these approaches to select parameter levels may lead to an inaccurate estimate of the relationship between the parameter and subjective quality judgments, that is, the system’s quality model. To overcome this, we propose a method for modeling the relationship between parameter levels and perceived quality distances using a paired comparison parameter selection procedure in which subjects judge the perceived similarity in quality. Our goal is to enable the selection of evenly sampled parameter levels within the considered quality range for use in a subjective QA study. This approach is tested on two applications: (1) selection of compression levels for a laparoscopic surgery video QA study, and (2) selection of dose levels for an interventional X-ray QA study. Subjective scores, obtained from the follow-up single stimulus QA experiments conducted with expert subjects who evaluated the selected bit-rates and dose levels, were roughly equidistant in the perceptual quality space, as intended. These results suggest that a similarity judgment task can help select parameter values corresponding to desired subjective quality levels. / Parts of this work were performed within the Telesurgery project (co-funded by iMinds, a digital research institute founded by the Flemish Government; project partners are Unilabs Teleradiology, SDNsquare and Barco, with project support from IWT) and the PANORAMA project (co-funded by grants from Belgium, Italy, France, the Netherlands, the United Kingdom, and the ENIAC Joint Undertaking).
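The idea of choosing parameter levels that are evenly spaced in perceptual terms can be sketched as follows. This is a simplified illustration, not the paper's paired-comparison procedure: it assumes that perceived distances between adjacent tested levels are already available and that linear interpolation between tested levels is acceptable. The function name `evenly_spaced_levels` and its inputs are hypothetical.

```python
def evenly_spaced_levels(params, adjacent_distances, k):
    """Pick k parameter values (k >= 2) evenly spaced along a perceptual axis.

    params: tested parameter levels in order (e.g. bit-rates).
    adjacent_distances: perceived distance between each pair of neighbors,
        so len(adjacent_distances) == len(params) - 1.
    """
    # Cumulative perceptual position of each tested level.
    cum = [0.0]
    for d in adjacent_distances:
        cum.append(cum[-1] + d)
    total = cum[-1]

    chosen = []
    for j in range(k):
        target = total * j / (k - 1)
        # Find the segment [cum[i], cum[i + 1]] that contains the target.
        i = 0
        while i < len(cum) - 2 and cum[i + 1] < target:
            i += 1
        span = cum[i + 1] - cum[i]
        frac = 0.0 if span == 0 else (target - cum[i]) / span
        # Linear interpolation between the two neighboring tested levels.
        chosen.append(params[i] + frac * (params[i + 1] - params[i]))
    return chosen
```

With tested levels `[1, 2, 4, 8]` and adjacent distances `[3, 2, 1]`, requesting three levels returns values near 1, 2 and 8: the large perceptual gap between the first two levels means the midpoint of the perceptual range falls at the second level, not at the middle of the parameter range.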
|
227 |
Perceptual Criterion Based Rate Control And Fast Mode Search For Spatial Intra Prediction In Video Coding / Nagori, Soyeb, 05 1900
This thesis addresses two important problems in the field of video coding: rate control and spatial-domain intra prediction. While the former applies generally to most video compression standards, the latter applies to recent advanced video compression standards such as H.264, VC-1 and AVS.
Rate control regulates the instantaneous video bit-rate to maximize a picture quality metric while satisfying channel rate and buffer size constraints. Rate control has an important bearing on the picture quality of encoded video. Typically, a quality metric such as peak signal-to-noise ratio (PSNR) or weighted signal-to-noise ratio (WSNR) is chosen out of convenience. However, neither metric is a true measure of perceived video quality.
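For reference, PSNR is computed from the mean squared error between a reference and a distorted signal, which is why it is convenient but blind to where or how an error appears. The sketch below assumes 8-bit samples (peak value 255) supplied as flat sequences of equal length; the function name `psnr` is illustrative.

```python
import math

def psnr(reference, distorted, peak=255):
    """Peak signal-to-noise ratio in dB for two equal-length sample sequences."""
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)
```

Two distortions with the same mean squared error receive the same PSNR even if one is concentrated in a flat, highly visible region and the other is hidden in texture.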
A few researchers have attempted to derive rate control algorithms that combine standard PSNR with ad hoc perceptual metrics of video quality. The concept of using a perceptual criterion for video coding was introduced in [7] within the context of perceptual adaptive quantization. In that work, quantization noise levels were adjusted such that more noise was allowed where it was less visible (busy and textured areas) while sensitive areas (typically flat and low-detail regions) were finely quantized. Macroblocks were classified into low-detail, texture and edge areas by a classifier that studied the variance of sub-blocks within a macroblock (MB). The rate models were trained on sets of pre-classified video. One drawback of this scheme, shared with standard PSNR, is that it does not account for the perceptual effect of motion. The work in [8] addressed this by assigning higher weights to the regions of the image that were experiencing the highest motion. Also, the center of the image and objects in the foreground are perceived as more important than the sides.
However, attempts to use perceptual metrics for video quality have been limited by the accuracy of the video quality metrics chosen. In recent years, new and improved metrics of subjective quality have been developed and their statistical accuracy has been studied formally. Particularly interesting is the work undertaken by the ITU and the Video Quality Experts Group (VQEG). VQEG conducted two phases of testing. In the first phase, several algorithms were tested but were not found to be very accurate; in fact, none was found to be any more accurate than a PSNR-based metric. In the second phase of testing a few years later, a few new algorithms were evaluated, and it was concluded that four of them achieved results good enough to warrant their standardization as part of ITU-T Recommendation J.144. These experiments are referred to as the FR-TV (Full Reference Television) Phase II evaluations. ITU-T J.144 does not explicitly identify a single algorithm but provides guidelines on the selection of appropriate techniques to objectively measure subjective video quality. It describes four reference algorithms as well as PSNR. Among the four, the NTIA General Video Quality Model (VQM) [11] is the best performing and has been adopted by the American National Standards Institute (ANSI) as North American standard T1.801.03. NTIA’s approach has been to focus on defining parameters that model how humans perceive video quality. These parameters have been combined using linear models to produce estimates of video quality that closely approximate subjective test results. The NTIA General VQM has been shown to correlate strongly with subjective quality.
In the first part of the thesis, we apply metrics motivated by the NTIA VQM within a rate control algorithm to maximize perceptual video quality. We derive perceptual weights from key NTIA parameters to influence the QP value that determines the degree of quantization. Our experiments demonstrate that a perceptually motivated TMN8 rate control in an H.263 encoder yields perceivable quality improvements over a baseline TMN8 rate control algorithm that uses a PSNR metric. Our experimental results on a set of 11 sequences show an average bit-rate reduction of 6% with the proposed algorithm at the same perceptual quality as standard TMN8.
The second part of our thesis work deals with spatial-domain intra prediction as used in advanced video coding standards such as H.264. The H.264 Advanced Video Coding standard [36] has been shown to achieve video quality similar to older standards such as MPEG-2 and H.263 at nearly half the bit-rate. Generally, this compression improvement is attributed to several new tools introduced in H.264, including spatial intra prediction, adaptive block size for motion compensation, an in-loop de-blocking filter, context-adaptive binary arithmetic coding (CABAC), and multiple reference frames.
While the new tools allow better coding efficiency, they also introduce additional computational complexity at both the encoder and the decoder. We are especially concerned here with the impact of intra prediction on the computational complexity of the encoder. H.264 reference implementations such as JM [29] search through all allowed intra-prediction “modes” to find the optimal mode. While this approach yields the optimal prediction mode, it comes at an extremely heavy computational cost. Hence there is considerable interest in well-motivated algorithms that reduce the computational complexity of the search for the best prediction mode while retaining the quality advantages of full-search Intra4x4.
We propose a novel algorithm to reduce the complexity of full search by exploiting our knowledge of the source statistics. Specifically, we analyze the transform-domain energy distribution of the original 4x4 block in different directions and use the results of our analysis to eliminate unlikely modes and reduce the search space for the optimal intra mode. Experimental results show that the proposed algorithm achieves quality metrics (PSNR) similar to full search at nearly a third of the complexity.
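The general idea of pruning intra modes from directional transform-domain energy can be sketched as below. This is an illustrative toy, not the thesis's algorithm: it uses a floating-point 4x4 DCT-II instead of the H.264 integer transform, considers only three of the nine Intra4x4 modes, and the pruning rule, the `ratio` threshold and the `eps` guard are assumptions chosen for the example.

```python
import math

def dct_4x4(block):
    """Orthonormal 2-D DCT-II of a 4x4 block given as a list of rows."""
    n = 4
    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * s
    return coeffs

def candidate_modes(block, ratio=4.0, eps=1e-9):
    """Drop the intra mode that disagrees with the dominant direction (hypothetical rule)."""
    c = dct_4x4(block)
    # Energy of variation along x (first row) and along y (first column), DC excluded.
    horiz_energy = sum(c[0][v] ** 2 for v in range(1, 4))
    vert_energy = sum(c[u][0] ** 2 for u in range(1, 4))
    modes = ["DC", "vertical", "horizontal"]
    if horiz_energy > ratio * vert_energy + eps:
        # Columns are nearly constant, so copying rows sideways is unlikely to win.
        modes.remove("horizontal")
    elif vert_energy > ratio * horiz_energy + eps:
        # Rows are nearly constant, so copying columns downward is unlikely to win.
        modes.remove("vertical")
    return modes
```

A block whose columns are constant keeps the vertical mode and drops the horizontal one, so the encoder would evaluate fewer candidates for that block.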
This thesis has four chapters and is organized as follows. In the first chapter, we introduce the basics of video encoding, present existing work in the area of perceptual rate control, and briefly introduce the TMN8 rate control algorithm. At the end of the chapter, we introduce spatial-domain intra prediction. In the second chapter, we explain the challenges in combining NTIA perceptual parameters with the TMN8 rate control algorithm. We examine the perceptual features used by NTIA from a video compression perspective and explain how the perceptual metrics capture typical compression artifacts. We next present a two-pass perceptual rate control (PRC-II) algorithm. Finally, we list experimental results on a set of video sequences showing an average bit-rate reduction of 6% when using PRC-II rate control instead of standard TMN8 rate control. Chapter 3 contains the second part of our thesis work, on spatial-domain intra prediction. We start by reviewing existing work in intra prediction and then present the details of our proposed intra prediction algorithm and experimental results. We conclude the thesis in Chapter 4 and discuss directions for future work on both of our proposed algorithms.
|
228 |
Time Stamp Synchronization in Video Systems / Yang, Hsueh-szu, Kupferschmidt, Benjamin, 10 1900
ITC/USA 2010 Conference Proceedings / The Forty-Sixth Annual International Telemetering Conference and Technical Exhibition / October 25-28, 2010 / Town and Country Resort & Convention Center, San Diego, California / Synchronized video is crucial for data acquisition and telecommunication applications. For real-time applications, out-of-sync video may cause jitter, choppiness and latency. For data analysis, it is important to synchronize multiple video channels and data that are acquired from PCM, MIL-STD-1553 and other sources. Nowadays, video codecs can be easily obtained to play most types of video. However, a great deal of effort is still required to develop the synchronization methods that are used in a data acquisition system. This paper will describe several methods that TTC has adopted in our system to improve the synchronization of multiple data sources.
|
229 |
FPGA Prototyping of a Watermarking Algorithm for MPEG-4 / Cai, Wei, 05 1900
In the immediate future, multimedia product distribution through the Internet will become mainstream. However, it can also have the side effect of unauthorized duplication and distribution of multimedia products, which could be a critical challenge to the legal ownership of copyright and intellectual property. Many schemes have been proposed to address these issues; one is digital watermarking, which is appropriate for image and video copyright protection. Because of bandwidth limitations, videos distributed via the Internet must be compressed to a low bit rate. The most widely adopted video compression standard is MPEG-4. Discrete cosine transform (DCT) domain watermarking is a secure algorithm that can survive video compression and, most importantly, attacks that attempt to remove the watermark, since such attacks leave the video quality visibly degraded. A commercial broadcasting video system always requires real-time response. For this reason, an FPGA hardware implementation is studied in this work. This thesis deals with video compression, watermarking algorithms and their hardware implementation with FPGAs. A prototype VLSI architecture implements the video compression and watermarking algorithms on the FPGA. The prototype is evaluated with video and watermarking quality metrics. Finally, the video quality of watermarking in the uncompressed domain and in the compressed domain is found to differ by only 1 dB of PSNR. However, the cost of compressed-domain watermarking is the complexity of the drift compensation needed to cancel the drifting effect.
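To make DCT-domain watermarking concrete, the sketch below embeds one bit in a single DCT coefficient by forcing the parity of its quantized value, then reads the bit back. This is a generic quantization-based illustration on a 1-D block of samples, not the algorithm prototyped in the thesis; the coefficient `index`, the quantization `step`, and all function names are assumptions.

```python
import math

def _scale(k, n):
    # Orthonormal DCT-II scaling factor.
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def dct(x):
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                               for i in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                for k in range(n))
            for i in range(n)]

def embed_bit(samples, bit, index=3, step=8.0):
    """Hide one bit by making the quantized coefficient's parity equal to the bit."""
    c = dct(samples)
    q = round(c[index] / step)
    if q % 2 != bit:
        q += 1  # move to the neighboring quantization level with the right parity
    c[index] = q * step
    return idct(c)

def extract_bit(samples, index=3, step=8.0):
    """Recover the bit from the parity of the quantized coefficient."""
    return round(dct(samples)[index] / step) % 2
```

A larger `step` makes the embedded bit survive stronger distortion, such as coarser compression, at the price of a more visible change to the samples, which is the robustness versus quality trade-off that the PSNR evaluation in the abstract measures.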
|
230 |
General Purpose Computing in Gpu - a Watermarking Case Study / Hanson, Anthony, 08 1900
The purpose of this project is to explore the GPU for general-purpose computing. The GPU is a massively parallel computing device that has high throughput, exhibits high arithmetic intensity, and has a large market presence. With more computational power being added to it each year through innovations, the GPU is a strong candidate to complement the CPU in performing computations. The GPU follows the single instruction multiple data (SIMD) model for applying operations on its data. This model makes the GPU very useful for assisting the CPU with computations on data that is highly parallel in nature. The compute unified device architecture (CUDA) is a parallel computing and programming platform for NVIDIA GPUs. The main focus of this project is to show the power, speed, and performance of a CUDA-enabled GPU for digital video watermark insertion in the H.264 video compression domain. Digital video watermarking in general is a highly computationally intensive process that is strongly dependent on the video compression format in place. The H.264/MPEG-4 AVC video compression format has high compression efficiency at the expense of high computational complexity, and it leaves little room for an imperceptible watermark to be inserted. Employing a human visual model to limit the distortion and degradation of visual quality introduced by the watermark is a good choice when designing a video watermarking algorithm, though it adds further computational complexity. Research is being conducted into how CPU-GPU execution of the digital watermarking application can make it several times faster than running it on a standalone CPU, using the NVIDIA Visual Profiler to optimize the application.
|