1 |
A Parallel Network for Compressed Video Enhancement
Hao, Wei. January 2021
In recent years, we have witnessed significant progress in the quality enhancement of compressed video by deep learning methods. In this paper, we propose an effective method for the Video Quality Enhancement (VQE) task. Our method is realized via A Parallel Network for Compressed Video Enhancement (PEN). To cope with unreliable optical flow estimates and complicated motion, PEN has two branches: the Offset Deformable Fusion Network (ODFN) and the Complex Motion Solution Network (CMSN).
During the alignment stage, existing methods typically estimate optical flow for temporal motion compensation. However, because the compressed video may be severely distorted by various compression artifacts, the estimated optical flow is typically inaccurate and unreliable. Therefore, in ODFN we use deformable convolution to align frames quickly and efficiently.
At the same time, we adopt pyramidal processing and cascading refinement in CMSN, which can address complex motion and large parallax in alignment. Furthermore, we use the target frame's neighboring Peak Quality Frames (PQFs) as reference frames, which compensates for video quality variations.
Extensive experiments show that our method improves the average video quality by 0.7 decibels. / Thesis / Master of Applied Science (MASc) / The quality of video is improving as cameras improve, but the size of the video is also increasing. As a result, the video needs to be compressed. Video compression, on the other hand, is always accompanied by a loss of video quality.
Deep learning approaches have made tremendous progress in improving the quality of compressed video in recent years.
In this paper, we propose PEN, an effective method for the Video Quality Enhancement (VQE) task based on parallel processing of multiple frames.
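The abstract reports its gain in decibels without naming the metric. Assuming it refers to PSNR (peak signal-to-noise ratio), which is the usual measure for this task, the following minimal sketch shows how such a gain could be computed. The pixel values and the helper names `mse` and `psnr` are hypothetical and are not taken from the thesis.

```python
import math

def mse(reference, distorted):
    """Mean squared error between two equally sized lists of pixel values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("frames must be non-empty and of equal length")
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)

def psnr(reference, distorted, peak=255.0):
    """PSNR in dB for 8-bit samples; infinite when the frames are identical."""
    err = mse(reference, distorted)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

# Hypothetical 8-bit pixel values, for illustration only (not thesis data).
raw        = [52, 55, 61, 59, 79, 61, 76, 61]
compressed = [50, 58, 58, 62, 75, 65, 72, 64]
enhanced   = [51, 56, 60, 60, 77, 63, 74, 62]

# The enhancement gain is the PSNR difference against the same raw frame.
gain_db = psnr(raw, enhanced) - psnr(raw, compressed)
print(f"PSNR gain: {gain_db:.2f} dB")
```

In practice the gain would be averaged over all frames of all test sequences rather than computed on a single short list of samples.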
|
2 |
Applications of Deep Learning to Video Enhancement
Shi, Zhihao. January 2022
Deep learning, usually built upon artificial neural networks, was proposed in 1943, but poor computational capability restricted its development at that time. With the advancement of computer architecture and chip design, deep learning has gained sufficient computational power and has revolutionized many areas in computer vision. As a fundamental research area of computer vision, video enhancement often serves as the first step of many modern vision systems and facilitates numerous downstream vision tasks. This thesis provides a comprehensive study of video enhancement, with a focus on video frame interpolation and space-time video super-resolution.
For video frame interpolation, two novel methods, named GDConvNet and VFIT, are proposed. In GDConvNet, a novel mechanism named generalized deformable convolution is introduced to overcome the inaccurate flow estimation of flow-based methods and the rigid kernel shapes of kernel-based methods. This mechanism can effectively learn motion information in a data-driven manner and freely select sampling points in space-time. Our GDConvNet, built upon this mechanism, is shown to achieve state-of-the-art performance. As for VFIT, the concept of local attention is introduced to video interpolation for the first time, and a novel space-time separation window-based self-attention scheme is further devised, which not only reduces cost but also acts as a regularization term to improve performance.
Based on the new scheme, VFIT is presented as the first Transformer-based video frame interpolation framework. In addition, a multi-scale frame synthesis scheme is developed to fully realize the potential of Transformers. Extensive experiments on a variety of benchmark datasets demonstrate the superiority and reliability of VFIT.
For space-time video super-resolution, a novel unconstrained space-time video super-resolution network is proposed to solve the common issues of the existing methods that either fail to explore the intrinsic relationship between temporal and spatial information or lack flexibility in the choice of final temporal/spatial resolution. To this end, several new ideas are introduced, such as integration of multi-level representations and generalized pixshuffle. Various experiments validate the proposed method in terms of its complete freedom in choosing output resolution, as well as superior performance over the state-of-the-art methods. / Thesis / Doctor of Philosophy (PhD)
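The abstract names "generalized pixshuffle" without describing it. As background only, the sketch below shows the standard pixel-shuffle rearrangement for a fixed integer scale factor, which the thesis's operator presumably extends. It assumes the common channel ordering (output channel, then row offset, then column offset); the function name `pixel_shuffle` and the nested-list layout are illustrative choices, not the thesis's implementation.

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Standard (fixed integer scale) pixel shuffle, shown for background;
    this is NOT the generalized operator proposed in the thesis.
    """
    channels_in = len(x)
    if r < 1 or channels_in % (r * r) != 0:
        raise ValueError("channel count must be a multiple of r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r output cell
            for j in range(r):      # column offset inside each r x r output cell
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out

# Four 1x2 feature maps become one 2x4 map when r = 2.
features = [[[0, 1]], [[10, 11]], [[20, 21]], [[30, 31]]]
upscaled = pixel_shuffle(features, 2)
print(upscaled)
```

Because the scale factor here must be an integer fixed by the channel count, the output resolution cannot be chosen freely, which is the kind of restriction the abstract says the proposed network removes.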
|
3 |
On the Enhancement of Audio and Video in Mobile Equipment
Rossholm, Andreas. January 2006
Use of mobile equipment has increased exponentially over the last decade. As use becomes more widespread, so too does the demand for new functionalities. The limited memory and computational power of many mobile devices have proven to be a challenge, resulting in many innovative solutions and a number of new standards. Despite this, there is often a requirement for additional enhancement to improve quality. The focus of this thesis work has been to perform enhancement within two different areas: audio or speech encoding, and video encoding/decoding. The audio enhancement section of this thesis addresses the well-known problem in the GSM system of an interfering signal generated by the switching nature of TDMA cellular telephony. Two different solutions are given to suppress such interference internally in the mobile handset. The first method involves the use of subtractive noise cancellation employing correlators; the second uses a structure of IIR notch filters. Both solutions use control algorithms based on the state of the communication between the mobile handset and the base station. The video section of this thesis presents two post-filters and one pre-filter. The two post-filters are designed to improve the visual quality of highly compressed video streams from standard, block-based video codecs by combating both blocking and ringing artifacts. The second post-filter also performs sharpening. The pre-filter is designed to increase the coding efficiency of a standard block-based video codec. By introducing a pre-processing algorithm before the encoder, the amount of camera disturbance and the complexity of the sequence can be decreased, thereby increasing coding efficiency.
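To illustrate the notch-filter idea mentioned in the abstract, here is a minimal sketch of a single second-order IIR notch. It is not the thesis's filter structure or control algorithm. The assumptions are: an 8 kHz sample rate, a notch at roughly 217 Hz (the GSM TDMA frame rate, which is where this interference is usually heard), a quality factor of 30, and the widely used biquad notch coefficient formulas.

```python
import math

def notch_coefficients(f0, fs, q):
    """Biquad notch coefficients (b, a), normalized so that a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def iir_filter(b, a, x):
    """Direct-form I second-order IIR filter applied to the sample list x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

FS = 8000.0             # assumed sample rate in Hz
F_INTERFERENCE = 217.0  # assumed interference fundamental in Hz

def tone(freq, n=8000):
    """Unit-amplitude sine test tone of n samples."""
    return [math.sin(2.0 * math.pi * freq * k / FS) for k in range(n)]

b, a = notch_coefficients(F_INTERFERENCE, FS, q=30.0)
buzz_out = iir_filter(b, a, tone(F_INTERFERENCE))   # should be suppressed
speech_out = iir_filter(b, a, tone(1000.0))         # should pass almost unchanged

# Compare levels after the start-up transient has died away.
print(f"217 Hz tone RMS after notch:  {rms(buzz_out[-2000:]):.6f}")
print(f"1000 Hz tone RMS after notch: {rms(speech_out[-2000:]):.6f}")
```

Real interference of this kind also contains harmonics of the fundamental, so a practical design would likely cascade several such sections, which may be what the abstract means by "a structure of IIR notch filters".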
|
4 |
A Unified Approach to GPU-Accelerated Aerial Video Enhancement Techniques
Cluff, Stephen Thayn. 12 February 2009
Video from aerial surveillance can provide a rich source of data for analysts. From the time-critical perspective of wilderness search and rescue operations, information extracted from aerial videos can mean the difference between a successful search and an unsuccessful search. When using low-cost, payload-limited mini-UAVs, as opposed to more expensive platforms, several challenges arise, including jittery video, narrow fields of view, low resolution, and limited time on screen for key features. These challenges make it difficult for analysts to extract key information in a timely manner. Traditional approaches may address some of these issues, but no existing system effectively addresses all of them in a unified and efficient manner. Building upon a hierarchical dense image correspondence technique, we create a unifying framework for reducing jitter, enhancing resolution, and expanding the field of view while lengthening the time that features remain on screen. The framework also provides for easy extraction of moving objects in the scene. Our method incorporates locally adaptive warps, which allow for robust image alignment even in the presence of parallax and without the aid of internal or external camera parameters. We accelerate the image registration process using commodity Graphics Processing Units (GPUs) to accomplish all of these tasks in near real-time with no external telemetry data.
|
5 |
Machine Learning and Deep Learning Approaches to Print Defect Detection, Face Set Recognition, Face Alignment, and Visual Enhancement in Space and Time
Xiaoyu Xiang (11166546). 21 July 2021
The research includes machine learning and deep learning approaches to print defect detection, face set recognition and face alignment, and visual enhancement in space and time. This thesis consists of six parts, which correspond to six projects:

In Chapter 1, the first project focuses on the detection of local printing defects, including gray spots and solid spots. We propose a coarse-to-fine method to detect local defects in a block-wise manner and aggregate the block-wise attributes to generate the feature vector of the whole test page for a further ranking task. In the detection part, we first select candidate regions by thresholding a single feature. Then more detailed features of the candidate blocks are calculated and sent to a decision tree that was previously trained on our training dataset. The final result is given by the decision tree model to control the false alarm rate while maintaining the required miss rate.

Chapter 2 introduces face set recognition and Chapter 3 is about face alignment. In order to reduce the computational complexity of comparing face sets, we propose a deep neural network that can compute and aggregate the face feature vectors with different weights. As for face alignment, our goal is to solve the jittering of landmark locations when applied to video. We propose metrics and corresponding methods around this goal.

In recent years, mobile photography has become increasingly prevalent in our lives with social media due to its high portability and convenience. However, many challenges still exist in distributing high-quality mobile images and videos under the limits of data capacity, hardware storage, and network bandwidth. Therefore, we have been exploring enhancement techniques to improve image and video quality, considering both effectiveness and efficiency for a wide variety of applications, including WhatsApp, Portal, TikTok, and even the printing industry.
Chapter 4 introduces single image super-resolution to handle real-world images with various degradations, and its influence on several downstream high-level computer vision tasks. Next, Chapter 5 studies headshot image restoration with multiple references, which is an application of visual enhancement under more specific scenarios. Finally, as a step towards temporal-domain enhancement, the Zooming SlowMo framework for fast and accurate space-time video super-resolution is introduced in Chapter 6.
|