1.
Applications of Deep Learning to Visual Content Processing and Analysis. Liu, Xiaohong. January 2021.
The advancement of computer architecture and chip design has set the stage for the deep learning revolution by supplying enormous computational power. In general, deep learning is built upon neural networks, which can be regarded as universal approximators for a broad class of functions. In contrast to model-based machine learning, where representative features are designed by human engineers, deep learning enables the automatic, data-driven discovery of desirable feature representations. In this thesis, the applications of deep learning to visual content processing and analysis are discussed.
For visual content processing, two novel approaches, named LCVSR and RawVSR, are proposed to address common issues in the field of Video Super-Resolution (VSR). In LCVSR, a new mechanism based on local dynamic filters, generated via Locally Connected (LC) layers, is proposed to implicitly estimate and compensate motion, avoiding the errors caused by inaccurate explicit estimation of flow maps. Moreover, a global refinement network is proposed to exploit non-local correlations and enhance the spatial consistency of super-resolved frames. In RawVSR, the superiority of camera raw data, in which the primitive radiance information is recorded, is harnessed to benefit the reconstruction of High-Resolution (HR) frames. The developed network is in line with the real imaging pipeline, where the super-resolution process serves as a pre-processing unit of the Image Signal Processing (ISP) stage. Moreover, a Successive Deep Inference (SDI) module is designed in accordance with the architectural principle suggested by a canonical decomposition result for Hidden Markov Model (HMM) inference, and a reconstruction module is built with elaborately designed Attention-based Residual Dense Blocks (ARDBs).
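As a rough illustration of the local-dynamic-filter idea (not the actual LCVSR network), the sketch below applies a distinct 3x3 kernel at every pixel position; in LCVSR such kernels would be predicted by Locally Connected layers, whereas the hand-written kernels and tiny frame here are assumptions for illustration only.

```python
# Illustrative sketch: per-pixel ("local dynamic") filtering. Each output
# pixel is produced by its own 3x3 kernel, so motion compensation can be
# expressed implicitly as filtering rather than via explicit flow maps.

def local_dynamic_filter(frame, kernels):
    """Apply a distinct 3x3 kernel at each interior pixel of a 2D frame."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            k = kernels[i][j]
            out[i][j] = sum(k[di + 1][dj + 1] * frame[i + di][j + dj]
                            for di in (-1, 0, 1) for dj in (-1, 0, 1))
    return out

identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
shift_left = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]  # fetches the right neighbour

frame = [[float(3 * i + j) for j in range(3)] for i in range(3)]
kernels = [[identity] * 3 for _ in range(3)]
kernels[1] = [identity, shift_left, identity]   # this row "compensates" motion
out = local_dynamic_filter(frame, kernels)
```

A per-position kernel that copies a neighbouring pixel is exactly a one-pixel motion compensation, which is why dynamic filters can subsume explicit warping.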
For visual content analysis, a new approach, named PSCC-Net, is proposed to detect and localize image manipulations. It consists of two paths: a top-down path that extracts local and global features from the input image, and a bottom-up path that first distinguishes manipulated images from pristine ones via a detection head, and then localizes forged regions via a progressive mechanism, in which manipulation masks are estimated from small scales to large ones, each serving as a prior for the next-scale estimation. Moreover, a Spatio-Channel Correlation Module (SCCM) is proposed to capture both spatial and channel-wise correlations among the extracted features, enabling the network to cope with a wide range of manipulation attacks.
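The progressive small-to-large mask estimation can be sketched with plain lists; the nearest-neighbour upsampling, the blending rule, and the `prior_weight` value below are illustrative assumptions, not PSCC-Net's learned refinement.

```python
# Sketch of coarse-to-fine mask estimation: each scale's mask is upsampled
# and used as a prior that is blended with the next scale's evidence.

def upsample2x(mask):
    """Nearest-neighbour 2x upsampling of a 2D mask (list of lists)."""
    out = []
    for row in mask:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def refine(prior, evidence, prior_weight=0.5):
    """Blend the upsampled coarse prior with the current-scale evidence."""
    return [[prior_weight * p + (1 - prior_weight) * e
             for p, e in zip(prow, erow)]
            for prow, erow in zip(prior, evidence)]

def progressive_masks(coarse_mask, evidence_per_scale, prior_weight=0.5):
    """Estimate masks from small scales to large ones."""
    masks = [coarse_mask]
    for evidence in evidence_per_scale:
        prior = upsample2x(masks[-1])
        masks.append(refine(prior, evidence, prior_weight))
    return masks

coarse = [[1.0, 0.0],
          [0.0, 0.0]]
evidence = [[[0.8, 0.6, 0.1, 0.0],
             [0.7, 0.5, 0.0, 0.1],
             [0.1, 0.0, 0.0, 0.0],
             [0.0, 0.1, 0.0, 0.0]]]
final = progressive_masks(coarse, evidence)[-1]
```

Because the coarse prior constrains each finer estimate, a confident small-scale detection propagates outward instead of being re-derived from scratch at every resolution.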
Extensive experiments validate that the methods proposed in this thesis achieve state-of-the-art (SOTA) results and partially address issues left open by previous works. / Dissertation / Doctor of Philosophy (PhD)
2.
Model-based Regularization for Video Super-Resolution. Wang, Huazhong. 04 1900.
In this thesis, we reexamine the classical problem of video super-resolution, with the aim of reproducing fine edge and texture details of acquired digital videos. In general, video super-resolution reconstruction is an ill-posed inverse problem because of the insufficient number of observations available from registered low-resolution video frames. To stabilize the problem and make its solution more accurate, we develop two video super-resolution techniques: 1) a 2D autoregressive modeling and interpolation technique for video super-resolution reconstruction, with model parameters estimated from multiple registered low-resolution frames; 2) the use of an image model as a regularization term to improve the performance of the traditional video super-resolution algorithm. We further investigate the interactions of the various unknown variables involved in video super-resolution reconstruction, including motion parameters, high-resolution pixel intensities, and the parameters of the image model used for regularization. We develop a joint estimation technique that infers these unknowns simultaneously to achieve statistical consistency among them. / Thesis / Master of Applied Science (MASc)
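A toy version of the 2D autoregressive modeling idea can be written as a causal 3-tap model fitted by least squares; the neighbourhood shape, coefficients, and synthetic data below are assumptions for illustration (the thesis estimates its model parameters from multiple registered low-resolution frames, not from one synthetic image).

```python
# Sketch: fit a causal 2D autoregressive (AR) model by least squares via the
# normal equations, then check that the true coefficients are recovered on
# noise-free synthetic data.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

true_coef = [0.6, 0.5, -0.2]          # weights for left, top, top-left pixels
N = 12
img = [[0.0] * N for _ in range(N)]
for k in range(N):                    # pseudo-random boundary values
    img[0][k] = ((7 * k + 3) % 11) / 11.0
    img[k][0] = ((5 * k + 2) % 13) / 13.0
for i in range(1, N):                 # interior follows the AR model exactly
    for j in range(1, N):
        img[i][j] = (true_coef[0] * img[i][j - 1]
                     + true_coef[1] * img[i - 1][j]
                     + true_coef[2] * img[i - 1][j - 1])

rows, rhs = [], []
for i in range(1, N):
    for j in range(1, N):
        rows.append([img[i][j - 1], img[i - 1][j], img[i - 1][j - 1]])
        rhs.append(img[i][j])

# Normal equations: (A^T A) x = A^T b
AtA = [[sum(r[u] * r[v] for r in rows) for v in range(3)] for u in range(3)]
Atb = [sum(r[u] * y for r, y in zip(rows, rhs)) for u in range(3)]
est = solve(AtA, Atb)
```

Once fitted, the same model can predict (interpolate) missing high-resolution pixels from their already-known neighbours, which is the regularizing role it plays in the reconstruction.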
3.
Wide Activated Separate 3D Convolution for Video Super-Resolution. Yu, Xiafei. 18 December 2019.
Video super-resolution (VSR) aims to recover a realistic high-resolution (HR) frame from its corresponding low-resolution (LR) center frame and several neighbouring supporting frames. The supporting LR frames provide extra information that helps recover the HR frame; however, they are not aligned with the center frame due to object motion. Recently, many deep-learning-based video super-resolution methods have been proposed alongside the rapid development of neural networks. Most of these methods use motion estimation and compensation models as pre-processing to handle the spatio-temporal alignment problem, so the accuracy of these motion estimation models is critical for predicting high-resolution frames: inaccurate motion compensation leads to artifacts and blur that damage the recovered frames. We propose an effective wide-activated separate 3-dimensional (3D) Convolutional Neural Network (CNN) for video super-resolution that avoids this drawback of motion compensation models. Separate 3D convolution factorizes a 3D convolution into convolutions in the spatial and temporal domains, which benefits the optimization of the spatial and temporal components. Our method can therefore capture temporal and spatial information from the input frames simultaneously, without an additional motion estimation and compensation model. Experimental results demonstrate the effectiveness of the proposed wide-activated separate 3D CNN.
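A quick parameter count shows one reason the factorization helps: a full k x k x k kernel is split into a spatial 1 x k x k convolution followed by a temporal k x 1 x 1 convolution. The channel widths below are assumed purely for illustration.

```python
# Back-of-the-envelope comparison of weights in a full 3D convolution versus
# its separated spatial + temporal factorization (biases ignored).

def params_full_3d(c_in, c_out, k):
    """Weights in one full k*k*k 3D convolution layer."""
    return c_in * c_out * k * k * k

def params_separate_3d(c_in, c_mid, c_out, k):
    """Weights in a spatial 1*k*k conv followed by a temporal k*1*1 conv."""
    spatial = c_in * c_mid * 1 * k * k
    temporal = c_mid * c_out * k * 1 * 1
    return spatial + temporal

full = params_full_3d(64, 64, 3)          # 64 * 64 * 27 weights
sep = params_separate_3d(64, 64, 64, 3)   # 64 * 64 * 9 + 64 * 64 * 3 weights
```

Beyond the smaller weight count, the extra nonlinearity that can be placed between the two factors is often credited with easing optimization.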
4.
Multi-Kernel Deformable 3D Convolution for Video Super-Resolution. Dou, Tianyu. 17 September 2021.
Video super-resolution (VSR) methods align and fuse consecutive low-resolution frames to generate high-resolution frames. One of the main difficulties in VSR is that video contains various motions, and the accuracy of motion estimation dramatically affects the quality of video restoration. Standard CNNs, however, share the same receptive field in each layer, which makes it challenging to estimate diverse motions effectively. Neuroscience research has shown that the receptive fields of biological visual areas adjust according to the input. Diverse receptive fields in the temporal and spatial dimensions have the potential to adapt to various motions, a property that has received little attention in most existing VSR methods.
In this thesis, we propose to provide adaptive receptive fields for the VSR model. First, we design a multi-kernel 3D convolution network and integrate it with a multi-kernel deformable convolution network for motion estimation and multi-frame alignment. Second, we propose a 2D multi-kernel convolution framework to improve texture restoration quality. Our experimental results show that the proposed framework outperforms state-of-the-art VSR methods.
5.
Key-Frame Based Video Super-Resolution for Hybrid Cameras. Lengyel, Robert. 11 1900.
This work focuses on the high-frequency restoration of video sequences captured by a hybrid camera, using key-frames as high-frequency samples. The proposed method organizes the super-resolution process into a hierarchy and aims to maximize both speed and performance. Additionally, an advanced image-processing simulator (EngineX) was developed to fine-tune the algorithm. / Super-resolution algorithms are designed to enhance the detail level of a particular image or video sequence. In practice, however, the problem is ill-posed and difficult to solve, often requiring regularization based on assumptions about texture or edges. The process can be aided by high-resolution key-frames such as those generated by a hybrid camera. A hybrid camera captures footage at multiple spatial and temporal resolutions; its typical output consists of a high-resolution stream captured at a low frame rate and a low-resolution stream captured at a high frame rate. Key-frame based super-resolution algorithms exploit the spatial and temporal correlation between the two streams to reconstruct an output stream that is both high resolution and high frame rate.
The proposed algorithm organizes the super-resolution process into a hierarchy that combines several classical and novel methods. A residue formulation decides which pixels must be further reconstructed when a particular hierarchy stage fails to provide the expected results compared to the low-resolution prior. The hierarchy includes optical-flow-based estimation, which warps high-frequency information from adjacent key-frames to the current frame. Specialized candidate-pixel selection reduces the total number of pixels considered in the NLM stage, and occlusion is handled by a final fallback stage. By identifying which pixels require reconstruction with a particular method, the running time for a 30-frame CIF sequence has been reduced to under 3 minutes.
A custom simulation environment implements the proposed method as well as many common image-processing algorithms. EngineX provides a graphical interface in which video sequences and image-processing methods can be manipulated and combined. The framework supports advanced features such as multithreading, parameter sweeping, and block-level abstraction, which aided the development of the proposed super-resolution algorithm. Both speed and performance were fine-tuned using the simulator, which is key to the method's improved quality over traditional super-resolution schemes. / Thesis / Master of Applied Science (MASc)
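The residue formulation described above can be sketched as follows: a candidate HR frame produced by one hierarchy stage is downsampled and compared against the LR prior, and pixels whose residue exceeds a threshold fall through to the next stage. The 2x box-filter downsampling and the threshold value are illustrative assumptions, not the thesis's exact formulation.

```python
# Sketch of residue-based pixel selection in a key-frame SR hierarchy.

def downsample2x(frame):
    """2x box-filter downsampling of a 2D frame (list of lists)."""
    h, w = len(frame), len(frame[0])
    return [[(frame[i][j] + frame[i][j + 1]
              + frame[i + 1][j] + frame[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

def residue_mask(candidate_hr, lr_prior, threshold=0.05):
    """Flag LR-grid positions where the candidate disagrees with the LR prior."""
    down = downsample2x(candidate_hr)
    return [[abs(d - p) > threshold for d, p in zip(drow, prow)]
            for drow, prow in zip(down, lr_prior)]

candidate = [[0.2, 0.2, 0.8, 0.8],
             [0.2, 0.2, 0.8, 0.8],
             [0.0, 0.0, 0.5, 0.5],
             [0.0, 0.0, 0.5, 0.5]]
lr = [[0.2, 0.4],   # second block disagrees: candidate block averages to 0.8
      [0.0, 0.5]]
mask = residue_mask(candidate, lr)
```

Only flagged pixels are passed to more expensive later stages (e.g. NLM or the fallback), which is what keeps the overall running time low.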
6.
Deep Learning based Video Super-Resolution in Computer Generated Graphics. Jain, Vinit. January 2020.
Super-resolution is a widely studied problem in computer vision whose purpose is to increase the resolution of, or super-resolve, image data. In video super-resolution, maintaining temporal coherence across consecutive frames requires fusing information from multiple frames to super-resolve each one. Current deep learning methods perform video super-resolution, yet most of them focus on natural datasets. In this thesis, we use a recurrent back-projection network on a dataset of computer-generated graphics, with example applications including upsampling low-resolution cinematics for the gaming industry. The dataset comes from a variety of gaming content rendered at 3840 x 2160 resolution. The network learns to produce the upscaled version of a low-resolution frame from a combination of that frame, a sequence of neighbouring frames, and the optical flow between each neighbouring frame and the reference frame. Under the baseline setup, we train the model to perform 2x upsampling from 1920 x 1080 to 3840 x 2160. Compared with bicubic interpolation, our model achieved better results by margins of 2 dB in Peak Signal-to-Noise Ratio (PSNR), 0.015 in Structural Similarity Index Measure (SSIM), and 9.3 in the Video Multi-method Assessment Fusion (VMAF) metric. In addition, we demonstrate the susceptibility of neural-network performance to changes in image compression quality, and the inability of distortion metrics to capture perceptual details accurately.
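The PSNR margin quoted above can be made concrete with the standard definition, PSNR = 10 log10(peak^2 / MSE); the tiny frames below are synthetic examples, not the thesis data.

```python
# Peak Signal-to-Noise Ratio for frames with pixel values in [0, peak].
import math

def psnr(ref, test, peak=1.0):
    """PSNR between two same-sized 2D frames (lists of lists of floats)."""
    n = 0
    se = 0.0
    for rrow, trow in zip(ref, test):
        for r, t in zip(rrow, trow):
            se += (r - t) ** 2
            n += 1
    mse = se / n
    return float("inf") if mse == 0.0 else 10.0 * math.log10(peak * peak / mse)

ref = [[0.0, 0.5], [0.5, 1.0]]
good = [[0.01, 0.5], [0.5, 0.99]]   # small errors -> high PSNR
bad = [[0.2, 0.3], [0.7, 0.8]]      # larger errors -> lower PSNR
```

Because PSNR is logarithmic in the mean squared error, a 2 dB gain corresponds to reducing the MSE by roughly a factor of 1.6.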
7.
ADVANCES IN MACHINE LEARNING METHODOLOGIES FOR BUSINESS ANALYTICS, VIDEO SUPER-RESOLUTION, AND DOCUMENT CLASSIFICATION. Tianqi Wang (18431280). 26 April 2024.
This dissertation encompasses three studies in distinct yet impactful domains: B2B marketing, real-time video super-resolution (VSR), and smart-office document routing systems. In the B2B marketing sphere, the study addresses the extended buying cycle by developing an algorithm for customer data aggregation and employing a CatBoost model to predict potential purchases with 91% accuracy. This approach enables the identification of high-potential customers for targeted marketing campaigns, which is crucial for optimizing marketing efforts.
Transitioning to multimedia enhancement, the dissertation presents a lightweight recurrent network for real-time VSR. Developed for applications that require high-quality video with low latency, such as video conferencing and media playback, this model integrates an optical-flow estimation network for motion compensation and leverages a hidden space to propagate long-term information, demonstrating high efficiency in VSR. A comparative analysis of motion estimation techniques underscores the importance of minimizing information loss.
The evolution toward smart office environments underscores the need for an efficient document routing system, conceptualized here as an online class-incremental image classification challenge. This research introduces a one-versus-rest parametric classifier, complemented by two updating algorithms based on the passive-aggressive principle and by adaptive thresholding methods that manage low-confidence predictions. Tested on 710 labeled real document images, the method reports a cumulative accuracy of approximately 97%, and various experiments showcase the effectiveness of the chosen aggressiveness parameter.
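A hedged sketch of the routing classifier described above: a one-versus-rest linear model with passive-aggressive (PA-I) updates and a confidence threshold for rejecting low-confidence predictions. The feature vectors, class count, and threshold are illustrative assumptions, not the dissertation's configuration.

```python
# One-versus-rest linear classifier with passive-aggressive (PA-I) updates
# and a simple confidence threshold for "route vs. reject" decisions.

def pa_update(w, x, y, C=1.0):
    """PA-I step: move w just enough to satisfy the hinge margin (y is +/-1)."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - y * score)
    if loss > 0.0:
        tau = min(C, loss / sum(xi * xi for xi in x))  # aggressiveness cap C
        w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return w

def train(samples, n_classes, n_feats, epochs=5):
    """One-versus-rest training: one PA-updated weight vector per class."""
    W = [[0.0] * n_feats for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in samples:
            for c in range(n_classes):
                W[c] = pa_update(W[c], x, 1.0 if c == label else -1.0)
    return W

def route(W, x, threshold=0.0):
    """Return the best class, or None when confidence is below threshold."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    best = max(range(len(scores)), key=lambda c: scores[c])
    return best if scores[best] >= threshold else None

samples = [([1.0, 0.0], 0), ([0.9, 0.1], 0),
           ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
W = train(samples, n_classes=2, n_feats=2)
```

The cap `C` is the "aggressiveness parameter": small values damp each update against noisy labels, while large values let a single example move the boundary further.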