
Applications of Deep Learning to Visual Content Processing and Analysis

Advances in computer architecture and chip design have set the stage for the deep learning revolution by supplying enormous computational power. In general, deep learning is built upon neural networks, which can be regarded as universal approximators of mathematical functions. In contrast to model-based machine learning, where the representative features are designed by human engineers, deep learning enables the automatic discovery of desirable feature representations in a data-driven manner. In this thesis, the applications of deep learning to visual content processing and analysis are discussed.

For visual content processing, two novel approaches, named LCVSR and RawVSR, are proposed to address common issues in the field of Video Super-Resolution (VSR). In LCVSR, a new mechanism based on local dynamic filters via Locally Connected (LC) layers is proposed to implicitly estimate and compensate for motion. It avoids the errors caused by inaccurate explicit estimation of flow maps. Moreover, a global refinement network is proposed to exploit non-local correlations and enhance the spatial consistency of super-resolved frames. In RawVSR, the advantages of camera raw data, which records the primitive radiance information, are harnessed to benefit the reconstruction of High-Resolution (HR) frames. The developed network is in line with the real imaging pipeline, where the super-resolution process serves as a pre-processing unit of the Image Signal Processor (ISP). Moreover, a Successive Deep Inference (SDI) module is designed in accordance with the architectural principle suggested by a canonical decomposition result for Hidden Markov Model (HMM) inference, and a reconstruction module is built with elaborately designed Attention-based Residual Dense Blocks (ARDBs).
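The idea of a local dynamic filter can be illustrated with a minimal sketch: every output pixel is a weighted sum of its neighbourhood, and the weights differ from pixel to pixel. This is not the LCVSR implementation. In the thesis the filters come from Locally Connected layers, whereas here the kernels are set by hand, and the function names, the 3 x 3 kernel size, and the zero padding are all assumptions made for illustration.

```python
def apply_local_filters(frame, kernels, k=3):
    """Filter `frame` with a different k x k kernel at every pixel.

    frame:   H x W nested list of floats
    kernels: H x W nested list of flat kernels, each of length k*k (row-major)
    Neighbours outside the frame count as zero (an assumption of this sketch).
    """
    h, w = len(frame), len(frame[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        weight = kernels[y][x][(dy + r) * k + (dx + r)]
                        acc += weight * frame[yy][xx]
            out[y][x] = acc
    return out


def one_hot_kernel(dy, dx, k=3):
    """Hypothetical helper: a kernel that copies the neighbour at offset (dy, dx)."""
    r = k // 2
    kernel = [0.0] * (k * k)
    kernel[(dy + r) * k + (dx + r)] = 1.0
    return kernel


frame = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]

# Every pixel copies its right-hand neighbour, so the content moves one
# pixel to the left without any explicit flow map or warping step.
kernels = [[one_hot_kernel(0, 1) for _ in range(3)] for _ in range(3)]
shifted = apply_local_filters(frame, kernels)
```

A one-hot kernel reproduces a pure translation; kernels with several non-zero weights would blend neighbours instead, which is how per-pixel filters can absorb sub-pixel or spatially varying motion in this simplified picture.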

For visual content analysis, a new approach, named PSCC-Net, is proposed to detect and localize image manipulations. It consists of two paths: a top-down path that extracts local and global features from an input image, and a bottom-up path that first distinguishes manipulated images from pristine ones via a detection head, and then localizes forged regions via a progressive mechanism, in which manipulation masks are estimated from small scales to large ones, each serving as a prior for the estimation at the next scale. Moreover, a Spatio-Channel Correlation Module (SCCM) is proposed to capture both spatial and channel-wise correlations among the extracted features, enabling the network to cope with a wide range of manipulation attacks.
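The progressive, small-to-large mask estimation can be sketched as a loop in which each coarse mask is upsampled and used as a prior for the next scale. This is only an illustration of the control flow, not PSCC-Net itself: the per-scale score maps stand in for network outputs, and the element-wise product used to fuse prior and evidence, the nearest-neighbour upsampling, and all function names are assumptions.

```python
def upsample2x(mask):
    """Nearest-neighbour upsampling of a 2-D mask by a factor of two."""
    out = []
    for row in mask:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def refine(prior, evidence):
    """Hypothetical fusion rule: keep fine-scale evidence only where the
    upsampled coarse prior also flags a manipulation."""
    return [[p * e for p, e in zip(prior_row, evidence_row)]
            for prior_row, evidence_row in zip(prior, evidence)]


def progressive_masks(evidence_pyramid):
    """evidence_pyramid: score maps ordered from coarsest to finest,
    each twice the height and width of the previous one."""
    mask = evidence_pyramid[0]
    masks = [mask]
    for evidence in evidence_pyramid[1:]:
        mask = refine(upsample2x(mask), evidence)
        masks.append(mask)
    return masks


coarse = [[0.0, 1.0]]                # 1 x 2: right half flagged
fine = [[1.0, 1.0, 1.0, 0.0],        # 2 x 4 fine-scale scores
        [1.0, 1.0, 0.0, 0.0]]
masks = progressive_masks([coarse, fine])
```

In this toy example the fine-scale scores on the left half are suppressed because the coarse prior marks that half as pristine, and only the single pixel supported at both scales survives in `masks[-1]`.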

Extensive experiments show that the methods proposed in this thesis achieve state-of-the-art results and partially address the limitations of previous works.

Dissertation / Doctor of Philosophy (PhD)

Identifier: oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/26815
Date: January 2021
Creators: Liu, Xiaohong
Contributors: Chen, Jun, Electrical and Computer Engineering
Source Sets: McMaster University
Language: en_US
Detected Language: English
Type: Thesis
