Global ETD Search

Deep Learning based Video Super-Resolution in Computer Generated Graphics / Deep Learning-baserad video superupplösning för datorgenererad grafik

Super-resolution is a widely studied problem in computer vision, where the goal is to increase the resolution of, or super-resolve, image data. In video super-resolution, maintaining temporal coherence across consecutive frames requires fusing information from multiple frames to super-resolve a single frame. Current deep learning methods perform video super-resolution, yet most of them focus on natural datasets. In this thesis, we apply a recurrent back-projection network to a dataset of computer-generated graphics, with example applications including upsampling low-resolution cinematics for the gaming industry. The dataset comes from a variety of gaming content, rendered at 3840 x 2160 resolution. The objective of the network is to produce an upscaled version of a low-resolution frame from an input combination of the low-resolution reference frame, a sequence of neighboring frames, and the optical flow between each neighboring frame and the reference frame. Under the baseline setup, we train the model to perform 2x upsampling from 1920 x 1080 to 3840 x 2160 resolution. Compared with bicubic interpolation, our model achieved better results by a margin of 2 dB in Peak Signal-to-Noise Ratio (PSNR), 0.015 in Structural Similarity Index Measure (SSIM), and 9.3 in the Video Multi-method Assessment Fusion (VMAF) metric. In addition, we demonstrate how sensitive neural network performance is to changes in image compression quality, and how poorly distortion metrics capture perceptual details.
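As a rough illustration of what the reported PSNR margin means, the sketch below computes PSNR from mean squared error and converts a 2 dB gain into the MSE ratio it implies. The function name `psnr` and the toy pixel values are assumptions made for this example; they are not taken from the thesis or its dataset.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equally sized sequences of pixel values.

    `peak` is the maximum possible pixel value (255 for 8-bit data).
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy 8-bit pixel rows (made-up values, not thesis data).
reference = [10, 50, 90, 130, 170, 210]
bicubic_like = [14, 46, 94, 126, 174, 206]  # off by 4 at every pixel, MSE = 16
model_like = [13, 47, 93, 127, 173, 207]    # off by 3 at every pixel, MSE = 9

# PSNR gain of the hypothetical "model" output over the "bicubic" output.
gain_db = psnr(reference, model_like) - psnr(reference, bicubic_like)

# MSE ratio implied by a 2 dB PSNR gain: 10^(2/10), roughly 1.58.
mse_ratio_for_2db = 10 ** (2.0 / 10)
```

Because PSNR is logarithmic in MSE, a 2 dB improvement corresponds to an MSE roughly 1.58 times smaller, regardless of the absolute PSNR level.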

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292687

Deep Learning

Convolutional Neural Networks

Video Super-Resolution

Computer Generated Graphics

Gaming.

Computer and Information Sciences

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-292687
Date	January 2020
Creators	Jain, Vinit
Publisher	KTH, Skolan för elektroteknik och datavetenskap (EECS)
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	Swedish
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	TRITA-EECS-EX ; 2020:932