Learning to perceive scenes and objects from 2D images as 3D models is a trivial task for a human but very challenging for a computer. Being able to retrieve a 3D model of a scene just by taking a picture of it would be of great use in many fields, for example when making 3D blueprints for buildings or working with animation in the game or film industry. Novel view synthesis is a field within deep learning where generative models are trained to construct 3D models of scenes or objects from 2D images. In this work, the generative model HoloGAN is combined with a U-Net segmentation network. Given an image containing a single object as input, the system can swap that object for another one and then rotate the scene, generating new images from unobserved viewpoints. The segmentation network is trained with paired segmentation masks, while HoloGAN learns 3D properties of scenes from unlabeled 2D images in an unsupervised manner. The system as a whole is trained on one dataset containing images of cars, while the performance of HoloGAN is evaluated on four additional datasets. The chosen method proved successful but came with some drawbacks, such as requiring large datasets and being computationally expensive to train.
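To make the described pipeline concrete, the sketch below illustrates the general idea under stated assumptions: a small U-Net-style network predicts an object mask, and a HoloGAN-style generator maps a latent code to a 3D feature volume, applies a rigid-body rotation to that volume, and projects it back to a 2D image; the swapped object is composited into the scene using the mask. All module names, layer sizes, and the compositing step are illustrative assumptions and are not the implementation used in the thesis.

```python
# Hypothetical sketch (not the thesis code): a U-Net-style segmenter plus a
# HoloGAN-style generator that rotates a learned 3D feature volume before
# projecting it to a 2D image. Names, sizes and the swap/composite step are
# illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySegmenter(nn.Module):
    """Stand-in for the paired-mask U-Net: predicts a 1-channel object mask."""
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.Sequential(
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.up(self.down(x))


class VolumeGenerator(nn.Module):
    """HoloGAN-style idea: latent code -> 3D feature volume -> rigid rotation
    of that volume -> projection to a 2D image."""
    def __init__(self, z_dim=128, vol=16):
        super().__init__()
        self.vol = vol
        self.to_volume = nn.Linear(z_dim, 8 * vol ** 3)
        self.project = nn.Conv2d(8 * vol, 3, 3, padding=1)  # collapse depth into channels

    def forward(self, z, yaw):
        b = z.size(0)
        feat = self.to_volume(z).view(b, 8, self.vol, self.vol, self.vol)
        # Rotation about the vertical axis, applied by resampling the volume.
        cos, sin = torch.cos(yaw), torch.sin(yaw)
        zeros, ones = torch.zeros_like(cos), torch.ones_like(cos)
        theta = torch.stack([
            torch.stack([cos, zeros, -sin, zeros], dim=-1),
            torch.stack([zeros, ones, zeros, zeros], dim=-1),
            torch.stack([sin, zeros, cos, zeros], dim=-1),
        ], dim=1)                                   # (B, 3, 4) affine matrix
        grid = F.affine_grid(theta, feat.shape, align_corners=False)
        rotated = F.grid_sample(feat, grid, align_corners=False)
        # Naive orthographic projection: merge depth into channels, then decode.
        img = self.project(rotated.flatten(1, 2))
        return torch.tanh(img)


if __name__ == "__main__":
    img = torch.rand(1, 3, 64, 64)                  # single-object input image
    mask = TinySegmenter()(img)                     # where the object sits
    z_new_object = torch.randn(1, 128)              # latent code of the swapped-in object
    gen = VolumeGenerator()
    for angle in torch.linspace(0.0, 3.14, 4):      # a few unobserved viewpoints
        novel = gen(z_new_object, angle.view(1))
        # Composite: generated object inside the mask, original scene outside.
        mask_r = F.interpolate(mask, size=novel.shape[-2:])
        background = F.interpolate(img * 2 - 1, size=novel.shape[-2:])
        out = mask_r * novel + (1 - mask_r) * background
        print(out.shape)
```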
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-435503 |
Date | January 2021 |
Creators | Bodin, Emanuel |
Publisher | Uppsala universitet, Signaler och system |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | UPTEC F, 1401-5757 ; 21004 |