Return to search

Adapting Single-View View Synthesis with Multiplane Images for 3D Video Chat

Activities like one-on-one video chatting and video conferencing with multiple participants are more prevalent than ever today as we continue to tackle the pandemic. Bringing a 3D feel to video chat has always been a hot topic in Vision and Graphics communities. In this thesis, we have employed novel view synthesis in attempting to turn one-on-one video chatting into 3D. We have tuned the learning pipeline of Tucker and Snavely's single-view view synthesis paper — by retraining it on MannequinChallenge dataset — to better predict a layered representation of the scene viewed by either video chat participant at any given time. This intermediate representation of the local light field — called a Multiplane Image (MPI) — may then be used to rerender the scene at an arbitrary viewpoint which, in our case, would match with the head pose of the watcher in the opposite, concurrent video frame. We discuss that our pipeline, when implemented in real-time, would allow both video chat participants to unravel occluded scene content and "peer into" each other's dynamic video scenes to a certain extent. It would enable full parallax up to the baselines of small head rotations and/or translations. It would be similar to a VR headset's ability to determine the position and orientation of the wearer's head in 3D space and render any scene in alignment with this estimated head pose. We have attempted to improve the performance of the retrained model by extending MannequinChallenge with the much larger RealEstate10K dataset. We present a quantitative and qualitative comparison of the model variants and describe our impactful dataset curation process, among other aspects.

Identiferoai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-3987
Date01 December 2021
CreatorsUppuluri, Anurag Venkata
PublisherDigitalCommons@CalPoly
Source SetsCalifornia Polytechnic State University
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceMaster's Theses

Page generated in 0.0118 seconds