Learning to handle occlusion for motion analysis and view synthesis

The ability to understand occlusion and disocclusion is critical in analyzing motion and forecasting changes. For example, when we see a car gradually block our view of a human figure, we know that either the car or the human is moving. We also know that the human behind the car will be visible again if we move to another position. As many vision-based intelligent systems need to handle and react to visual data containing potentially intense motion, it is beneficial to incorporate occlusion reasoning into such systems. In this thesis, we study how we can improve the performance of vision-based deep learning models by harnessing the power of occlusion handling. We first visit the problem of optical flow estimation for motion analysis. We present a deep learning module that builds upon occlusion handling methods from the classic computer vision literature. Our results show performance improvement in occluded regions on standard benchmarks, as well as in real-world applications. We then examine the problem of view synthesis for 3D photography. We propose an inpainting method that leverages local color and depth context for novel view synthesis. We validate the proposed inpainting approach with a series of quantitative and qualitative experiments, and demonstrate promising results in predicting plausible content in occluded regions.

Master of Science

Humans have the ability to understand occlusion, and to use that knowledge to make predictions about motion and occluded content. For example, when we see a car gradually block our view of a human figure, we know that either the car or the human is moving. We also know that the human behind the car will be visible again if we move to another position. In this thesis, we study how we can replicate this ability in artificial intelligence systems. We first investigate the effect of occlusion reasoning in the task of predicting motion. Our experimental results show that a system equipped with our occlusion reasoning module can better capture the motion happening in image sequences. Next, we examine the problem of hallucinating visual content that is blocked in an image. We develop a model that can produce plausible content in occluded regions. In our experiments, we show that given a single RGB image with an estimated depth map, our model can produce a corresponding 3D photo by hallucinating the structures that are not visible in the image.

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/98620
Date: 29 May 2020
Creators: Su, Shih-Yang
Contributors: Electrical and Computer Engineering; Huang, Jia-Bin; Stilwell, Daniel J.; Tokekar, Pratap
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Detected Language: English
Type: Thesis
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
