We address the problem of estimating a high-quality dense depth map from a
single RGB input image. We start out with a baseline encoder-decoder convolutional
neural network architecture and pose the question of how the global processing of
information can help improve overall depth estimation. To this end, we propose a
transformer-based architecture block that divides the depth range into bins whose
center values are estimated adaptively per image. The final depth values are estimated
as linear combinations of the bin centers. We call our new building block AdaBins.
Our results show a decisive improvement over the state of the art on several popular
depth datasets across all metrics. We also validate the effectiveness of the proposed
block with an ablation study.
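
The binning idea described above can be illustrated with a minimal sketch. The abstract states only that bin centers are estimated per image and that each depth value is a linear combination of those centers; everything else here is an assumption made for illustration: the function names (`bin_centers`, `softmax`, `pixel_depth`), the parametrization of bins by normalized widths over a range `[d_min, d_max]`, and the use of a softmax to obtain the combination weights. It is not the thesis implementation.

```python
import math


def bin_centers(widths, d_min, d_max):
    """Convert per-image bin widths into bin-center depths.

    Assumption: bins are parametrized by non-negative widths that are
    normalized to sum to 1 and then laid end to end over [d_min, d_max].
    """
    total = sum(widths)
    centers = []
    left = 0.0  # normalized position of the current bin's left edge
    for w in widths:
        w = w / total
        centers.append(d_min + (d_max - d_min) * (left + w / 2.0))
        left += w
    return centers


def softmax(scores):
    """Numerically stable softmax over a list of per-bin scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def pixel_depth(scores, centers):
    """Depth of one pixel as a linear combination of the bin centers.

    Assumption: the combination weights are softmax probabilities
    computed from hypothetical per-pixel, per-bin scores.
    """
    probs = softmax(scores)
    return sum(p * c for p, c in zip(probs, centers))


# Hypothetical widths for one image with four bins over a 0-10 m range.
centers = bin_centers([0.1, 0.2, 0.3, 0.4], d_min=0.0, d_max=10.0)

# A pixel with equal scores for every bin gets the mean of the centers.
depth = pixel_depth([0.0, 0.0, 0.0, 0.0], centers)
```

Because the weights sum to one, the predicted depth always lies between the smallest and largest bin center, and it varies smoothly as the weights shift from one bin to the next instead of jumping between discrete depth levels.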
Identifier | oai:union.ndltd.org:kaust.edu.sa/oai:repository.kaust.edu.sa:10754/668894 |
Date | 21 April 2021 |
Creators | Bhat, Shariq |
Contributors | Wonka, Peter; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Hadwiger, Markus; Ghanem, Bernard |
Source Sets | King Abdullah University of Science and Technology |
Language | English |
Detected Language | English |
Type | Thesis |