Estimating depth and detecting object instances in 3D space are fundamental to autonomous navigation, localization and mapping, robotic object manipulation, and
augmented reality. RGB-D images and LiDAR point clouds are the most common representations of depth information. However, depth sensors have several shortcomings,
such as low effective spatial resolution and capturing a scene from only a single perspective.
This thesis focuses on reconstructing a denser and more comprehensive 3D scene structure from a given monocular RGB image using depth estimation and 3D object detection.
The first contribution of this thesis is a depth estimation pipeline based on an unsupervised learning framework. Two architectures are proposed to
analyze structure-from-motion and 3D geometric constraint methods. The proposed architectures are trained and evaluated using only RGB images, with no ground-truth
depth data, and achieve better results than state-of-the-art methods.
The second contribution of this thesis is the application of the estimated depth map, which comprises two algorithms: point cloud generation and collision avoidance.
The predicted depth map and the RGB image are used to generate point cloud data with the proposed point cloud algorithm. The collision avoidance algorithm predicts
the possibility of a collision and issues a collision warning message by decoding the colors in the estimated depth map. This design is adaptable
to different color maps with slight changes and perceives collision information across a sequence of frames.
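The abstract does not spell out the point cloud algorithm itself, but the standard way to lift a depth map and RGB image into a colored point cloud is pinhole back-projection with known camera intrinsics. The following is a minimal sketch under that assumption; the intrinsics `fx`, `fy`, `cx`, `cy` and the toy input values are illustrative, not taken from the thesis:

```python
import numpy as np

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth map (H, W) and RGB image (H, W, 3) into an
    (N, 6) array of XYZRGB points using the pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx  # X = (u - cx) * Z / fx
    y = (v - cy) * z / fy  # Y = (v - cy) * Z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    valid = points[:, 2] > 0  # drop pixels with no valid depth
    return np.hstack([points[valid], colors[valid]])

# toy example: a 2x2 depth map at a constant depth of 2.0 meters
depth = np.full((2, 2), 2.0)
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
cloud = depth_to_point_cloud(depth, rgb, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
```

Each row of `cloud` holds one 3D point and its color, which can then be written to a standard format such as PLY for visualization.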
The third contribution is a two-stage pipeline to detect 3D objects from a monocular image. The first stage detects 2D objects and crops
the corresponding image patches, which are provided as input to the second stage. In the second stage, a 3D regression network is trained to estimate 3D bounding boxes
for the target objects. Two architectures are proposed for this 3D regression network. This approach achieves better average precision than the state of the art
for fully visible objects or truncation up to 15%, and lower but comparable results for truncation above 30% or partly/fully occluded objects.
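To illustrate what the second stage's output represents, the regressed box parameters can be turned into 3D box corners by composing dimensions, heading angle, and translation. This sketch assumes a KITTI-style parameterization (height, width, length, and a yaw angle about the vertical axis); it is a generic geometric helper, not the thesis's exact network head:

```python
import numpy as np

def box3d_corners(dims, yaw, center):
    """Return the 8 corners (8, 3) of a 3D bounding box with dimensions
    (h, w, l), heading angle `yaw` about the vertical axis, and center
    position `center` in camera coordinates."""
    h, w, l = dims
    # corner offsets in the object frame, centered at the origin
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2.0
    y = np.array([ h,  h,  h,  h, -h, -h, -h, -h]) / 2.0
    z = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2.0
    corners = np.stack([x, y, z])  # shape (3, 8)
    # rotate about the vertical (y) axis, then translate to the center
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[ c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return (R @ corners).T + np.asarray(center)

# e.g. a car-sized box 10 m in front of the camera, axis-aligned
corners = box3d_corners(dims=(1.5, 1.6, 4.0), yaw=0.0, center=(0.0, 1.0, 10.0))
```

Projecting these corners with the camera intrinsics gives the 2D outline of the 3D box, which is the usual way such detections are visualized and evaluated.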
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:77544 |
Date | 25 January 2022 |
Creators | Manoharan, Shanmugapriyan |
Contributors | Hardt, Wolfram, Saleh, Shadi, Technische Universität Chemnitz |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/publishedVersion, doc-type:masterThesis, info:eu-repo/semantics/masterThesis, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |