Understanding 3D objects and being able to interact with them in the physical world are essential for building intelligent computer vision systems.
Such capabilities have tremendous potential for applications ranging from augmented reality and 3D printing to robotics.
While it may seem simple for humans to look at and make sense of the visual world, it is a complicated process for machines to accomplish similar tasks.
Generally, such a system involves a series of processes: identifying and segmenting a target object, estimating its 3D shape, and predicting its pose in an open scene where the target objects may not have been seen before.
Although considerable research has been devoted to these problems, they remain very challenging due to a few key issues:
1) most methods rely solely on color images to interpret the 3D properties of an object; 2) large labeled color image datasets are expensive to obtain for tasks like pose estimation, limiting the ability to train powerful prediction models; 3) training data for the target object is typically required for 3D shape estimation and pose prediction, making these methods hard to scale and generalize to unseen objects.
Recently, several technological changes have created interesting opportunities for solving these fundamental vision problems.
First, low-cost depth sensors have become widely available, providing an additional sensory input in the form of a depth map that is very useful for extracting 3D information about objects and scenes. Second, with the ease of 3D object scanning using depth sensors and open access to large-scale 3D model databases like 3D Warehouse and ShapeNet, it is possible to leverage such data to build powerful learning models.
Third, machine learning algorithms like deep learning have become powerful enough to surpass the state of the art, or even human performance, on challenging tasks like object recognition, making it feasible to learn rich information from large datasets in a single model.
The objective of this thesis is to leverage these emerging tools and data to tackle the aforementioned problems in 3D object understanding from a new perspective, by designing machine learning algorithms that utilize RGB-D data.
Instead of depending solely on color images, we combine color and depth images to achieve significantly higher performance on object segmentation. We use a large collection of 3D object models to provide high-quality training data, and we retrieve visually similar 3D CAD models from low-quality captured depth images, which enables knowledge transfer from database objects to target objects in an observed scene.
By using content-based 3D shape retrieval, we also significantly improve pose estimation via similar proxy models, without the need to create an exact 3D model of the target as a reference.
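As a minimal, illustrative sketch of the RGB-D idea (not the thesis's actual pipeline; the function and array names here are hypothetical), the simplest way to combine the two modalities is to stack a color image and its normalized depth map into a single four-channel input for a learning model:

```python
import numpy as np

def fuse_rgbd(rgb, depth):
    """Stack a color image and a depth map into one 4-channel RGB-D array.
    Illustrative sketch only; a real system may fuse the modalities
    differently (e.g., with separate network branches)."""
    # Scale color to [0, 1] and normalize depth to the same range so
    # the two modalities have comparable magnitudes.
    c = rgb.astype(np.float32) / 255.0
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)
    # Append depth as a fourth channel: (H, W, 3) + (H, W, 1) -> (H, W, 4).
    return np.concatenate([c, d[..., None]], axis=-1)

# Hypothetical example: a 480x640 color image and its depth map.
rgb = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
depth = np.random.rand(480, 640).astype(np.float32)
rgbd = fuse_rgbd(rgb, depth)
print(rgbd.shape)  # (480, 640, 4)
```

Treating depth as an extra input channel is only one design choice; it lets a single model see both appearance and geometry at every pixel, which is what makes depth useful for separating objects that look similar in color alone.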
Identifier | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/D8FB5FHH |
Date | January 2017 |
Creators | Feng, Jie |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |