Object-centric geometric perception aims at extracting the geometric attributes of 3D objects.
These attributes include the shape, pose, and motion of the target objects, which enable fine-grained object-level understanding for various tasks in graphics, computer vision, and robotics. With the growth of 3D geometry data and 3D deep learning methods, it is becoming increasingly feasible to accomplish such tasks directly from 3D input data. Among different 3D representations, the 3D point cloud is a simple, common, and memory-efficient representation that can be obtained directly from multi-view images, depth scans, or LiDAR range images.
Object-centric geometric perception poses several challenges, such as achieving a fine-grained geometric understanding of common articulated objects with multiple rigid parts, learning disentangled shape and pose representations with fewer labels, and handling dynamic and sequential geometric input in an end-to-end fashion. Here we identify and address these challenges from a 3D deep learning perspective by designing effective and generalizable 3D representations, architectures, and pipelines. We propose the first deep pose estimation approach for common articulated objects by designing a novel hierarchical invariant representation.
To push the boundary of 6D pose estimation for common rigid objects, we design a simple yet effective self-supervised framework to handle unlabeled partial segmented scans. We further contribute a novel 4D convolutional neural network called PointMotionNet to learn spatio-temporal features for 3D point cloud sequences. All these works advance the domain of object-centric geometric perception from a unique 3D deep learning perspective. / Doctor of Philosophy / 3D sensors are now widely built into mobile devices and vehicles, such as the depth camera on an iPhone or the LiDAR sensors on an autonomous driving vehicle. These 3D sensing techniques help us obtain accurate measurements of the 3D world. In the field of machine intelligence, we also want to build intelligent systems and algorithms that learn useful information and understand the 3D world better.
We human beings have an incredible ability to sense and understand the 3D world through our visual or tactile systems. For example, humans can infer the geometric structure and arrangement of furniture in a room without seeing the full room; we can track a 3D object no matter how its appearance, shape, and scale change; and we can predict the future motion of multiple objects based on sequential observation and complex reasoning.
My work designs various frameworks to learn such 3D information from geometric data represented as large sets of 3D points. These frameworks achieve a fine-grained geometric understanding of individual objects and help machines determine the target objects' geometry, states, and dynamics.
The work in this dissertation serves as building blocks towards a better understanding of this dynamic world.
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/111077 |
Date | 30 June 2022 |
Creators | Li, Xiaolong |
Contributors | Electrical and Computer Engineering, Abbott, A. Lynn, Huang, Jia-Bin, Zhou, Wei, Polys, Nicholas F., Williams, Ryan K. |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf |
Rights | Creative Commons Attribution-NonCommercial 4.0 International, http://creativecommons.org/licenses/by-nc/4.0/ |