With the advancement of autonomous driving research, 3D detection based on LiDAR point clouds has become one of the most active research topics in artificial intelligence. Compared with RGB cameras, LiDAR provides accurate depth information, while RGB images offer denser resolution; features from the two sensors are therefore considered complementary. However, due to the sparsity of LiDAR point clouds, a dense and accurate RGB/3D projective relationship is difficult to establish, especially for distant scene points. Recent works address this problem by designing networks that learn to predict missing points or a dense point density distribution to compensate for the sparsity of the LiDAR point cloud. In this thesis, we address the problem from two directions: a GAN (Generative Adversarial Network)-based module that reconstructs point clouds, and regional point cloud augmentation guided by motion maps. For the first, we propose an imagine-and-locate process, called UYI. The objective of this module is to improve point cloud quality, and it is independent of the detection stage used for inference; we accomplish this task through a GAN-based cross-modality module that takes an image as input and infers a dense LiDAR shape. For the second, inspired by the attention mechanism of the human eye, we use motion maps to apply random augmentation to point clouds in a targeted manner, a method we call motion map-assisted enhancement (MAE). Boosted by our UYI and MAE modules, our experiments show significant performance improvements across all tested baseline models. Owing to the plug-and-play nature of our modules, we were able to push the performance of the existing state-of-the-art model to a new level. Our method not only improves detection performance on the vehicle category but achieves an even larger gain on the pedestrian category.
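The targeted-augmentation idea behind MAE can be illustrated with a minimal sketch. This is not the thesis implementation; the function name, bird's-eye-view grid ranges, and the jitter-duplicate strategy are all illustrative assumptions. It only shows the core notion of letting a motion map decide where augmentation is applied:

```python
import numpy as np

def motion_guided_augment(points, motion_map, x_range=(0.0, 40.0),
                          y_range=(-20.0, 20.0), jitter=0.05, seed=0):
    """Hypothetical sketch: densify only points that fall in high-motion
    cells of a bird's-eye-view (BEV) motion map.

    points:     (N, 3) LiDAR points (x, y, z), in metres
    motion_map: (H, W) BEV map in [0, 1]; higher values = more motion
    """
    rng = np.random.default_rng(seed)
    H, W = motion_map.shape
    # Map each point's (x, y) position to a BEV cell index.
    ix = ((points[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * H).astype(int)
    iy = ((points[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * W).astype(int)
    valid = (ix >= 0) & (ix < H) & (iy >= 0) & (iy < W)
    # Selection probability proportional to motion: points in static
    # regions are left untouched, points in moving regions are sampled.
    score = np.zeros(len(points))
    score[valid] = motion_map[ix[valid], iy[valid]]
    chosen = points[rng.random(len(points)) < score]
    # Duplicate the chosen points with small Gaussian jitter to densify
    # the moving regions of the scene.
    extra = chosen + rng.normal(scale=jitter, size=chosen.shape)
    return np.concatenate([points, extra], axis=0)
```

A uniform motion map of ones would duplicate every in-range point, while an all-zero map leaves the cloud unchanged, mirroring how attention concentrates effort on moving regions.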
In future research, we will continue to explore the feasibility of spatio-temporal correlation methods in 3D detection; 3D detection driven by motion information extraction is a promising direction.
Identifier | oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/44998 |
Date | 25 May 2023 |
Creators | Zhang, Zeping |
Contributors | Laganière, Robert |
Publisher | Université d'Ottawa / University of Ottawa |
Source Sets | Université d’Ottawa |
Language | English |
Detected Language | English |
Type | Thesis |
Format | application/pdf |
Rights | Attribution-NonCommercial-ShareAlike 4.0 International, http://creativecommons.org/licenses/by-nc-sa/4.0/ |