Thanks to major advancements in hardware and computational power, sensor technology, and artificial intelligence, the race toward fully autonomous driving systems is heating up. Faced with countless challenging conditions and driving scenarios, researchers are tackling the hardest problems in driverless cars.
One of the most critical components is the perception module, which enables an autonomous vehicle to "see" and "understand" its surrounding environment. Given that modern vehicles can carry a large number of sensors and data streams, this thesis presents a deep learning-based framework that leverages multimodal data, i.e., sensor fusion, to perform 3D object detection and localization.
We provide an extensive review of advancements in deep learning-based methods for computer vision, specifically in 2D and 3D object detection tasks. We also survey the literature on both single-sensor and multi-sensor data fusion techniques. Furthermore, we present an in-depth explanation of our proposed approach, which performs sensor fusion on input streams from LiDAR and camera sensors, aiming to simultaneously perform 2D, 3D, and Bird's Eye View detection.
Our experiments highlight the importance of learnable data fusion mechanisms and
multi-task learning, the impact of different CNN design decisions, speed-accuracy
tradeoffs, and ways to deal with overfitting in multi-sensor data fusion frameworks.
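For illustration only, the sketch below shows one way a learnable camera-LiDAR feature fusion block with shared multi-task detection heads could be structured in PyTorch. It is a minimal, assumption-laden example: the class name, channel sizes, and head output dimensions are hypothetical and do not reflect the thesis's actual architecture.

```python
# Hypothetical sketch of learnable camera-LiDAR fusion with multi-task heads.
# Names, channel counts, and output sizes are illustrative assumptions only.
import torch
import torch.nn as nn


class LearnableFusion(nn.Module):
    """Fuses camera and LiDAR feature maps with a learned 1x1 projection."""

    def __init__(self, cam_channels=256, lidar_channels=256, fused_channels=256):
        super().__init__()
        # Learned projection of the concatenated features, as opposed to a
        # fixed averaging or concatenation-only scheme.
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_channels + lidar_channels, fused_channels, kernel_size=1),
            nn.BatchNorm2d(fused_channels),
            nn.ReLU(inplace=True),
        )
        # Separate task heads sharing the fused representation (multi-task learning).
        self.head_2d = nn.Conv2d(fused_channels, 6, kernel_size=1)   # e.g. 2D box + score
        self.head_3d = nn.Conv2d(fused_channels, 8, kernel_size=1)   # e.g. 3D box parameters
        self.head_bev = nn.Conv2d(fused_channels, 6, kernel_size=1)  # e.g. BEV box + score

    def forward(self, cam_feat, lidar_feat):
        fused = self.fuse(torch.cat([cam_feat, lidar_feat], dim=1))
        return self.head_2d(fused), self.head_3d(fused), self.head_bev(fused)


if __name__ == "__main__":
    # Dummy feature maps standing in for backbone outputs projected to a common grid.
    cam = torch.randn(1, 256, 100, 100)
    lidar = torch.randn(1, 256, 100, 100)
    out_2d, out_3d, out_bev = LearnableFusion()(cam, lidar)
    print(out_2d.shape, out_3d.shape, out_bev.shape)
```

The key design choice illustrated here is that the fusion weights are trainable, so the network can learn how much to trust each modality, and that all detection heads share one fused representation, which is what multi-task learning refers to above.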
Identifier | oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/42812 |
Date | 14 October 2021 |
Creators | Massoud, Yahya |
Contributors | Laganière, Robert |
Publisher | Université d'Ottawa / University of Ottawa |
Source Sets | Université d’Ottawa |
Language | English |
Detected Language | English |
Type | Thesis |
Format | application/pdf |
Rights | Attribution-ShareAlike 4.0 International, http://creativecommons.org/licenses/by-sa/4.0/ |