
Pose Estimation and 3D Bounding Box Prediction for Autonomous Vehicles Through Lidar and Monocular Camera Sensor Fusion

This thesis investigates transfer learning with ResNet-101 and compares its performance with VGG-19 for 3D object detection in autonomous vehicles. ResNet-101 is a deep convolutional neural network with 101 layers, and VGG-19 is one with 19 layers. The research emphasizes the fusion of camera and lidar outputs to enhance the accuracy of 3D bounding box estimation, which is critical in occluded environments. Selecting an appropriate backbone for feature extraction is pivotal for achieving high detection accuracy. To address this challenge, we propose a method that leverages transfer learning with ResNet-101, pretrained on large-scale image datasets, to improve feature extraction. The outputs of the two sensors are averaged to obtain the final bounding box. The experimental results demonstrate that the ResNet-101 based model outperforms the VGG-19 based model in terms of accuracy and robustness. This study provides insight into the effectiveness of transfer learning and multi-sensor fusion in advancing 3D object detection for autonomous driving.

Master of Science

In computer vision, the search for more accurate and robust 3D object detection pipelines is ongoing. This thesis investigates advanced techniques to improve 3D object detection by comparing two popular deep learning models, ResNet-101 and VGG-19. The study focuses on enhancing detection accuracy by combining the outputs of two distinct methods: one that uses a monocular camera to estimate 3D bounding boxes, and another that uses lidar bird's-eye view (BEV) data and converts it to image-based 3D bounding boxes. This fusion of outputs is critical in environments where objects may be partially obscured. By leveraging transfer learning, in which models pretrained on larger datasets are fine-tuned for a specific application, the research shows that ResNet-101 significantly outperforms VGG-19 in terms of accuracy and robustness. The approach averages the outputs of both methods to refine the final 3D bounding box estimate. This work highlights the effectiveness of combining different detection methodologies and using advanced machine learning techniques to advance 3D object detection technology.
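The averaging step described in the abstract can be illustrated with a minimal sketch. The record does not specify how the boxes are represented or weighted, so everything below is an assumption: each box is taken to be eight image-plane corner points listed in the same order for both sources, and the hypothetical `average_boxes` function and its `camera_weight` parameter are illustrative names, not part of the thesis.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (u, v) pixel coordinates
Box = List[Point]             # 8 projected corners, same ordering for both sources (assumed)


def average_boxes(camera_box: Box, lidar_box: Box, camera_weight: float = 0.5) -> Box:
    """Fuse two image-plane 3D boxes by a weighted average of matching corners.

    camera_weight = 0.5 gives the plain average; other values are a
    hypothetical extension, not something stated in the source.
    """
    if len(camera_box) != 8 or len(lidar_box) != 8:
        raise ValueError("each box must have exactly 8 corners")
    if not 0.0 <= camera_weight <= 1.0:
        raise ValueError("camera_weight must lie in [0, 1]")

    lidar_weight = 1.0 - camera_weight
    # Average each pair of corresponding corners coordinate by coordinate.
    return [
        (camera_weight * cu + lidar_weight * lu,
         camera_weight * cv + lidar_weight * lv)
        for (cu, cv), (lu, lv) in zip(camera_box, lidar_box)
    ]
```

Averaging corner by corner only makes sense when both methods emit corners in a consistent order; a pipeline that averages box parameters such as center, size, and heading instead would need to treat the heading angle as a circular quantity.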

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/120894
Date: 08 August 2024
Creators: Wale, Prajakta Nitin
Contributors: Mechanical Engineering; Huxtable, Scott T.; Taheri, Saied; Ahmadian, Mehdi
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Language: English
Detected Language: English
Type: Thesis
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
