
3D Object Detection from Images

Remarkable advances in Computer Vision, Artificial Intelligence and Machine Learning have led to unprecedented breakthroughs in what
machines are able to achieve. In many tasks, such as Image Classification, they are now capable of surpassing human performance.
While this is truly outstanding, there are still many tasks in which machines lag far behind: walking in a room, driving on a highway, or grabbing some food,
for example. These actions feel natural to us but can be quite infeasible for machines. They require identifying and localizing objects in the
environment, effectively building a robust understanding of the scene. Humans easily gain this understanding thanks to their binocular vision, which provides
a high-resolution and continuous stream of information to the brain, which processes it efficiently. Unfortunately, things are much different for machines.
With cameras instead of eyes and artificial neural networks instead of a brain, gaining this understanding is still an open problem. In this thesis we will not focus on solving this problem as a whole, but instead delve into a very relevant part of it: how to make machines identify and precisely localize objects in 3D space by relying only on visual input, i.e. 3D Object Detection from Images. One of the most complex aspects of Image-based 3D Object Detection is that it inherently requires solving many different sub-tasks, e.g. estimating the object’s distance and its rotation. A first contribution of this thesis is an analysis of how these sub-tasks are usually learned, highlighting a destructive behavior that limits overall performance, and the proposal of an alternative learning method that avoids it. A second contribution is the discovery of a flaw in the computation of the metric widely used in the field, together with the re-computation of the performance of all published methods and the introduction of a novel, unflawed metric that has now become the official one. A third contribution focuses on one particular sub-task, the estimation of the object’s distance, which is demonstrated to be the most challenging. By introducing a novel approach that normalizes the appearance of objects with respect to their distance, detection performance can be greatly improved. A last contribution of the thesis is a critical analysis of the recently proposed Pseudo-LiDAR methods: two flaws in their training protocol are identified and analyzed. On top of this, a novel method able to achieve state-of-the-art performance in Image-based 3D Object Detection has been developed.

Identifier: oai:union.ndltd.org:unitn.it/oai:iris.unitn.it:11572/353602
Date: 28 September 2022
Creators: Simonelli, Andrea
Contributors: Simonelli, Andrea, Ricci, Elisa
Publisher: Università degli studi di Trento
Source Sets: Università di Trento
Language: English
Detected Language: English
Type: info:eu-repo/semantics/doctoralThesis
Rights: info:eu-repo/semantics/openAccess
Relation: firstpage:1, lastpage:122, numberofpages:122
