A fundamental challenge in deploying vision-based object detection on a robotic platform is achieving sufficient perceptual performance for safe and effective operation. While general-purpose object detectors have steadily improved, we are still some way from being able to rely solely on vision-based robotic perception systems. This thesis addresses this problem by exploiting a characteristic of computer vision unique to robotic applications, with a particular focus on pedestrian detection for autonomous driving. Robots, even mobile ones, typically have some inherent structure to the visual data they process. First, they tend to operate repeatedly in the same environment (for example, an autonomous car on a regular commute or errand run), and second, they are able to localise within that environment. Place matters in robotics: we can exploit this additional context to boost perceptual performance by constructing object detector models fitted to a mobile robot's place of operation. We demonstrate that, in an ideal scenario with ground truth labels, we can significantly improve the detection performance of lightweight object detectors by exploiting place. Our results suggest that model capacity is a key factor limiting detection performance, and hence that this approach could equally be applied to higher-capacity models as the computational budget permits. This local expert detector is then developed into a deployable, self-supervised system, which uses offline image segmentation and spatial heuristics to construct the detector models and a visual localisation system to retrieve them at run time. The self-supervised system likewise boosts the perceptual performance of our lightweight object detector models. Finally, this ensemble approach to local expert object detection is extended with a neural network trained to generate detector models conditioned on the input image, an approach we refer to as a Dynamic Detection Filter Network. The network learns a representation of the operating environment and generates model parameters from the input image. This offers a general approach to constructing place-specific object detectors independently of localisation, with the potential to operate at larger scales across many different environments.
Newman, Paul; Posner, Ingmar
University of Oxford
Electronic Thesis or Dissertation