The objective of this work is object retrieval in large scale image datasets, where the object is specified by an image query and retrieval should be immediate at run time. Such a system has a wide variety of applications including object or location recognition, video search, near duplicate detection and 3D reconstruction. The task is very challenging because of large variations in the imaged object appearance due to changes in lighting conditions, scale and viewpoint, as well as partial occlusions. A starting point of established systems which tackle the same task is detection of viewpoint invariant features, which are then quantized into visual words and efficient retrieval is performed using an inverted index. We make the following three improvements to the standard framework: (i) a new method to compare SIFT descriptors (RootSIFT) which yields superior performance without increasing processing or storage requirements; (ii) a novel discriminative method for query expansion; (iii) a new feature augmentation method. Scaling up to searching millions of images involves either distributing storage and computation across many computers, or employing very compact image representations on a single computer combined with memory-efficient approximate nearest neighbour search (ANN). We take the latter approach and improve VLAD, a popular compact image descriptor, using: (i) a new normalization method to alleviate the burstiness effect; (ii) vocabulary adaptation to reduce influence of using a bad visual vocabulary; (iii) extraction of multiple VLADs for retrieval and localization of small objects. We also propose a method, SCT, for extremely low bit-rate compression of descriptor sets in order to reduce the memory footprint of ANN. The problem of finding images of an object in an unannotated image corpus starting from a textual query is also considered. Our approach is to first obtain multiple images of the queried object using textual Google image search, and then use these images to visually query the target database. We show that issuing multiple queries significantly improves recall and enables the system to find quite challenging occurrences of the queried object. Current retrieval techniques work only for objects which have a light coating of texture, while failing completely for smooth (fairly textureless) objects best described by shape. We present a scalable approach to smooth object retrieval and illustrate it on sculptures. A smooth object is represented by its imaged shape using a set of quantized semi-local boundary descriptors (a bag-of-boundaries); the representation is suited to the standard visual word based object retrieval. Furthermore, we describe a method for automatically determining the title and sculptor of an imaged sculpture using the proposed smooth object retrieval system.
Identifer | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:640168 |
Date | January 2013 |
Creators | Arandjelovic, Relja |
Contributors | Zisserman, Andrew |
Publisher | University of Oxford |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://ora.ox.ac.uk/objects/uuid:619dc397-b645-494b-a014-8e9f51f6884f |
Page generated in 0.002 seconds