Large-scale learning of discriminative image representations

This thesis addresses the problem of designing discriminative image representations for a variety of computer vision tasks. Our approach is to employ large-scale machine learning to obtain novel representations and improve existing ones. This allows us to propose descriptors for a range of applications, such as local feature matching, image retrieval, image classification, and face verification. Our image and region descriptors are discriminative and compact, and they achieve state-of-the-art results on challenging benchmarks.

Local region descriptors play an important role in image matching and retrieval applications. We train the descriptors using a convex learning framework, which learns the configuration of spatial pooling regions as well as a discriminative linear projection onto a lower-dimensional subspace. The convexity of the corresponding optimisation problems is achieved by using convex, sparsity-inducing regularisers: the L1 norm and the nuclear (trace) norm. We then extend the descriptor learning framework to the setting where learning is performed from large image collections for which ground-truth feature matches are not available. To tackle this problem, we use a latent variable formulation, which avoids fixing correct and incorrect matches in advance based on heuristics.

Image recognition systems rely strongly on discriminative image representations to achieve high accuracy. We propose several improvements to the Fisher vector and VLAD image descriptors, showing that better image classification performance can be achieved by using appropriate normalisation and local feature transformation. We then turn to the face image domain, where image descriptors based on handcrafted facial landmarks are currently widely employed. Our approach is different: we densely compute local features over face images and then encode them using the Fisher vector. The latter is then projected onto a learnt low-dimensional subspace, yielding a compact and discriminative face image representation.

We also introduce a deep image representation, termed the Fisher network, which can be seen as a hybrid between shallow representations (which it generalises) and deep neural networks. The Fisher network is based on stacking Fisher encodings, which is made feasible by supervised dimensionality reduction injected between the encodings.

Finally, we address the problem of fast medical image search, where we are interested in designing a system that can be queried instantly with an arbitrary Region of Interest (ROI). To facilitate this, we present a medical image repository representation based on pre-computed non-rigid transformations between selected images (exemplars) and all other images. This allows fast retrieval for the query ROI, since only a fixed number of registrations to the exemplars need to be computed to establish the ROI correspondences in all repository images.
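As a rough illustration of why the two regularisers named in the abstract induce sparsity, the sketch below computes the L1 norm and the nuclear norm with NumPy, together with their standard proximal operators (soft-thresholding and singular value thresholding). It is not the thesis's training procedure: the function names, the threshold `tau`, and the toy inputs are assumptions made for this example only.

```python
import numpy as np


def l1_norm(w):
    """Sum of absolute values; penalising it drives individual weights to zero."""
    return float(np.abs(w).sum())


def nuclear_norm(W):
    """Sum of singular values; penalising it drives the matrix towards low rank."""
    return float(np.linalg.svd(W, compute_uv=False).sum())


def soft_threshold(w, tau):
    """Proximal operator of tau * L1 norm: shrink each entry towards zero by tau."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)


def singular_value_threshold(W, tau):
    """Proximal operator of tau * nuclear norm: shrink each singular value by tau."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Scaling the columns of U by the shrunk singular values rebuilds the matrix.
    return (U * np.maximum(s - tau, 0.0)) @ Vt


# Toy usage (hypothetical values): small weights and small singular values vanish.
weights = np.array([0.05, -0.3, 1.0])
sparse_weights = soft_threshold(weights, tau=0.1)

projection = np.diag([3.0, 0.5])
low_rank_projection = singular_value_threshold(projection, tau=1.0)
```

In this reading, a zeroed weight would correspond to a discarded pooling region, and a zeroed singular value to a dropped dimension of the linear projection.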

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:606390
Date: January 2013
Creators: Simonyan, Karen
Contributors: Zisserman, Andrew ; Criminisi, Antonio
Publisher: University of Oxford
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://ora.ox.ac.uk/objects/uuid:3633f284-0588-4a11-b15b-1520f6a8262a
