This thesis addresses the problem of visual object retrieval, where a user formulates a query to an image database by providing one or multiple examples of an object of interest. The presented techniques aim both at finding those images in the database that contain the object as well as locating the object in the image and segmenting it from the background.
Every considered image, both the ones used as queries and the ones contained in the target database, is represented as a Binary Partition Tree (BPT), the hierarchy of regions previously proposed by Salembier and Garrido (2000). This data structure offers multiple opportunities and challenges when applied to the object retrieval problem.
A first application of BPTs appears during the formulation of the query, when the user must interactively segment the query object from the background. Firstly, the BPT can assist in adjusting an initial marker, such as a scribble or bounding box, to the object contours. Secondly, BPT can also define a navigation path for the user to adjust an initial selection to the appropriate spatial scale.
The hierarchical structure of the BPT is also exploited to extract a new type of visual words named Hierarchical Bag of Regions (HBoR). Each region defined in the BPT is described with a feature vector that combines a soft quantization on a visual codebook with an efficient bottom-up computation through the BPT. These descriptors allow the definition of a novel feature space, the Parts Space, where each object is located according to the parts that compose it.
HBoR descriptors have been applied to two scenarios for object retrieval, both of them solved by considering the decomposition of the objects in parts. In the first scenario, the query is formulated with a single object exemplar which is to be matched with each BPT in the target database. The matching problem is solved in two stages: an initial top-down one that assumes that the hierarchy from the query is respected in the target BPT, and a second bottom-up one that relaxes this condition and considers region merges which are not in the target BPT.
The second scenario where HBoR descriptors are applied considers a query composed of several visual objects. In this case, the provided exemplars are considered as a training set to build a model of the query concept. This model is composed of two levels, a first one where each part is modelled and detected separately, and a second one that characterises the combinations of parts that describe the complete object. The analysis process exploits the hierarchical nature of the BPT by using a novel classifier that drives an efficient top-down analysis of the target BPTs.
Identifer | oai:union.ndltd.org:TDX_UPC/oai:www.tdx.cat:10803/108909 |
Date | 31 May 2012 |
Creators | Giró i Nieto, Xavier |
Contributors | Chang, Shih-Fu, Marqués Acosta, Fernando, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
Publisher | Universitat Politècnica de Catalunya |
Source Sets | Universitat Politècnica de Catalunya |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/doctoralThesis, info:eu-repo/semantics/publishedVersion |
Format | 211 p., application/pdf |
Source | TDX (Tesis Doctorals en Xarxa) |
Rights | info:eu-repo/semantics/openAccess, L'accés als continguts d'aquesta tesi queda condicionat a l'acceptació de les condicions d'ús establertes per la següent llicència Creative Commons: http://creativecommons.org/licenses/by-nc-sa/3.0/es/ |
Page generated in 0.0017 seconds