Learning Pose and State-Invariant Object Representations for Fine-Grained Recognition and Retrieval
Rohan Sarkar (19065215) 11 July 2024
<p dir="ltr">Object recognition and retrieval is a fundamental problem in computer vision that involves
recognizing objects and retrieving similar object images through visual queries. While
deep metric learning is commonly employed to learn image embeddings for solving such
problems, the representations learned using existing methods are not robust to changes in
viewpoint, pose, and object state, especially for fine-grained recognition and retrieval tasks.
To overcome these limitations, this dissertation aims to learn robust object representations
that remain invariant to such transformations for fine-grained tasks. First, it learns
dual pose-invariant embeddings that support recognition and retrieval at both the
category level and the finer object-identity level, learning category-specific and
object-identity-specific representations simultaneously in separate embedding spaces.
To this end, it introduces the PiRO framework, which uses an attention-based dual-encoder
architecture and novel pose-invariant ranking losses for each embedding space to disentangle
the category and object representations while learning pose-invariant features. Second, the dissertation introduces ranking
losses that cluster multi-view images of an object together in both embedding spaces.
At the same time, they pull the embeddings of two objects from the same category closer in
the category embedding space, to learn fundamental category-specific attributes, and push
them apart in the object embedding space, to learn discriminative features that distinguish
between them. Third, the dissertation addresses state invariance and introduces the novel ObjectsWithStateChange
dataset to facilitate research on recognizing fine-grained objects with
state changes that involve structural transformations in addition to pose and viewpoint changes.
Fourth, it proposes a curriculum learning strategy that progressively samples
harder-to-distinguish object images during training, enhancing the model's ability to capture discriminative
features for fine-grained tasks amid state changes and other transformations. Experimental
evaluations demonstrate significant improvements in object recognition and retrieval
performance compared to previous methods, validating the effectiveness of the proposed
approaches across several challenging datasets under various transformations.</p>
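<p dir="ltr">The dual-space objective described in the abstract can be illustrated with a minimal sketch.
This is not the PiRO loss from the dissertation: the hinge formulation, the squared Euclidean distance,
the equal weighting of the three terms, the function names, and the <code>margin</code> parameter are all
assumptions made for illustration. It only shows the stated idea for two objects of the same category:
views of each object are clustered in both spaces, the two objects are pulled together in the category
space, and they are pushed apart in the object-identity space.</p>

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def dual_space_loss(cat_a, cat_b, obj_a, obj_b, margin=1.0):
    """Illustrative loss for two objects (a, b) from the same category.

    cat_*  : (views, dim) multi-view embeddings in the category space
    obj_*  : (views, dim) multi-view embeddings in the object-identity space
    margin : hypothetical hinge margin, not a value from the dissertation
    """
    # Cluster the multi-view images of each object in both embedding spaces.
    compact = sum(
        pairwise_sq_dists(x, x).mean() for x in (cat_a, cat_b, obj_a, obj_b)
    )
    # Category space: pull the two same-category objects closer together.
    pull = pairwise_sq_dists(cat_a, cat_b).mean()
    # Object space: push the two objects apart until they are `margin` away.
    push = np.maximum(0.0, margin - pairwise_sq_dists(obj_a, obj_b)).mean()
    return compact + pull + push


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views, dim = 4, 8  # hypothetical sizes
    cat_a, cat_b = rng.normal(size=(2, views, dim))
    obj_a, obj_b = rng.normal(size=(2, views, dim))
    print(dual_space_loss(cat_a, cat_b, obj_a, obj_b))
```

<p dir="ltr">In this sketch the loss is zero only when every object's views coincide in both spaces, the two
objects coincide in the category space, and they are at least the margin apart in the object space.</p>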