Deep neural networks can achieve state-of-the-art performance on various image recognition tasks, such as object categorization (image classification) and object localization (object detection), given a large amount of training data. However, to build models that perform well in the real world, we must overcome the shift from training data to real-world data, which involves two factors: (1) covariate shift and (2) unseen classes.
Covariate shift occurs when the input distribution of a particular category changes between training and test time. Deep models can easily make mistakes under small changes in the input, such as added noise, a lighting change, or a change in object pose. Unseen classes, on the other hand, are classes absent from the training set that may appear in real-world test samples. In image classification it is important to differentiate between "seen" and "unseen" classes, while in object detection it is crucial to locate diverse classes, including those unseen during training. An open-world image recognition model therefore needs to handle both factors. In this thesis, we propose approaches for image classification and object detection that handle these two kinds of shift in a label-efficient way.
First, we examine how to adapt large-scale pre-trained models to the object detection task while preserving their robustness to covariate shift. We investigate various pre-trained models and find that how well a fine-tuned model retains robust representations depends heavily on the pre-trained model's architecture. Based on this finding, we develop simple techniques to prevent the loss of generalizable representations during fine-tuning.
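As a minimal sketch of this idea (an illustrative assumption, not necessarily the exact technique developed in the thesis), one common way to keep pre-trained representations from drifting during detector fine-tuning is to update the pre-trained backbone with a much smaller learning rate than the newly initialized detection head:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Illustrative sketch: preserve pre-trained backbone representations by
# updating them slowly, while the randomly initialized detection head
# trains at a normal rate (hypothetical hyperparameters).
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")

backbone_params, head_params = [], []
for name, param in model.named_parameters():
    if name.startswith("backbone"):
        backbone_params.append(param)   # pre-trained, update slowly
    else:
        head_params.append(param)       # new detection layers, update normally

optimizer = torch.optim.SGD(
    [
        {"params": backbone_params, "lr": 1e-4},  # small LR preserves generalizable features
        {"params": head_params, "lr": 1e-2},      # larger LR for the new heads
    ],
    momentum=0.9,
    weight_decay=1e-4,
)
```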
Second, we study adaptation to an unlabeled target domain for object detection to address covariate shift. Traditional domain alignment methods may be inadequate because many factors, such as image layout and the number of objects in an image, can cause domain shift between the source and target domains. To address this, we propose a strong-weak distribution alignment approach that can handle diverse domain shifts. Furthermore, we study semi-supervised domain adaptation for image classification, where partially labeled target data is available. We introduce a simple yet effective approach, MME, which extracts discriminative features for the target domain using adversarial learning (see the sketch below). We also develop a method to handle the situation where the unlabeled target domain includes categories unseen in the source domain. Since there is no supervision for such categories, recognizing their instances as "unseen" is challenging. To address this, we devise a straightforward approach that trains one-vs-all classifiers on source data to detect unseen instances. Additionally, we introduce an approach that enables an object detector to recognize unseen foreground instances as "objects" through a simple data augmentation and learning framework applicable to diverse detectors and datasets.
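As a minimal sketch of the adversarial objective behind MME (minimax entropy), assuming the usual formulation with a cosine-similarity classifier and a gradient-reversal layer (details and names here are illustrative, not quoted from this abstract), the loss on unlabeled target samples might look as follows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output


class CosineClassifier(nn.Module):
    """Scores classes by temperature-scaled cosine similarity to class prototypes."""

    def __init__(self, feat_dim, num_classes, temperature=0.05):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        self.temperature = temperature

    def forward(self, feats):
        feats = F.normalize(feats, dim=1)
        weight = F.normalize(self.weight, dim=1)
        return feats @ weight.t() / self.temperature


def mme_unlabeled_loss(features_u, classifier, lambd=0.1):
    """Entropy term on unlabeled target features.

    Minimizing this (negative-entropy) loss drives the classifier to
    *maximize* entropy, while the gradient-reversal layer makes the
    feature extractor *minimize* it, which pushes target features to
    cluster discriminatively around class prototypes.
    """
    reversed_feats = GradReverse.apply(features_u)
    probs = F.softmax(classifier(reversed_feats), dim=1)
    neg_entropy = (probs * torch.log(probs + 1e-5)).sum(dim=1).mean()
    return lambd * neg_entropy
```

In training, this term would be added to the standard cross-entropy loss on labeled source and labeled target samples; the two players of the minimax game are realized by the single gradient-reversal operation rather than by alternating updates.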
In conclusion, thanks to their simple design, our proposed approaches can be applied across diverse datasets and architectures and achieve state-of-the-art results. Our work can contribute to the development of a unified open-world image recognition model in future research.
Identifier | oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/49271 |
Date | 17 September 2024 |
Creators | Saito, Kuniaki |
Contributors | Saenko, Kate |
Source Sets | Boston University |
Language | en_US |
Detected Language | English |
Type | Thesis/Dissertation |
Rights | Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/ |