We consider the problem of localizing objects in weakly labeled images and videos. An image or video (e.g., a Flickr image or a YouTube video) is weakly labeled if it is associated with a tag describing the main object present in it. The label is weak because the tag indicates only the presence or absence of the object and does not provide its spatial location. Given an image or video with an object tag, our goal is to localize the object in it. In this thesis, we propose two novel techniques for this challenging problem. First, we build a video-specific object appearance model and then incorporate temporal consistency information to localize the object. Second, we use existing detectors for other object classes (which we call "familiar objects") to build the appearance model of the unseen object class (i.e., the object of interest). Experimental results show the effectiveness of the proposed methods. / October 2016
Identifier | oai:union.ndltd.org:MANITOBA/oai:mspace.lib.umanitoba.ca:1993/31811 |
Date | 06 1900 |
Creators | Rochan, Mrigank |
Contributors | Wang, Yang (Computer Science), Bruce, Neil (Computer Science), Leung, Carson (Computer Science), Xu, Wayne (Biochemistry and Medical Genetics) |
Publisher | IEEE, Springer International Publishing, IEEE |
Source Sets | University of Manitoba Canada |
Detected Language | English |