Contribution to concept detection on images using visual and textual descriptors / Contribution à la détection de concepts sur des images utilisant des descripteurs visuels et textuels

No French abstract available / This thesis addresses training and integration strategies for several modalities (visual and textual) in order to perform Visual Concept Detection and Annotation (VCDA) efficiently. VCDA has become a popular and important research topic in recent years because of its wide range of applications, such as image/video indexing and retrieval, security access control, and video monitoring. Despite the effort and progress of recent years, it remains an open problem and is still considered one of the most challenging problems in the computer vision community, mainly because of inter-class similarities and intra-class variations such as occlusion, background clutter, and changes in viewpoint, pose, scale and illumination. As a result, image content can hardly be described by low-level visual features. To address these problems, the text associated with images is used to capture valuable semantic information about image content. Moreover, to benefit from both visual and textual models, we propose a multimodal approach. For visual models, designing good visual descriptors and modeling them play an important role; how to organize the text associated with images is equally important. In this context, the objective of this thesis is to propose innovative contributions to the VCDA task. For visual models, novel visual features/descriptors are proposed that effectively and efficiently represent the visual content of images/videos. In addition, a novel method for encoding local binary descriptors is presented. For textual models, we propose two novel textual descriptors. The first is the semantic Bag-of-Words (sBoW), which uses a dictionary. The second is the Image Distance Feature (IDF), based on the tags associated with images. Finally, to benefit from both visual and textual models, fusion is carried out efficiently by MKL. [...]

Identifier: oai:union.ndltd.org:theses.fr/2014ECDL0014
Date: 15 May 2014
Creators: Zhang, Yu
Contributors: Ecully, Ecole centrale de Lyon; Chen, Liming; Bres, Stéphane
Source Sets: Dépôt national des thèses électroniques françaises
Language: English
Detected Language: English
Type: Electronic Thesis or Dissertation, Text
