Learning structural SVMs and its applications in computer vision

Many computer vision problems involve building automatic systems that extract complex, high-level information from visual data. Such problems can often be modeled with structural models, which relate raw input variables to structured, high-level output variables. The structural support vector machine (SVM) is a discriminative method for learning structural models. It allows flexible feature construction while remaining robust against overfitting, and it therefore achieves state-of-the-art prediction accuracy on structural prediction tasks in computer vision.
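For reference, the standard structural SVM can be written as follows. The notation is generic rather than the thesis's own: a joint feature map Ψ(x, y), a task loss Δ(y, ȳ), a weight vector w, a regularization constant C and n training pairs (x_i, y_i). The training problem is shown in the margin-rescaling, one-slack form, which is one common choice and the variant named later in this abstract.

```latex
f(x) = \arg\max_{y \in \mathcal{Y}} \; w^\top \Psi(x, y)

\min_{w,\; \xi \ge 0} \; \frac{1}{2}\lVert w \rVert^2 + C\,\xi
\quad \text{s.t.} \quad
\forall (\bar{y}_1, \dots, \bar{y}_n) \in \mathcal{Y}^n:\;
\frac{1}{n}\, w^\top \sum_{i=1}^{n} \bigl[\Psi(x_i, y_i) - \Psi(x_i, \bar{y}_i)\bigr]
\;\ge\; \frac{1}{n} \sum_{i=1}^{n} \Delta(y_i, \bar{y}_i) - \xi
```

A single slack variable ξ is shared by exponentially many aggregated constraints. This is why cutting-plane methods, which add only the currently most violated constraint at each iteration, are used to solve it.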
This thesis first studies the application of structural SVMs to interactive image segmentation. It proposes a novel interactive image segmentation technique that automatically learns segmentation parameters tailored to each individual image. Unlike existing work, the proposed method requires no offline parameter tuning or training stage. Instead, it determines image-specific parameters from a few simple user interactions with the target image. The segmentation problem is modeled as inference in a conditional random field (CRF) defined over a segmentation mask and the target image.
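To make the modeling step concrete, here is a minimal sketch of a CRF energy that is linear in its weights. It rests on several assumptions that are not in the thesis: a toy 1-D chain of pixels replaces a real image grid, the per-pixel color and texture costs are hypothetical inputs, and the smoothing term is a Potts penalty between neighbours. The names `joint_features`, `energy` and `infer` are illustrative only. Brute-force search stands in for a practical inference method such as graph cuts.

```python
from itertools import product


def joint_features(color_cost, texture_cost, labels):
    """Return the feature vector (color, texture, smoothing) of a labeling.

    color_cost[i][l] and texture_cost[i][l] are hypothetical costs of giving
    pixel i the label l (0 = background, 1 = foreground). Pixels form a 1-D
    chain, so pixel i neighbours pixel i + 1.
    """
    color = sum(color_cost[i][l] for i, l in enumerate(labels))
    texture = sum(texture_cost[i][l] for i, l in enumerate(labels))
    # Potts smoothing term: count neighbouring pixels with different labels.
    smooth = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return (color, texture, smooth)


def energy(w, color_cost, texture_cost, labels):
    """CRF energy E(y) = w . features(y); it is linear in the weights w."""
    feats = joint_features(color_cost, texture_cost, labels)
    return sum(wi * fi for wi, fi in zip(w, feats))


def infer(w, color_cost, texture_cost):
    """MAP labeling by exhaustive search (only feasible for tiny inputs)."""
    n = len(color_cost)
    return min(product((0, 1), repeat=n),
               key=lambda y: energy(w, color_cost, texture_cost, y))


if __name__ == "__main__":
    # Pixel 2 looks like foreground by color; its neighbours look like background.
    color = [[0, 1], [0, 1], [1, 0], [0, 1], [0, 1]]
    texture = [[0, 1]] * 5
    print(infer((1, 0, 0), color, texture))  # (0, 0, 1, 0, 0)
    print(infer((1, 0, 2), color, texture))  # (0, 0, 0, 0, 0)
```

Raising the smoothing weight flips the isolated pixel back to background. Learning the weights per image controls exactly this kind of image-dependent trade-off.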

This CRF is parametrized by the weights of its different terms (e.g., color, texture and smoothing). These weights are learned via a one-slack structural SVM, which is solved using a constraint approximation scheme and the cutting plane algorithm. Experimental results show that, by learning image-specific parameters automatically, the proposed method outperforms other state-of-the-art interactive image segmentation techniques.
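At each iteration, the cutting plane algorithm asks which aggregated constraint of the one-slack problem the current weights violate most. The following is a minimal sketch of that separation step only. It assumes a toy 1-D chain "image", hypothetical color and texture costs, a Potts smoothing feature, a normalized Hamming loss and brute-force loss-augmented inference. The constraint approximation scheme and the QP re-solve over the working set are not shown, and all function names are illustrative.

```python
from itertools import product


def psi(x, y):
    """Hypothetical joint feature map for a toy 1-D chain 'image'.

    x[i] = (color_cost, texture_cost) for pixel i, each a pair of costs
    indexed by label (0 = background, 1 = foreground). Features are negated
    energy terms, so a larger score w . psi(x, y) means a better labeling.
    """
    color = -sum(x[i][0][l] for i, l in enumerate(y))
    texture = -sum(x[i][1][l] for i, l in enumerate(y))
    smooth = -sum(1 for a, b in zip(y, y[1:]) if a != b)
    return [color, texture, smooth]


def hamming(y, y_bar):
    """Task loss: fraction of pixels labeled differently."""
    return sum(p != q for p, q in zip(y, y_bar)) / len(y)


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def loss_augmented_inference(w, x, y):
    """argmax over labelings of hamming(y, y_bar) + w . psi(x, y_bar)."""
    return max(product((0, 1), repeat=len(y)),
               key=lambda y_bar: hamming(y, y_bar) + dot(w, psi(x, y_bar)))


def most_violated_constraint(w, data):
    """Return (a, b) of the most violated one-slack constraint w . a >= b - xi.

    a = (1/n) sum_i [psi(x_i, y_i) - psi(x_i, y_bar_i)]
    b = (1/n) sum_i hamming(y_i, y_bar_i)
    """
    n = len(data)
    a = [0.0] * 3
    b = 0.0
    for x, y in data:
        y_bar = loss_augmented_inference(w, x, y)
        diff = [p - q for p, q in zip(psi(x, y), psi(x, y_bar))]
        a = [ai + d / n for ai, d in zip(a, diff)]
        b += hamming(y, y_bar) / n
    return a, b


if __name__ == "__main__":
    x = [((0, 1), (0, 1)), ((1, 0), (1, 0)), ((1, 0), (0, 1))]
    y = (0, 1, 1)
    w = [0.0, 0.0, 0.0]
    a, b = most_violated_constraint(w, [(x, y)])
    # Violation of the returned constraint at the current weights.
    print(a, b, b - dot(w, a))
```

In a full solver, (a, b) would be added to the working set whenever b − w·a exceeds the current slack ξ by more than a tolerance ε. The QP over the working set is then re-solved for new w and ξ. The loop stops once no constraint is violated by more than ε.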

This thesis then uses structural SVMs to speed up large-scale relatively-paired space analysis. It proposes a new multi-modality analysis technique based on relatively-paired observations from multiple modalities. Relative-pairing information is encoded by the relative proximities of observations in a latent common space. By building a discriminative model and maximizing a distance margin, the method learns, for each modality, a projection function that maps observations into the latent common space. However, training on large-scale relatively-paired observations can be extremely time-consuming. To address this, training is reformulated as learning a structural model, which can be optimized by the cutting plane algorithm with only a few training samples involved in each iteration. Experimental results validate the effectiveness and efficiency of the proposed technique.

published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
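The abstract does not spell out the exact model for relatively-paired analysis, so the following is only a sketch under stated assumptions. Each relative-pairing observation is treated as a hypothetical triplet (i, j, k), meaning that observation i of modality A should lie closer to observation j than to observation k of modality B in the latent space. The projections are assumed to be linear, and distances are squared Euclidean. In one pass over the triplets, the sketch computes which ones violate the distance margin and the resulting averaged violation. This mirrors the indicator-vector construction of one-slack cutting-plane solvers, where each iteration needs only this aggregated quantity. All names are illustrative.

```python
def project(P, v):
    """Map v into the latent space with a linear projection P (list of rows)."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def relative_pairing_violations(P_a, P_b, X_a, X_b, triplets, margin=1.0):
    """Hypothetical separation step over relative-pairing triplets.

    Each triplet (i, j, k) states that X_a[i] should be closer to X_b[j]
    than to X_b[k] after projection. Returns the indicator vector c
    (c_t = 1 when triplet t violates the margin) and the averaged violation
    (1/T) * sum_t max(0, margin + d(i, j) - d(i, k)).
    """
    c, total = [], 0.0
    for i, j, k in triplets:
        z = project(P_a, X_a[i])  # observation from modality A
        d_pos = sq_dist(z, project(P_b, X_b[j]))
        d_neg = sq_dist(z, project(P_b, X_b[k]))
        slack = margin + d_pos - d_neg  # > 0 means the margin is violated
        c.append(1 if slack > 0 else 0)
        total += max(0.0, slack)
    return c, total / len(triplets)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    X_a = [[0.0, 0.0]]
    X_b = [[0.0, 0.0], [3.0, 0.0], [0.5, 0.0]]
    print(relative_pairing_violations(identity, identity, X_a, X_b,
                                      [(0, 0, 1), (0, 0, 2)]))
```

Here the second triplet's negative example is too close to the anchor, so only that triplet is flagged. A solver would then update the projections to push it beyond the margin.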

Identifier: oai:union.ndltd.org:HKU/oai:hub.hku.hk:10722/206663
Date: January 2014
Creators: Kuang, Zhanghui, 旷章辉
Contributors: Wong, KKY
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Source Sets: Hong Kong University Theses
Language: English
Detected Language: English
Type: PG_Thesis
Rights: Creative Commons: Attribution 3.0 Hong Kong License. The author retains all proprietary rights (such as patent rights) and the right to use in future works.
Relation: HKU Theses Online (HKUTO)
