Graph and Subspace Learning for Domain Adaptation

In many practical problems, the instances in the training and test sets may be drawn from different distributions, so traditional supervised learning cannot achieve good performance on the new domain. Domain adaptation algorithms are therefore designed to bridge the distribution gap between training (source) data and test (target) data. In this thesis, I propose two graph learning and two subspace learning methods for domain adaptation. Graph learning methods use a graph to model pairwise relations between instances and then minimize the domain discrepancy based on the graphs directly. The first effort we make is to propose a novel locality preserving projection method for the domain adaptation task, which can find a linear mapping preserving the intrinsic structure for both source and target domains. We first construct two graphs encoding the neighborhood information for source and target domains separately. We then find linear projection coefficients which have the property of locality preserving for each graph. Instead of combining the two objective terms under a compatibility assumption and requiring the user to decide the importance of each objective function, we propose a multi-objective formulation for this problem and solve it simultaneously using Pareto optimization. Pareto optimization allows multiple objectives to compete with each other in deciding the optimal trade-off. We use generalized eigen-decomposition to find the Pareto frontier, which captures all possible good linear projection coefficients that are preferred by one or more objectives. The second effort is to directly improve the pairwise similarities between instances in the same domain as well as in different domains. We propose a novel method to solve the domain adaptation task in a transductive setting. The proposed method bridges the distribution gap between source domain and target domain through affinity learning.
It exploits the existence of a subset of data points in the target domain which are distributed similarly to the data points in the source domain. These data points act as the bridge that facilitates the propagation of data similarities across domains. We also propose to control the relative importance of intra- and inter-domain similarities to boost the similarity propagation. In our approach, we first construct the similarity matrix which encodes both the intra- and inter-domain similarities. We then learn the true similarities among data points in the joint manifold using graph diffusion. We demonstrate that with improved similarities between source and target data, spectral embedding provides a better data representation, which boosts the prediction accuracy. Subspace learning methods aim to find a new coordinate system in which the domain discrepancy is minimized. In this thesis, we refer to subspace-based methods as those which model the domain shift between two subspaces directly. Our first effort is to propose a novel linear subspace learning approach for domain adaptation. Our key observation is that in many real-world problems, such as image classification with blurred test images or cross-domain text classification, domain shift can be modeled by a linear transformation between the source and target domain (intrinsically, a linear transformation between the two subspaces underlying the source and target data). Motivated by this observation, our method explicitly aligns the data in two domains using a linear transformation while simultaneously finding a subspace which preserves the most data variance. With explicit data alignment, the subspace learning is formulated as the minimization of a PCA-like objective, which consists of two variables: the basis vectors of the common subspace and the linear transformation between two domains.
We show that the optimization can be solved efficiently using an iterative algorithm based on alternating minimization, and prove its convergence to a local optimum. Our method can also integrate the label information of source data, which further improves the robustness of the subspace learning and yields better prediction. Existing subspace-based domain adaptation methods assume that data lie in a single low-dimensional subspace. This assumption is too strong in many real-world applications, especially considering that the domain could be a mixture of latent domains with significant inner-domain variations that should not be neglected. In our second approach, the key idea is to assume the data lie in a union of multiple low-dimensional subspaces, which relaxes the common assumption above. We propose a novel two-step subspace-based domain adaptation algorithm: in the subspace discovery step, we cluster the source and target data using a subspace clustering algorithm and estimate the subspace for each cluster using principal component analysis; in the domain adaptation step, we propose a novel multiple subspace alignment (Multi-SA) algorithm, in which we identify one common subspace that aligns well with both source and target subspaces, and therefore best preserves the variance for both domains. To solve this alignment problem jointly for multiple subspaces, we formulate it as an optimization problem that minimizes the weighted sum of multiple alignment costs. A higher weight is assigned to a source subspace if its label distribution has a smaller distance, measured by KL divergence, to the overall label distribution. By putting more weight on those subspaces, the learned common subspace is able to preserve the distinctive information. / Computer and Information Science
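The KL-based weighting of source subspaces described at the end of the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything below is an assumption made for illustration: the function names (`kl_divergence`, `label_distribution`, `subspace_weights`), the direction of the divergence (cluster distribution relative to the overall distribution), and the mapping from divergence to weight (a normalized `exp(-KL / temperature)`). It only reproduces the stated property that a smaller divergence yields a larger weight, and it is not the thesis's Multi-SA implementation.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing; eps guards against log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def label_distribution(labels, classes):
    """Empirical class frequencies of a non-empty list of labels."""
    counts = [sum(1 for y in labels if y == c) for c in classes]
    total = sum(counts)
    return [c / total for c in counts]


def subspace_weights(cluster_labels, classes, temperature=1.0):
    """Assumed weighting: w_k proportional to exp(-KL(p_k || p_overall) / temperature).

    cluster_labels: one list of source labels per discovered subspace (cluster).
    Returns (weights, divergences); the weights sum to 1.
    """
    all_labels = [y for labels in cluster_labels for y in labels]
    overall = label_distribution(all_labels, classes)
    divergences = [
        kl_divergence(label_distribution(labels, classes), overall)
        for labels in cluster_labels
    ]
    raw = [math.exp(-d / temperature) for d in divergences]
    norm = sum(raw)
    return [r / norm for r in raw], divergences


# Hypothetical toy data: two source clusters with labels "a" and "b".
clusters = [["a", "b", "a", "b"], ["a", "a", "a", "a"]]
weights, divergences = subspace_weights(clusters, classes=["a", "b"])
print(weights, divergences)
```

In this toy example the first cluster's label distribution is closer to the overall one than the second cluster's, so under the assumed mapping it receives the larger weight. The `temperature` parameter is a hypothetical knob controlling how sharply the weights concentrate on low-divergence subspaces.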

Computer Science

Computer Engineering

Identifier	oai:union.ndltd.org:TEMPLE/oai:scholarshare.temple.edu:20.500.12613/3564
Date	January 2015
Creators	Shu, Le
Contributors	Latecki, Longin, Ling, Haibin, Vucetic, Slobodan, Zhu, Ying
Publisher	Temple University. Libraries
Source Sets	Temple University
Language	English
Detected Language	English
Type	Thesis/Dissertation, Text
Format	96 pages
Rights	IN COPYRIGHT- This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available., http://rightsstatements.org/vocab/InC/1.0/
Relation	http://dx.doi.org/10.34944/dspace/3546, Theses and Dissertations
