1 |
Convex relaxation for the planted clique, biclique, and clustering problems. Ames, Brendan (January 2011)
A clique of a graph G is a set of pairwise adjacent nodes of G. Similarly, a biclique (U, V) of a bipartite graph G is a pair of disjoint, independent vertex sets such that each node in U is adjacent to every node in V in G. We consider the problems of identifying the maximum clique of a graph, known as the maximum clique problem, and identifying the biclique (U, V) of a bipartite graph that maximizes the product |U| · |V|, known as the maximum edge biclique problem. We show that finding a clique or biclique of a given size in a graph is equivalent to finding a rank-one matrix satisfying a particular set of linear constraints. These problems can be formulated as rank minimization problems and relaxed to convex programs by replacing rank with its convex envelope, the nuclear norm. Both problems are NP-hard, yet we show that our relaxation is exact when the input graph contains a large clique or biclique plus additional nodes and edges. For each problem, we provide two analyses of when our relaxation is exact. In the first, the diversionary edges are added deterministically by an adversary. In the second, each potential edge is added to the graph independently at random with fixed probability p. In the random case, our bounds match the earlier bounds of Alon, Krivelevich, and Sudakov, as well as Feige and Krauthgamer, for the maximum clique problem.
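For concreteness, a minimal sketch of this style of nuclear norm relaxation for the planted clique problem is given below (using cvxpy). The graph size n, clique size k, noise probability p, and the 0.5 rounding threshold are hypothetical illustration choices, and the constraint set is one common formulation of this relaxation rather than the thesis's exact program.

```python
import numpy as np
import cvxpy as cp

# Hypothetical instance: n nodes, a planted clique on the first k nodes, and
# diversionary edges added independently with probability p (values illustrative).
n, k, p = 30, 8, 0.1
rng = np.random.default_rng(0)
A = (rng.random((n, n)) < p).astype(float)
A = np.maximum(A, A.T)          # symmetrize the noise edges
A[:k, :k] = 1.0                 # plant the clique
np.fill_diagonal(A, 1.0)        # treat loops as allowed support

# Nuclear norm relaxation: seek a matrix supported on edges (and loops) with total
# mass k^2; if the relaxation is exact, the minimizer is the rank-one matrix v v^T,
# where v is the 0/1 indicator vector of the planted clique.
X = cp.Variable((n, n), symmetric=True)
non_edges = (A == 0).astype(float)
constraints = [X >= 0,
               cp.sum(X) >= k * k,
               cp.multiply(X, non_edges) == 0]
cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()

print("recovered nodes:", np.nonzero(np.diag(X.value) > 0.5)[0])
```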
We extend these results and techniques to the k-disjoint-clique problem. The maximum node k-disjoint-clique problem is to find a set of k disjoint cliques of a given input graph containing the maximum number of nodes. Given an input graph G and nonnegative edge weights w, the maximum mean weight k-disjoint-clique problem seeks the set of k disjoint cliques of G that maximizes the sum of the average weights, with respect to w, of the edges of the complete subgraphs of G induced by the cliques. These problems may be considered as ways to pose the clustering problem. In clustering, one wants to partition a given data set so that the data items in each partition, or cluster, are similar and the items in different clusters are dissimilar. For the graph G whose nodes represent a given data set and in which two nodes are adjacent if and only if the corresponding items are similar, clustering the data into k disjoint clusters is equivalent to partitioning G into k disjoint cliques. Similarly, given a complete graph with nodes corresponding to a given data set and edge weights indicating the similarity between each pair of items, the data may be clustered by solving the maximum mean weight k-disjoint-clique problem.
We show that both versions of the k-disjoint-clique problem can be formulated as rank-constrained optimization problems and relaxed to semidefinite programs using the nuclear norm relaxation of rank. We also show that when the input instance corresponds to a collection of k disjoint planted cliques plus additional edges and nodes, this semidefinite relaxation is exact for both problems. We provide theoretical bounds that guarantee exactness of our relaxation, as well as empirical examples of successful applications of our algorithm to synthetic data sets and to data sets from clustering applications.
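As an illustration of the clustering relaxation, here is a hedged sketch of one semidefinite programming relaxation in this family (again in cvxpy); the function name and the particular constraint set are assumptions for illustration and may differ from the exact relaxation analyzed in the thesis.

```python
import numpy as np
import cvxpy as cp

def k_disjoint_clique_sdp(A, k):
    """Hedged sketch of one SDP relaxation in this family. A is a 0/1 adjacency
    matrix with ones on the diagonal, k is the assumed number of planted cliques.
    At an exact solution, X equals the sum of the normalized cluster indicator
    matrices (1/|C_i|) * 1_{C_i} 1_{C_i}^T, so clusters can be read off its rows."""
    n = A.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    constraints = [
        X >> 0,                                       # positive semidefinite
        X >= 0,                                       # entrywise nonnegative
        cp.trace(X) == k,                             # one unit of trace per clique
        cp.sum(X, axis=1) <= 1,                       # rows sum to at most 1 (extra nodes allowed)
        cp.multiply(X, (A == 0).astype(float)) == 0,  # support only where edges exist
    ]
    objective = cp.Maximize(cp.sum(cp.multiply(A, X)))  # capture as much of the graph as possible
    cp.Problem(objective, constraints).solve()
    return X.value
```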
|
2 |
A probabilistic framework and algorithms for modeling and analyzing multi-instance data. Behmardi, Behrouz (28 November 2012)
Multi-instance data, in which each object (e.g., a document) is a collection of instances (e.g., words), are widespread in machine learning, signal processing, computer vision, bioinformatics, music, and the social sciences. Existing probabilistic models, e.g., latent Dirichlet allocation (LDA), probabilistic latent semantic indexing (pLSI), and discrete component analysis (DCA), have been developed for modeling and analyzing multi-instance data. Such models introduce a generative process for multi-instance data that includes a low-dimensional latent structure. While such models offer great freedom in capturing the natural structure in the data, their inference may present challenges. For example, sensitivity to the choice of hyper-parameters in such models requires careful tuning (e.g., through cross-validation), which results in large computational complexity. Inference for fully Bayesian models, which contain no hyper-parameters, often involves slowly converging sampling methods. In this work, we develop approaches for addressing such challenges and further enhancing the utility of such models.
This dissertation demonstrates a unified convex framework for probabilistic modeling of multi-instance data. The three main aspects of the proposed framework are as follows. First, joint regularization is incorporated into multiple density estimation to simultaneously learn the structure of the distribution space and infer each distribution. Second, a novel confidence-constraints framework is used to facilitate a tuning-free approach to controlling the amount of regularization required for joint multiple density estimation, with theoretical guarantees on correct structure recovery. Third, we formulate the problem as a convex program and propose efficient optimization algorithms to solve it.
This work addresses the unique challenges associated with both discrete and continuous domains. In the discrete domain, we propose confidence-constrained rank minimization (CRM) to recover the exact number of topics in topic models, with theoretical guarantees on the recovery probability and the mean squared error of the estimate. We provide a computationally efficient optimization algorithm for the problem to extend the applicability of the proposed framework to large real-world datasets. In the continuous domain, we propose to use the maximum entropy (MaxEnt) framework for multi-instance datasets. In this approach, bags of instances are represented as distributions using the principle of MaxEnt. We learn basis functions that span the space of distributions for jointly regularized density estimation; the basis functions are analogous to topics in a topic model.
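To illustrate the confidence-constrained idea behind CRM, here is a hedged sketch: rather than the penalized form min loss(Theta) + lambda * ||Theta||_*, which requires tuning lambda (e.g., by cross-validation), the nuclear norm is minimized subject to a data-fit constraint whose level eps would come from a confidence bound. The squared-error loss, the placeholder matrix Y, eps, and the rank threshold below are illustrative stand-ins, not the thesis's exact likelihood or bound.

```python
import numpy as np
import cvxpy as cp

def confidence_constrained_rank_min(Y, eps):
    """Hedged sketch of confidence-constrained rank minimization: minimize the
    nuclear norm subject to a data-fit constraint at level eps (tuning-free,
    since eps is set by a confidence bound rather than cross-validated)."""
    m, n = Y.shape
    Theta = cp.Variable((m, n))
    prob = cp.Problem(cp.Minimize(cp.normNuc(Theta)),
                      [cp.sum_squares(Theta - Y) <= eps])
    prob.solve()
    # The number of topics is read off as the numerical rank of the solution.
    s = np.linalg.svd(Theta.value, compute_uv=False)
    return Theta.value, int(np.sum(s > 1e-6 * s[0]))
```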
We validate the efficiency of the proposed framework in the discrete and continuous domains through an extensive set of experiments on synthetic datasets as well as on real-world image and text datasets, and we compare the results with state-of-the-art algorithms. (Graduation date: 2013)
|