Hypergraph partitioning lies at the heart of a number of problems in machine learning as well as other engineering disciplines. While partitioning uniform hypergraphs is often required in computer vision problems that involve multi-way similarities, non-uniform hypergraph partitioning has applications in database systems, circuit design etc. As in the case of graphs, it is known that for given objective and balance constraints, the problem of optimally partitioning a hypergraph is NP-hard. Yet, over the last two decades, several efficient heuristics have been studied in the literature and their empirical success is widely appreciated. In contrast to the extensive studies related to graph partitioning, the theoretical guarantees of hypergraph partitioning approaches have not received much attention in the literature. The purpose of this thesis is to establish the statistical error bounds for certain spectral algorithms for partitioning uniform as well as non-uniform hypergraphs.
The mathematical framework considered in this thesis is the following. Let V be a set of n vertices, and ψ : V ->{1,…,k} be a (hidden) partition of V into k classes. A random hypergraph (V,E) is generated according to a planted partition model, i.e., subsets of V are independently added to the edge set E with probabilities depending on the class memberships of the participating vertices. Let ψ' be the partition of V obtained from a certain algorithm acting on a random realization of the hypergraph. We provide an upper bound on the number of disagreements between ψ and ψ'. To be precise, we show that under certain conditions, the asymptotic error is o(n) with probability (1-o(1)). In the existing literature, such error rates are only known in the case of graphs (Rohe et al., Ann. Statist., 2011; Lei \& Rinaldo, Ann. Statist., 2015), where the planted model coincides with the popular stochastic block model. Our results are based on matrix concentration inequalities and perturbation bounds, and the derived bounds can be used to comment on the consistency of spectral hypergraph partitioning algorithms.
It is quite common in the literature to resort to a spectral approach when the quantity of interest is a matrix, for instance, the adjacency or Laplacian matrix for graph partitioning. This is certainly not true for hypergraph partitioning as the adjacency relations cannot be encoded into a symmetric matrix as in the case of graphs. However, if one restricts the problem to m-uniform hypergraphs for some m ≥ 2, then a symmetric tensor of order m can be used to express the multi-way interactions or adjacencies. Thus, the use of tensor spectral algorithms, based on the spectral theory of symmetric tensors, is a natural choice in this scenario. We observe that a wide variety of uniform hypergraph partitioning methods studied in the literature can be related to any one of two principle approaches: (1) solving a tensor trace maximization problem, or (2) use of the higher order singular value decomposition of tensors. We derive statistical error bounds to show that both these approaches lead to consistent partitioning algorithms.
Our results also hold when the hypergraph under consideration allows weighted edges, a situation that is commonly encountered in computer vision applications such as motion segmentation, image registration etc. In spite of the theoretical guarantees, a tensor spectral approach is not preferable in this setting due to the time and space complexity of computing the weighted adjacency tensor. Keeping this practical scenario in mind, we prove that consistency can still be achieved by incorporating certain tensor sampling strategies. In particular, we show that if the edges are sampled according to certain distribution, then consistent partitioning can be achieved with only few sampled edges. Experiments on benchmark problems demonstrate that such sampled tensor spectral algorithms are indeed useful in practice.
While vision tasks mostly involve uniform hypergraphs, in database and electronics applications, one often finds non-uniform hypergraphs with edges of varying sizes. These hypergraphs cannot be expressed in terms of adjacency matrices or tensors, and hence, use of a spectral approach is tricky in this context. The partitioning problem gets more challenging due to the fact that, in practice, these hypergraphs are quite sparse, and hence, provide less information about the partition. We consider spectral algorithms for partitioning clique and star expansions of hypergraphs, and study their consistency under a sparse planted partition model.
The results of hypergraph partitioning can be further extended to address the well-known hypergraph vertex coloring problem, where the objective is to color the vertices such that no edge is monochromatic. The hardness of this problem is well established. In fact, even when a hypergraph is bipartite or 2-colorable, it is NP-hard to find a proper 2-coloring for it. We propose a spectral coloring algorithm, and show that if the non-monochromatic subsets of vertices are independently added to the edge set with certain probabilities, then with probability (1-o(1)), our algorithm succeeds in coloring bipartite hypergraphs with only two colors.
To the best our knowledge, these are the first known results related to consistency of partitioning general hypergraphs.
Identifer | oai:union.ndltd.org:IISc/oai:etd.ncsi.iisc.ernet.in:2005/2947 |
Date | January 2016 |
Creators | Ghoshdastidar, Debarghya |
Contributors | Dukkipati, Ambedkar |
Source Sets | India Institute of Science |
Language | en_US |
Detected Language | English |
Type | Thesis |
Relation | G27845 |
Page generated in 0.003 seconds