Return to search

Finding Interesting Subgraphs with Guarantees

Networks are a mathematical abstraction of the interactions between a set of entities, with extensive applications in social science, epidemiology, bioinformatics, and cybersecurity, among others. There are many fundamental problems when analyzing network data, such as anomaly detection, dense subgraph mining, motif finding, information diffusion, and epidemic spread. A common underlying task in all these problems is finding an "interesting subgraph"; that is, finding a part of the graph---usually small relative to the whole---that optimizes a score function and has some property of interest, such as connectivity or a minimum density.

Finding subgraphs that satisfy common constraints of interest, such as the ones above, is computationally hard in general, and state-of-the-art algorithms for many problems in network analysis are heuristic in nature. These methods are fast and usually easy to implement. However, they come with no theoretical guarantees on the quality of the solution, which makes it difficult to assess how the discovered subgraphs compare to an optimal solution, which in turn affects the data mining task at hand. For instance, in anomaly detection, solutions with low anomaly score lead to sub-optimal detection power. On the other end of the spectrum, there have been significant advances on approximation algorithms for these challenging graph problems in the theoretical computer science community. However, these algorithms tend to be slow, difficult to implement, and they do not scale to the large datasets that are common nowadays.

The goal of this dissertation is developing scalable algorithms with theoretical guarantees for various network analysis problems, where the underlying task is to find subgraphs with constraints. We find interesting subgraphs with guarantees by adapting techniques from parameterized complexity, convex optimization, and submodularity optimization. These techniques are well-known in the algorithm design literature, but they lead to slow and impractical algorithms. One unifying theme in the problems that we study is that our methods are scalable without sacrificing the theoretical guarantees of these algorithm design techniques. We accomplish this combination of scalability and rigorous bounds by exploiting properties of the problems we are trying to optimize, decomposing or compressing the input graph to a manageable size, and parallelization.

We consider problems on network analysis for both static and dynamic network models. And we illustrate the power of our methods in applications, such as public health, sensor data analysis, and event detection using social media data. / Ph. D. / Networks are a mathematical abstraction of the interactions between a set of entities, with extensive applications in social science, epidemiology, bioinformatics, and cybersecurity, among others. There are many fundamental problems when analyzing network data, such as anomaly detection, dense subgraph mining, motif finding, information diffusion, and epidemic spread. A common underlying task in all these problems is finding an “interesting subgraph”; that is, finding a part of the graph—usually small relative to the whole—that optimizes a score function and has some property of interest, such as being connected.

Finding subgraphs that satisfy common constraints of interest is computationally hard, and existing techniques for many problems of this kind are heuristic in nature. Heuristics are fast and usually easy to implement. However, they come with no theoretical guarantees on the quality of the solution, which makes it difficult to assess how the discovered subgraphs compare to an optimal solution, which in turn affects the data mining task at hand. For instance, in anomaly detection, solutions with low anomaly score lead to sub-optimal detection power. On the other end of the spectrum, there have been significant progress on these challenging graph problems in the theoretical computer science community. However, these techniques tend to be slow, difficult to implement, and they do not scale to the large datasets that are common nowadays.

The goal of this dissertation is developing scalable algorithms with theoretical guarantees for various network analysis problems, where the underlying task is to find subgraphs with constraints. One unifying theme in the problems that we study is that our methods are scalable without sacrificing theoretical guarantees. We accomplish this combination of scalability and rigorous bounds by exploiting properties of the problems we are trying to optimize, decomposing or compressing the input graph to a manageable size, and parallelization.

We consider problems on network analysis for both static and dynamic network models. And we illustrate the power of our methods in applications, such as public health, sensor data analysis, and event detection using social media data.

Identiferoai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/81960
Date29 January 2018
CreatorsCadena, Jose
ContributorsComputer Science, Vullikanti, Anil Kumar S., Marathe, Madhav Vishnu, Lu, Chang-Tien, Konjevod, Goran, Ramakrishnan, Naren
PublisherVirginia Tech
Source SetsVirginia Tech Theses and Dissertation
Detected LanguageEnglish
TypeDissertation
FormatETD, application/pdf
RightsIn Copyright, http://rightsstatements.org/vocab/InC/1.0/

Page generated in 0.0019 seconds