Spelling suggestions: "subject:"graph summarization"" "subject:"graph ummarization""
1 |
EXPLORING MULTIPLEX NETWORKSPolychronopoulou, Athanasia 12 1900 (has links)
Complex network theory has been well established as one of the main tools for understanding and analyzing the behavior of the natural systems that surround us. Social networks, genetic and protein interaction networks, airline and road traffic networks, brain connectivity networks and web graphs are only some of the examples. As network theory evolves it becomes more apparent that these complex systems are often composed of multiple types of interactions, each carrying a different piece of information, and therefore are commonly represented in the form of multiplex networks, where each layer represents a different type of interaction among nodes.
In addition to the interactions among the nodes of the networks, these systems also present correlations among the various types of interactions, as represented by the intrinsic structure and the associations of the various layers of the graph. For example, in social sciences, a network with a large overlap between two layers that represent two distinct types of people interactions i.e. friendship and professional ties might indicate that there is an interconnection between the two in the given network. In another example, in transportation networks, where nodes represent airports connected by flights operated by specific airlines (each airline representing a layer of the graph), the structure of the layers can provide information about the airline: traditional airlines such as Lufthansa tend to have a large overlap in activity pattern with other airlines, whereas low-cost airlines such as easyJet tend to avoid such overlaps.
Due to their ability to represent such complex entity interactions, multiplex networks have lately been the focus of a large part of the research community, studying a variety of aspects, such as structural measures, node communities detection, layer reducibility, network generative models, and information spreading. In this work we focus on techniques for the exploration of the intrinsic structure of multiplex networks, and contemplate ways of addressing common challenges of learning from multiplex networks.
In particular, our work focuses on three main directions: structured regression, graph summarization and graph similarity. We analyze and discuss the main challenges of each of these research directions, and then we propose novel methods to address them. For each problem, we utilize artificial data to study their effectiveness, understand their intrinsic properties and evaluate their behavior under a controlled network structure. Then, we report applications on real-world data sets, from variety of domains, and compare our proposed methods with state-of-the-art and well established baseline methods. Through this work, we aim to offer proof that the networks' intrinsic structure, when utilized, can increase the informative power of network theory models and allow researchers to build more educated algorithms. / Computer and Information Science
|
2 |
Graph Summarization: Algorithms, Trained Heuristics, and Practical Storage ApplicationHodulik, George M. 02 June 2017 (has links)
No description available.
|
3 |
Domain-based Frameworks and Embeddings for Dynamics over NetworksAdhikari, Bijaya 01 June 2020 (has links)
Broadly this thesis looks into network and time-series mining problems pertaining to dynamics over networks in various domains. Which locations and staff should we monitor in order to detect C. Difficile outbreaks in hospitals? How do we predict the peak intensity of the influenza incidence in an interpretable fashion? How do we infer the states of all nodes in a critical infrastructure network where failures have occurred? Leveraging domain-based information should make it is possible to answer these questions. However, several new challenges arise, such as (a) presence of more complex dynamics. The dynamics over networks that we consider are complex. For example, C. Difficile spreads via both people-to-people and surface-to-people interactions and correlations between failures in critical infrastructures go beyond the network structure and depend on the geography as well. Traditional approaches either rely on models like Susceptible Infectious (SI) and Independent Cascade (IC) which are too restrictive because they focus only on single pathways or do not incorporate the model at all, resulting in sub-optimality. (b) data sparsity. Additionally, the data sparsity still persists in this space. Specifically, it is difficult to collect the exact state of each node in the network as it is high-dimensional and difficult to directly sample from. (c) mismatch between data and process. In many situations, the underlying dynamical process is unknown or depends on a mixture of several models. In such cases, there is a mismatch between the data collected and the model representing the dynamics. For example, the weighted influenza like illness (wILI) count released by the CDC, which is meant to represent the raw fraction of total population infected by influenza, actually depends on multiple factors like the number of health-care providers reporting the number and public tendency to seek medical advice. In such cases, methods which generalize well to unobserved (or unknown) models are required. Current approaches often fail in tackling these challenges as they either rely on restrictive models, require large volume of data, and/or work only for predefined models.
In this thesis, we propose to leverage domain-based frameworks, which include novel models and analysis techniques, and domain-based low dimensional representation learning to tackle the challenges mentioned above for networks and time-series mining tasks. By developing novel frameworks, we can capture the complex dynamics accurately and analyze them more efficiently. For example, to detect C. Difficile outbreaks in a hospital setting, we use a two-mode disease model to capture multiple pathways of outbreaks and discrete lattice-based optimization framework. Similarly, we propose an information theoretic framework which includes geographically correlated failures in critical infrastructure networks to infer the status of the network components. Moreover, as we use more realistic frameworks to accurately capture and analyze the mechanistic processes themselves, our approaches are effective even with sparse data. At the same time, learning low-dimensional domain-aware embeddings capture domain specific properties (like incidence-based similarity between historical influenza seasons) more efficiently from sparse data, which is useful for subsequent tasks. Similarly, since the domain-aware embeddings capture the model information directly from the data without any modeling assumptions, they generalize better to new models.
Our domain-aware frameworks and embeddings enable many applications in critical domains. For example, our domain-aware frameworks for C. Difficile allows different monitoring rates for people and locations, thus detecting more than 95% of outbreaks. Similarly, our framework for product recommendation in e-commerce for queries with sparse engagement data resulted in a 34% improvement over the current Walmart.com search engine. Similarly, our novel framework leads to a near optimal algorithms, with additive approximation guarantee, for inferring network states given a partial observation of the failures in networks. Additionally, by exploiting domain-aware embeddings, we outperform non-trivial competitors by up to 40% for influenza forecasting. Similarly, domain-aware representations of subgraphs helped us outperform non-trivial baselines by up to 68% in the graph classification task. We believe our techniques will be useful for variety of other applications in many areas like social networks, urban computing, and so on. / Doctor of Philosophy / Which locations and staff should we monitor to detect pathogen outbreaks in hospitals? How do we predict the peak intensity of the influenza incidence? How do we infer the failures in water distribution networks? These are some of the questions on dynamics over networks discussed in this thesis. Here, we leverage the domain knowledge to answer these questions. Specifically, we propose (a) novel optimization frameworks where we exploit domain knowledge for tractable formulations and near-optimal algorithms, and (b) low dimensional representation learning where we design novel neural architectures inspired by domain knowledge. Our frameworks capture the complex dynamics accurately and help analyze them more efficiently. At the same time, our low-dimensional embeddings capture domain specific properties more efficiently from sparse data, which is useful for subsequent tasks. Similarly, our domain-aware embeddings are inferred directly from the data without any modeling assumptions, hence they generalize better. The frameworks and embeddings we develop enable many applications in several domains. For example, our domain-aware framework for outbreak detection in hospitals has more than 95% accuracy. Similarly, our framework for product recommendation in e-commerce for queries with sparse data resulted in a 34% improvement over state-of-the-art e-commerce search engine. Additionally, our approach outperforms non-trivial competitors by up to 40% in influenza forecasting.
|
4 |
Efficient Graph Summarization of Large NetworksHajiabadi, Mahdi 24 June 2022 (has links)
In this thesis, we study the notion of graph summarization,
which is a fundamental task of finding a compact representation of the original graph called the summary.
Graph summarization can be used for reducing the footprint of the input graph, better visualization, anonymizing the identity of users, and query answering.
There are two different frameworks of graph summarization we consider in this thesis, the utility-based framework and the correction set-based framework.
In the utility-based framework, the input graph is summarized until a utility threshold is not violated.
In the correction set-based framework a set of correction edges is produced along with the summary graph.
In this thesis we propose two algorithms for the utility-based framework and one for the correction set-based framework. All these three algorithms are for static graphs (i.e. graphs that do not change over time).
Then, we propose two more utility-based algorithms for fully dynamic graphs (i.e. graphs with edge insertions and deletions).
Algorithms for graph summarization can be lossless (summarizing the input graph without losing any information) or lossy (losing some information about the input graph in order to summarize it more).
Some of our algorithms are lossless and some lossy, but with controlled utility loss.
Our first utility-driven graph summarization algorithm, G-SCIS, is based on a clique and independent set decomposition, that produces optimal compression with zero
loss of utility. The compression provided is significantly better than
state-of-the-art in lossless graph summarization, while the runtime
is two orders of magnitude lower.
Our second algorithm is T-BUDS, a highly scalable, utility-driven algorithm for fully controlled lossy summarization.
It achieves high scalability by combining memory reduction using Maximum Spanning Tree with a novel binary
search procedure. T-BUDS outperforms state-of-the-art drastically in terms of the quality of summarization and is about two orders of magnitude better in terms of speed. In contrast to the competition, we are able to handle web-scale graphs in a single machine
without performance impediment as the utility threshold (and size of summary) decreases. Also, we show that our graph summaries can be used as-is to answer several important classes of queries, such as triangle enumeration, Pagerank and shortest paths.
We then propose algorithm LDME, a correction set-based graph summarization algorithm that produces compact output representations in a fast and scalable manner. To achieve this, we introduce (1) weighted locality sensitive hashing to drastically reduce the number of comparisons required to find good node merges, (2) an efficient way to compute the best quality merges that produces more compact outputs, and (3) a new sort-based encoding algorithm that is faster and more robust. More interestingly, our algorithm provides performance tuning settings to allow the option of trading compression for running
time. On high compression settings, LDME achieves compression equal to or better than the state of the art with up to 53x speedup in running time. On high speed settings, LDME achieves up to two orders of magnitude speedup with only slightly lower compression.
We also present two lossless summarization algorithms, Optimal and Scalable, for summarizing fully dynamic graphs.
More concretely, we follow the framework of G-SCIS, which produces summaries that can be used as-is in several graph analytics tasks. Different from G-SCIS, which is a batch algorithm, Optimal and Scalable are fully dynamic and can respond rapidly to each change in the graph.
Not only are Optimal and Scalable able to outperform G-SCIS and other batch algorithms by several orders of magnitude, but they also significantly outperform MoSSo, the state-of-the-art in lossless dynamic graph summarization.
While Optimal produces always the most optimal summary, Scalable is able to trade the amount of node reduction for extra scalability.
For reasonable values of the parameter $K$, Scalable is able to outperform Optimal by an order of magnitude in speed, while keeping the rate of node reduction close to that of Optimal.
An interesting fact that we observed experimentally is that even if we were to run a batch algorithm, such as G-SCIS, once for every big batch of changes, still they would be much slower than Scalable. For instance, if 1 million changes occur in a graph, Scalable is two orders of magnitude faster than running G-SCIS just once at the end of the 1 million-edge sequence. / Graduate
|
5 |
SynopSys: Large Graph Analytics in the SAP HANA Database Through SummarizationRudolf, Michael, Paradies, Marcus, Bornhövd, Christof, Lehner, Wolfgang 19 September 2022 (has links)
Graph-structured data is ubiquitous and with the advent of social networking platforms has recently seen a significant increase in popularity amongst researchers. However, also many business applications deal with this kind of data and can therefore benefit greatly from graph processing functionality offered directly by the underlying database. This paper summarizes the current state of graph data processing capabilities in the SAP HANA database and describes our efforts to enable large graph analytics in the context of our research project SynopSys. With powerful graph pattern matching support at the core, we envision OLAP-like evaluation functionality exposed to the user in the form of easy-to-apply graph summarization templates. By combining them, the user is able to produce concise summaries of large graph-structured datasets. We also point out open questions and challenges that we plan to tackle in the future developments on our way towards large graph analytics.
|
Page generated in 0.1157 seconds