Global ETD Search

Return to search

Overlapping Communities on Large-Scale Networks: Benchmark Generation and Learning via Adaptive Stochastic Optimization

This dissertation builds on two lines of research that are related to the task of community detection on large-scale network data.

Our first contribution is a novel generator for large-scale networks with overlapping communities. Synthetic generators are essential for algorithm testing and simulation studies for networks, as these data are scarce and constantly evolving. We propose a generator based on a flexible random graph model that allows for the control of two complementary measures of centrality -- the degree centrality and the eigencentrality. For an arbitrary centrality target and community structure, we study the problem of recovering the model parameters that enforce such targets in expectation. We find that this problem always admits a solution in the parameter space, which is also unique for large graphs. We propose to recover this solution via a properly initialized multivariate-Newton Raphson algorithm. The resulting benchmark generator is able to simulate networks with a billion edges and hundreds of millions of nodes in 30 seconds, while reproducing a wide spectrum of network topologies -- including assortative mixing and power-law centrality distributions.

Our second contribution involves variance reduction techniques for stochastic variational inference (SVI). SVI scales approximate inference to large-scale data -- including massive networks -- via stochastic optimization. SVI is efficient because, at each iteration, it only uses a random minibatch of the data to produce a noisy estimate of the gradient. However, such estimates can suffer from high variance, which slows down convergence. One strategy to reduce the variance of the gradient is to use importance sampling, biasing the distribution of data for each minibatch towards the data points that are most influential to the inference at hand. Here, we develop an importance sampling strategy for SVI. Our adaptive stochastic variational inference algorithm (AdaSVI) reweights the sampling distribution to minimize the variance of the stochastic natural gradient. We couple the importance sampling strategy with an adaptive learning rate providing a parameter-free stochastic optimization algorithm where the only user input required is the minibatch size. We study AdaSVI on a matrix factorization model and find that it significantly improves SVI, leading to faster convergence on synthetic data.

https://doi.org/10.7916/6t26-m277

Statistics

Mathematics

Stochastic processes

Mathematical optimization

Computer networks

Machine learning--Statistical methods

Identifer	oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/6t26-m277
Date	January 2022
Creators	Grande, Alessandro Antonio
Source Sets	Columbia University
Language	English
Detected Language	English
Type	Theses

Page generated in 0.0021 seconds

Overlapping Communities on Large-Scale Networks: Benchmark Generation and Learning via Adaptive Stochastic Optimization

Description

Links & Downloads

Tags

Additional Fields