Return to search

HPC-based Parallel Algorithms for Generating Random Networks and Some Other Network Analysis Problems

The advancement of modern technologies has resulted in an explosive growth of complex systems, such as the Internet, biological, social, and various infrastructure networks, which have, in turn, contributed to the rise of massive networks. During the past decade, analyzing and mining of these networks has become an emerging research area with many real-world applications. The most relevant problems in this area include: collecting and managing networks, modeling and generating random networks, and developing network mining algorithms. In the era of big data, speed is not an option anymore for the effective analysis of these massive systems, it is an absolute necessity. This motivates the need for parallel algorithms on modern high-performance computing (HPC) systems including multi-core, distributed, and graphics processor units (GPU) based systems. In this dissertation, we present distributed memory parallel algorithms for generating massive random networks and a novel GPU-based algorithm for index searching.

This dissertation is divided into two parts. In Part I, we present parallel algorithms for generating massive random networks using several widely-used models. We design and develop a novel parallel algorithm for generating random networks using the preferential-attachment model. This algorithm can generate networks with billions of edges in just a few minutes using a medium-sized computing cluster. We develop another parallel algorithm for generating random networks with a given sequence of expected degrees. We also design a new a time and space efficient algorithmic method to generate random networks with any degree distributions. This method has been applied to generate random networks using other popular network models, such as block two-level Erdos-Renyi and stochastic block models. Parallel algorithms for network generation pose many nontrivial challenges such as dependency on edges, avoiding duplicate edges, and load balancing. We applied novel techniques to deal with these challenges. All of our algorithms scale very well to a large number of processors and provide almost linear speed-up.

Dealing with a large number of networks collected from a variety of fields requires efficient management systems such as graph databases. Finding a record in those databases is very critical and typically is the main bottleneck for performance. In Part II of the dissertation, we develop a GPU-based parallel algorithm for index searching. Our algorithm achieves the fastest throughput ever reported in the literature for various benchmarks. / Ph. D. / The advancement of modern technologies has resulted in an explosive growth of complex systems, such as the Internet, biological, social, and various infrastructure networks, which have, in turn, contributed to the rise of massive networks. During the past decade, analyzing and mining of these networks has become an emerging research area with many real-world applications. The most relevant problems in this area include: collecting and managing networks, modeling and generating random networks, and developing network mining algorithms. As the networks are massive in size, we need faster algorithms for the quick and effective analysis of these systems. This motivates the need for parallel algorithms on modern high-performance computing (HPC) based systems. In this dissertation, we present HPC-based parallel algorithms for generating massive random networks and managing large scale network data.

This dissertation is divided into two parts. In Part I, we present parallel algorithms for generating massive random networks using several widely-used models, such as the preferential attachment model, the Chung-Lu model, the block two-level Erdős-Rényi model and the stochastic block model. Our algorithms can generate networks with billions of edges in just a few minutes using a medium-sized HPC-based cluster. We applied novel load balancing techniques to distribute workloads equally among the processors. As a result, all of our algorithms scale very well to a large number of processors and provide almost linear speed-up. In Part II of the dissertation, we develop a parallel algorithm for finding records by given keys. Dealing with a large number of network data collected from a variety of fields requires efficient database management systems such as graph databases. Finding a record in those databases is very critical and typically is the main bottleneck for performance. Our algorithm achieves the fastest data lookup throughput ever reported in the literature for various benchmarks.

Identiferoai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/73582
Date06 December 2016
CreatorsAlam, Md Maksudul
ContributorsComputer Science, Marathe, Madhav Vishnu, Khan, Md-Abdul Maleq, Vullikanti, Anil Kumar S., Heath, Lenwood S., Perumalla, Kalyan
PublisherVirginia Tech
Source SetsVirginia Tech Theses and Dissertation
Detected LanguageEnglish
TypeDissertation
FormatETD, application/pdf
RightsIn Copyright, http://rightsstatements.org/vocab/InC/1.0/

Page generated in 0.0022 seconds