Return to search

Scalable Streaming Graph Partitioning

Large-scale graph-structured datasets are growing at an increasing rate. Social network graphs are an example of these datasets. Processing large-scale graphstructured datasets are central to many applications ranging from telecommunication to biology and has led to the development of many parallel graph algorithms. Performance of parallel graph algorithms largely depends on how the underlying graph is partitioned. In this work, we focus on studying streaming vertex-cut graph partitioning algorithms where partitioners receive a graph as a stream of vertices and edges and assign partitions to them on their arrival once and for all. Some of these algorithms maintain a state during partitioning. In some cases, the size of the state is so huge that it cannot be kept in a single machine memory. In many real world scenarios, several instances of a streaming graph partitioning algorithm are run simultaneously to improve the system throughput. However, running several instances of a partitioner drops the partitioning quality considerably due to the incomplete information of partitioners. Even frequently sharing states and its combination with buffering mechanisms does not completely solves the problem because of the heavy communication overhead produced by partitioners. In this thesis, we propose an algorithm which tackles the problem of low scalability and performance of existing streaming graph partitioning algorithms by providing an efficient way of sharing states and its combination with windowing mechanism. We compare state-of-the-art streaming graph partitioning algorithms with our proposed solution concerning performance and efficiency. Our solution combines a batch processing method with a shared-state mechanism to achieve both an outstanding performance and a high partitioning quality. Shared state mechanism is used for sharing states of partitioners. We provide a robust implementation of our method in a PowerGraph framework. Furthermore, we empirically evaluate the impact of partitioning quality on how graph algorithms perform in a real cloud environment. The results show that our proposed method outperforms other algorithms in terms of partitioning quality and resource consumption and improves partitioning time considerably. On average our method improves partitioning time by 23%, decreases communication load by 15% and increase memory consumption by only 5% compared to the state-of-the-art streaming graph partitioning.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-206113
Date January 2017
CreatorsSeyed Khamoushi, Seyed Mohammadreza
PublisherKTH, Skolan för informations- och kommunikationsteknik (ICT)
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess
RelationTRITA-ICT-EX ; 2017:28

Page generated in 0.0024 seconds