Accelerated Iterative Algorithms with Asynchronous Accumulative Updates on a Heterogeneous Cluster

In recent years, with the exponential growth in web-based applications, the amount of data generated has increased tremendously. Quick and accurate analysis of this 'big data' is indispensable for making better business decisions and reducing operational cost. The challenges modern data centers face in processing big data are manifold: keeping pace with increased data volume and velocity, dealing with system scalability, and reducing energy costs. Today's data centers employ a variety of distributed computing frameworks running on clusters of commodity hardware, which include general-purpose processors, to process big data. Though existing distributed computing frameworks have improved big data processing speed, there is still an opportunity to increase it further. FPGAs, which are designed for computationally intensive tasks, are promising processing elements that can increase processing speed. In this thesis, we discuss how FPGAs can be integrated into a cluster of general-purpose processors running iterative algorithms to obtain high performance.
In this thesis, we designed a heterogeneous cluster comprising FPGAs and CPUs and ran benchmarks such as PageRank, Katz and Connected Components to measure its performance. Improvement in execution time was evaluated against a homogeneous cluster of general-purpose processors and a homogeneous cluster of FPGAs. We built multiple four-node heterogeneous clusters with different configurations by varying the number of CPUs and FPGAs.
We studied the effects of load balancing between CPUs and FPGAs. We obtained speedups of 20X, 11.5X and 2X for the PageRank, Katz and Connected Components benchmarks, respectively, on a cluster configuration of 2 CPU + 2 FPGA for an unbalancing ratio, measured against a 4-node homogeneous CPU cluster. We also studied the effect of input graph partitioning and showed that, when the input is a Multilevel-KL partitioned graph, we obtain improvements of 11%, 26% and 9% over a randomly partitioned graph for the Katz, PageRank and Connected Components benchmarks, respectively, on a 2 CPU + 2 FPGA cluster.

Asynchronous Accumulative Updates

Heterogeneous Computing

Computer and Systems Architecture

Digital Communications and Networking

Identifier	oai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:masters_theses_2-1338
Date	23 March 2016
Creators	Gubbi Virupaksha, Sandesh
Publisher	ScholarWorks@UMass Amherst
Source Sets	University of Massachusetts, Amherst
Detected Language	English
Type	text
Format	application/pdf
Source	Masters Theses
