This paper presents a novel implementation of parallel sparse matrix-matrix multiplication for distributed-memory systems on heterogeneous hardware architectures. The proposed algorithm is expected to scale linearly up to several thousand processors for matrices with dimensions over 10^6 (one million). Our approach to parallelism is based on 1D decomposition and works for both structured and unstructured sparse matrices. The storage mechanism is based on distributed hash lists, which reduce the latency of accessing and modifying an element of the product matrix and also reduce the overall time needed to merge the partial results computed by the processors. Theoretically, the time and space complexity of our algorithm are linearly proportional to the total number of non-zero elements in the product matrix C. The performance evaluation shows that the algorithm scales much better for sparse matrices with larger dimensions. The speedup achieved with our algorithm is much better than that of other existing 1D algorithms. We achieved a speedup of about 500 with only 672 processors. We also identified the impact of hardware architecture on scalability.
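To make the idea of 1D decomposition with hash-based accumulation concrete, the following is a minimal single-process sketch in Python. It is not the authors' implementation: the row-wise split, the dict-of-dicts matrix layout, and the function names `row_blocks`, `multiply_block`, and `spgemm_1d` are all assumptions chosen for illustration, and the "workers" are simulated by a sequential loop rather than by real distributed processes.

```python
from collections import defaultdict


def row_blocks(n_rows, n_workers):
    """Split row indices 0..n_rows-1 into contiguous 1D blocks, one per worker."""
    base, extra = divmod(n_rows, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def multiply_block(a_rows, b_rows, rows):
    """Compute the rows of C = A * B owned by one (hypothetical) worker.

    Matrices are stored as {row: {col: value}}. Partial products are
    accumulated in a hash map keyed by (i, j), so each update is an
    expected O(1) lookup and only non-zero entries of C are stored.
    """
    partial = defaultdict(float)
    for i in rows:
        for k, a_ik in a_rows.get(i, {}).items():
            for j, b_kj in b_rows.get(k, {}).items():
                partial[(i, j)] += a_ik * b_kj
    return partial


def spgemm_1d(a_rows, b_rows, n_rows, n_workers):
    """Simulate the workers one after another and merge their partial results.

    With a row-wise split, each worker owns a disjoint set of rows of C,
    so the merge step is a plain union of the hash maps.
    """
    c = {}
    for rows in row_blocks(n_rows, n_workers):
        c.update(multiply_block(a_rows, b_rows, rows))
    return c


if __name__ == "__main__":
    # Small illustrative 3x3 matrices (made-up values).
    A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}, 2: {0: 4.0}}
    B = {0: {1: 5.0}, 1: {0: 6.0}, 2: {2: 7.0}}
    print(spgemm_1d(A, B, n_rows=3, n_workers=2))
```

In this sketch the work and the storage both grow with the number of partial products and stored non-zeros rather than with the full matrix dimension, which is the property that hash-based storage is meant to provide. A distributed version would replace the loop in `spgemm_1d` with message passing between processes.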
Identifier | oai:union.ndltd.org:ETSU/oai:dc.etsu.edu:etsu-works-16771 |
Date | 01 January 2015 |
Creators | Hoque, Mohammad Asadul, Raju, Md Rezaul Karim, Tymczak, Christopher John, Vrinceanu, Daniel, Chilakamarri, Kiran |
Publisher | Digital Commons @ East Tennessee State University |
Source Sets | East Tennessee State University |
Detected Language | English |
Type | text |
Source | ETSU Faculty Works |