Return to search

Parallel Sparse Matrix-Matrix Multiplication: A Scalable Solution With 1D Algorithm

This paper presents a novel implementation of parallel sparse matrix-matrix multiplication using distributed memory systems on heterogeneous hardware architecture. The proposed algorithm is expected to be linearly scalable up to several thousands of processors for matrices with dimensions over 106 (million). Our approach of parallelism is based on 1D decomposition and can work for both structured and unstructured sparse matrices. The storage mechanism is based on distributed hash lists, which reduces the latency for accessing and modifying an element of the product matrix, while reducing the overall merging time of the partial results computed by the processors. Theoretically, the time and space complexity of our algorithm is linearly proportional to the total number of non-zero elements in the product matrix C. The results of the performance evaluation show that the algorithm scales much better for sparse matrices with bigger dimensions. The speedup achieved using our algorithm is much better than other existing 1D algorithms. We have been able to achieve about 500 times speedup with only 672 processors. We also identified the impact of hardware architecture on scalability.

Identiferoai:union.ndltd.org:ETSU/oai:dc.etsu.edu:etsu-works-16771
Date01 January 2015
CreatorsHoque, Mohammad Asadul, Raju, Md Rezaul Karim, Tymczak, Christopher John, Vrinceanu, Daniel, Chilakamarri, Kiran
PublisherDigital Commons @ East Tennessee State University
Source SetsEast Tennessee State University
Detected LanguageEnglish
Typetext
SourceETSU Faculty Works

Page generated in 0.0058 seconds