Global ETD Search

1	Optimizing Sparse Matrix-Matrix Multiplication on a Heterogeneous CPU-GPU Platform
Wu, Xiaolong
16 December 2015
Sparse matrix-matrix multiplication (SpMM) is a fundamental operation over irregular data and is widely used in graph algorithms, such as finding minimum spanning trees and shortest paths. In this work, we present a hybrid CPU- and GPU-based parallel SpMM algorithm to improve the performance of SpMM. First, we improve data locality by element-wise multiplication. Second, we exploit the ordered property of row indices to perform partial sorting instead of fully sorting all triples by row and column index. Finally, through a hybrid CPU-GPU approach using a two-level pipelining technique, our algorithm better exploits a heterogeneous system. Compared with the state-of-the-art SpMM methods in the cuSPARSE and CUSP libraries, our approach achieves average speedups of 1.6x and 2.9x, respectively, on nine representative matrices from the University of Florida Sparse Matrix Collection.
Keywords: Sparse matrix-matrix multiplication; Data locality; Pipelining; GPU
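The partial-sorting idea in the abstract above can be illustrated with a minimal sequential sketch. This is not the author's CPU-GPU implementation: the function names (`to_rows`, `spmm_triples`) and the (row, col, value) triple layout are assumptions made for illustration only. The point it shows is that when product triples are generated row by row, the row indices are already in order, so only the column indices within each row need sorting.

```python
from collections import defaultdict


def to_rows(triples):
    """Group (row, col, value) triples into {row: [(col, value), ...]}."""
    rows = defaultdict(list)
    for r, c, v in triples:
        rows[r].append((c, v))
    return rows


def spmm_triples(a_triples, b_triples):
    """Multiply two sparse matrices given as (row, col, value) triples.

    Hypothetical helper: output rows are produced in ascending order,
    so each row's partial products are sorted by column only, rather
    than sorting every triple by (row, col).
    """
    a_rows = to_rows(a_triples)
    b_rows = to_rows(b_triples)
    result = []
    for i in sorted(a_rows):  # rows are emitted in ascending order
        partial = []
        for k, a_ik in a_rows[i]:
            # Expand: every nonzero b[k][j] contributes a_ik * b_kj to c[i][j].
            for j, b_kj in b_rows.get(k, []):
                partial.append((j, a_ik * b_kj))
        # Partial sort: column index only, within this single row.
        partial.sort(key=lambda t: t[0])
        # Compress: sum adjacent entries that share the same column.
        for j, v in partial:
            if result and result[-1][0] == i and result[-1][1] == j:
                result[-1] = (i, j, result[-1][2] + v)
            else:
                result.append((i, j, v))
    return result


# A = [[1, 2], [0, 3]], B = [[4, 0], [5, 6]]
A = [(0, 0, 1), (0, 1, 2), (1, 1, 3)]
B = [(0, 0, 4), (1, 0, 5), (1, 1, 6)]
C = spmm_triples(A, B)
```

In this sketch the per-row sort touches only the triples of one output row at a time, which is the property the abstract attributes to using the ordered row indices.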
2	Reducing Inter-Process Communication Overhead in Parallel Sparse Matrix-Matrix Multiplication
Ahmed, Salman; Houser, Jennifer; Hoque, Mohammad A.; Raju, Rezaul; Pfeiffer, Phil
01 July 2017
Parallel sparse matrix-matrix multiplication algorithms (PSpGEMM) spend most of their running time on inter-process communication. In distributed matrix-matrix multiplication, much of this time is spent exchanging the partial results needed to compute the final product matrix. This overhead can be reduced with a one-dimensional distributed algorithm for parallel sparse matrix-matrix multiplication that uses a novel accumulation pattern whose cost grows logarithmically with the number of processors (i.e., O(log p), where p is the number of processors). The algorithm's MPI communication overhead and execution time were evaluated on an HPC cluster, using randomly generated sparse matrices with dimensions up to one million by one million. The results showed a reduction in inter-process communication overhead for matrices with larger dimensions, compared with another one-dimensional parallel algorithm that takes O(p) time to accumulate the results.
Keywords: communication overhead; MPI communication; parallel computing; performance analysis; scalability; sparse matrix-matrix multiplication; Computing
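The difference between O(p) and O(log p) accumulation mentioned in the abstract above can be sketched with a generic binary-tree reduction. This is an assumption-laden illustration, not the paper's accumulation pattern: it simulates the processes as list entries instead of using MPI, and the names `merge` and `tree_accumulate` are hypothetical. Each simulated process holds a partial product as a `{(row, col): value}` dictionary.

```python
def merge(acc, other):
    """Add the entries of one sparse partial result into another."""
    for key, value in other.items():
        acc[key] = acc.get(key, 0) + value
    return acc


def tree_accumulate(partials):
    """Combine partial results pairwise in ceil(log2(p)) rounds.

    Assumes a non-empty list; index n stands in for process n.
    In every round, process dst absorbs the result held by process
    dst + stride, so the number of active holders halves each time.
    """
    parts = [dict(p) for p in partials]  # copy so inputs stay untouched
    p = len(parts)
    rounds = 0
    stride = 1
    while stride < p:
        for dst in range(0, p - stride, 2 * stride):
            merge(parts[dst], parts[dst + stride])
        stride *= 2
        rounds += 1
    return parts[0], rounds


# Eight simulated processes, each contributing to the same output entry.
partials = [{(0, 0): n + 1} for n in range(8)]
total, rounds = tree_accumulate(partials)
```

With eight simulated processes the result is assembled in three rounds, whereas sending every partial result to a single root one after another would take seven sequential merges.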
3	Characterization and Enhancement of Data Locality and Load Balancing for Irregular Applications
Niu, Qingpeng
14 May 2015
No description available.
Keywords: Computer Science; Irregular applications; Program Locality; Load balancing; Performance Optimization; Reuse distance; Parallel Algorithm; SpGEMM

