• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 3
  • 3
  • 3
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Optimizing Sparse Matrix-Matrix Multiplication on a Heterogeneous CPU-GPU Platform

Wu, Xiaolong 16 December 2015 (has links)
Sparse Matrix-Matrix multiplication (SpMM) is a fundamental operation over irregular data, which is widely used in graph algorithms, such as finding minimum spanning trees and shortest paths. In this work, we present a hybrid CPU and GPU-based parallel SpMM algorithm to improve the performance of SpMM. First, we improve data locality by element-wise multiplication. Second, we utilize the ordered property of row indices for partial sorting instead of full sorting of all triples according to row and column indices. Finally, through a hybrid CPU-GPU approach using two level pipelining technique, our algorithm is able to better exploit a heterogeneous system. Compared with the state-of-the-art SpMM methods in cuSPARSE and CUSP libraries, our approach achieves an average of 1.6x and 2.9x speedup separately on the nine representative matrices from University of Florida sparse matrix collection.
2

Reducing Inter-Process Communication Overhead in Parallel Sparse Matrix-Matrix Multiplication

Ahmed, Salman, Houser, Jennifer, Hoque, Mohammad A., Raju, Rezaul, Pfeiffer, Phil 01 July 2017 (has links)
Parallel sparse matrix-matrix multiplication algorithms (PSpGEMM) spend most of their running time on inter-process communication. In the case of distributed matrix-matrix multiplications, much of this time is spent on interchanging the partial results that are needed to calculate the final product matrix. This overhead can be reduced with a one-dimensional distributed algorithm for parallel sparse matrix-matrix multiplication that uses a novel accumulation pattern based on the logarithmic complexity of the number of processors (i.e., O (log (p)) where p is the number of processors). This algorithm's MPI communication overhead and execution time were evaluated on an HPC cluster, using randomly generated sparse matrices with dimensions up to one million by one million. The results showed a reduction of inter-process communication overhead for matrices with larger dimensions compared to another one dimensional parallel algorithm that takes O(p) run-time complexity for accumulating the results.
3

Characterization and Enhancement of Data Locality and Load Balancing for Irregular Applications

Niu, Qingpeng 14 May 2015 (has links)
No description available.

Page generated in 0.1628 seconds