
Accelerating Dense Linear Algebra for GPUs, Multicores and Hybrid Architectures: an Autotuned and Algorithmic Approach

Dense linear algebra (DLA) is one of the seven most important kernels in high performance computing. The introduction of new machines from vendors provides opportunities to optimize DLA libraries for those machines and thus exploit their power. Unfortunately, the optimization phase is not straightforward. The optimal code for a given Basic Linear Algebra Subprograms (BLAS) kernel, the core of DLA algorithms, can differ between two machines built with different semiconductor processes, even if they share the same instruction set architecture, memory hierarchy, and clock speed. It has become a tradition to optimize BLAS for new machines, and vendors maintain highly optimized BLAS libraries targeting their CPUs. Unfortunately, the existing BLAS for GPUs is not highly optimized for DLA algorithms. In my research, I have provided new algorithms for several important BLAS kernels for different generations of GPUs and introduced a pointer-redirecting approach that makes BLAS run faster on generic problem sizes. I have also presented an auto-tuning approach that parameterizes the developed BLAS algorithms and selects the best set of parameters for a given card.

The hardware trends have also brought up the need for updates to existing legacy DLA software packages, such as the sequential LAPACK. To take advantage of the new computational environment, successors of LAPACK must incorporate algorithms with three main characteristics: high parallelism, reduced communication, and heterogeneity-awareness. On multicore architectures, Parallel Linear Algebra Software for Multicore Architectures (PLASMA) has been developed to meet the challenges of multicore. At the other extreme, the Matrix Algebra on GPU and Multicore Architectures (MAGMA) library demonstrated a hybridization approach that streamlined the development of high performance DLA for multicores with GPU accelerators. The performance of these two libraries depends on the right choice of parameters for a given problem size and a given number of cores and/or GPUs. In this work, the issue of automatically tuning these two libraries is addressed. A prune-based empirical auto-tuning method is proposed for tuning PLASMA, and part of that tuning method is applied to tune the hybrid MAGMA library.
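The pointer-redirecting idea mentioned above can be pictured with a small CUDA sketch. The kernel below is hypothetical (a simple element-wise scaling, not one of the thesis's actual BLAS kernels); it shows only the core trick: threads whose global index falls outside the m-by-n matrix have their reads redirected (clamped) to the last valid row/column, so full thread blocks run on problem sizes that are not multiples of the blocking factor, with no padded copies and only a single guarded store.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLK 16

// Each thread block behaves as if the matrix were padded to a multiple
// of BLK: out-of-range threads redirect their read indices to the last
// valid row/column, so every load is legal without per-element checks.
__global__ void scale_redirect(const float *A, float *B,
                               int m, int n, int lda, float alpha)
{
    int row = blockIdx.y * BLK + threadIdx.y;
    int col = blockIdx.x * BLK + threadIdx.x;

    // Redirection: clamp to boundary data instead of branching on reads.
    int r = row < m ? row : m - 1;
    int c = col < n ? col : n - 1;

    float v = alpha * A[(size_t)c * lda + r];   // column-major access

    // Only the store is guarded: redirected threads did redundant work
    // on copies of boundary elements and simply discard it here.
    if (row < m && col < n)
        B[(size_t)col * lda + row] = v;
}

int main()
{
    const int m = 1000, n = 1000, lda = m;  // deliberately not multiples of BLK
    float *dA, *dB;
    cudaMalloc(&dA, sizeof(float) * (size_t)lda * n);
    cudaMalloc(&dB, sizeof(float) * (size_t)lda * n);

    // Launch over the rounded-up grid; all blocks are full BLK x BLK.
    dim3 threads(BLK, BLK);
    dim3 grid((n + BLK - 1) / BLK, (m + BLK - 1) / BLK);
    scale_redirect<<<grid, threads>>>(dA, dB, m, n, lda, 2.0f);
    cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(dA);
    cudaFree(dB);
    return 0;
}
```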
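The empirical auto-tuning described for the BLAS kernels and for PLASMA/MAGMA shares a common outline: enumerate parameterized kernel variants, prune candidates that violate hardware constraints without running them, and benchmark the survivors to pick the fastest. The sketch below is a hypothetical miniature of such a prune-then-benchmark loop; the stand-in kernel, the candidate blocking factors, and the prune rule are illustrative assumptions, not the thesis's actual tuner.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for a real parameterized BLAS kernel, templated on its
// blocking factor (the tuning parameter).
template <int BLK>
__global__ void kernel_variant(float *x, int n)
{
    int i = blockIdx.x * BLK + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

// Prune, then benchmark: variants violating a hardware limit are
// discarded without being run; survivors are timed with CUDA events.
template <int BLK>
float time_variant(float *dx, int n)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    if (BLK > prop.maxThreadsPerBlock)      // prune step
        return 1e30f;

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    cudaEventRecord(t0);
    for (int rep = 0; rep < 10; ++rep)      // average over repetitions
        kernel_variant<BLK><<<(n + BLK - 1) / BLK, BLK>>>(dx, n);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    float ms;
    cudaEventElapsedTime(&ms, t0, t1);
    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    return ms;
}

int main()
{
    const int n = 1 << 20;
    float *dx;
    cudaMalloc(&dx, sizeof(float) * n);

    // Candidate blocking factors; 2048 exceeds maxThreadsPerBlock on
    // current GPUs and is pruned before it can fail at launch.
    float t128  = time_variant<128>(dx, n);
    float t256  = time_variant<256>(dx, n);
    float t2048 = time_variant<2048>(dx, n);

    printf("BLK=128: %.3f ms   BLK=256: %.3f ms   BLK=2048: %s\n",
           t128, t256, t2048 < 1e30f ? "timed" : "pruned");
    printf("best BLK: %d\n", t128 <= t256 ? 128 : 256);

    cudaFree(dx);
    return 0;
}
```

A real tuner prunes on the resources each variant actually consumes (shared memory, registers, occupancy) and sweeps problem sizes as well as parameters, but the control flow is the same.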

Identifier: oai:union.ndltd.org:UTENN_/oai:trace.tennessee.edu:utk_gradthes-1794
Date: 01 August 2010
Creators: Nath, Rajib Kumar
Publisher: Trace: Tennessee Research and Creative Exchange
Source Sets: University of Tennessee Libraries
Detected Language: English
Type: text
Format: application/pdf
Source: Masters Theses
