
Accelerating Dense Linear Algebra for GPUs, Multicores and Hybrid Architectures: an Autotuned and Algorithmic Approach

Dense linear algebra (DLA) is one of the seven most important kernels in
high-performance computing. The introduction of new machines from vendors
provides opportunities to optimize DLA libraries for those machines and
thus exploit their power. Unfortunately, the optimization phase is not
straightforward. The optimal code for a given Basic Linear Algebra
Subprograms (BLAS) kernel, the core of DLA algorithms, can differ between
two machines built with different semiconductor processes even if they
share the same instruction set architecture, memory hierarchy, and clock
speed. It has therefore become a tradition to optimize BLAS for each new
machine, and vendors maintain highly optimized BLAS libraries targeting
their CPUs. Unfortunately, the existing BLAS for GPUs are not highly
optimized for DLA algorithms. In my research, I have provided new
algorithms for several important BLAS kernels for different generations of
GPUs and introduced a pointer redirecting approach that makes BLAS run
faster for generic problem sizes. I have also presented an auto-tuning
approach that parameterizes the developed BLAS algorithms and selects the
best set of parameters for a given card.
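As an illustration only (the thesis kernels themselves are GEMM-family BLAS routines), the CUDA sketch below shows the general flavor of pointer redirecting: a kernel tuned for column counts that are multiples of a fixed blocking factor lets out-of-range threads re-read the last valid column instead of branching in the inner loop, and guards only the final store. The kernel name, the blocking factor BLK, and the column-scaling operation are all assumptions made for this example.

    // Hedged sketch of the pointer redirecting idea, not the thesis code.
    #include <cstdio>
    #include <cuda_runtime.h>

    #define BLK 64   // kernel is written for column counts that are multiples of BLK

    __global__ void scale_columns(const double *A, double *B,
                                  double alpha, int m, int n)
    {
        int j = blockIdx.x * BLK + threadIdx.x;   // column handled by this thread
        // Redirect out-of-range threads to the last valid column so every
        // load stays in bounds and the inner loop stays branch-free.
        int jr = (j < n) ? j : n - 1;
        for (int i = 0; i < m; ++i) {
            double v = alpha * A[i + (size_t)jr * m];   // column-major load
            if (j < n)                                   // guard only the store
                B[i + (size_t)j * m] = v;
        }
    }

    int main(void)
    {
        int m = 100, n = 1000;                 // n need not be a multiple of BLK
        size_t bytes = (size_t)m * n * sizeof(double);
        double *dA, *dB;
        cudaMalloc(&dA, bytes);                // data initialization omitted
        cudaMalloc(&dB, bytes);
        dim3 grid((n + BLK - 1) / BLK), block(BLK);
        scale_columns<<<grid, block>>>(dA, dB, 2.0, m, n);
        cudaDeviceSynchronize();
        cudaFree(dA);
        cudaFree(dB);
        return 0;
    }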
These same hardware trends have also created the need to update existing
legacy DLA software packages, such as the sequential LAPACK. To take
advantage of the new computational environment, successors of LAPACK must
incorporate algorithms with three main characteristics: high parallelism,
reduced communication, and heterogeneity-awareness. The Parallel Linear
Algebra Software for Multicore Architectures (PLASMA) library has been
developed to meet these challenges on multicore architectures. At the
other extreme, the Matrix Algebra on GPU and Multicore Architectures
(MAGMA) library has demonstrated a hybridization approach that streamlines
the development of high-performance DLA for multicores with GPU
accelerators. The performance of both libraries depends on the right
choice of parameters for a given problem size and a given number of cores
and/or GPUs. This work addresses the problem of automatically tuning these
two libraries. A pruning-based empirical auto-tuning method is proposed
for PLASMA, and part of that method is then adapted to tune the hybrid
MAGMA library.
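The sketch below is likewise only an illustration of the prune-then-measure idea, not the thesis autotuner: candidate thread-block sizes that fail a simple feasibility check are discarded, the survivors are timed with CUDA events, and the fastest is kept. PLASMA and MAGMA tune parameters such as tile sizes over a much richer, pruned search space; the toy copy kernel and the candidate list here are assumptions made for the example.

    // Hedged sketch of pruning-based empirical auto-tuning, not the thesis autotuner.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void copy_kernel(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];          // trivial kernel standing in for a tunable code variant
    }

    int main(void)
    {
        const int n = 1 << 24;
        float *din, *dout;
        cudaMalloc(&din, n * sizeof(float));
        cudaMalloc(&dout, n * sizeof(float));

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);

        int candidates[] = {32, 64, 128, 256, 512, 1024, 2048};
        int best_bs = 0;
        float best_ms = 1e30f;

        cudaEvent_t t0, t1;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);

        for (int bs : candidates) {
            if (bs > prop.maxThreadsPerBlock) continue;   // prune infeasible candidates
            int grid = (n + bs - 1) / bs;
            copy_kernel<<<grid, bs>>>(din, dout, n);      // warm-up run
            cudaEventRecord(t0);
            copy_kernel<<<grid, bs>>>(din, dout, n);      // timed run
            cudaEventRecord(t1);
            cudaEventSynchronize(t1);
            float ms;
            cudaEventElapsedTime(&ms, t0, t1);
            printf("block size %4d: %.3f ms\n", bs, ms);
            if (ms < best_ms) { best_ms = ms; best_bs = bs; }
        }
        printf("selected block size: %d\n", best_bs);

        cudaEventDestroy(t0); cudaEventDestroy(t1);
        cudaFree(din); cudaFree(dout);
        return 0;
    }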

Identifier: oai:union.ndltd.org:UTENN/oai:trace.tennessee.edu:utk_gradthes-1794
Date: 01 August 2010
Creators: Nath, Rajib Kumar
Publisher: Trace: Tennessee Research and Creative Exchange
Source Sets: University of Tennessee Libraries
Detected Language: English
Type: text
Format: application/pdf
Source: Masters Theses
