
Accelerating Dense Linear Algebra for GPUs, Multicores and Hybrid Architectures: an Autotuned and Algorithmic Approach

Dense linear algebra (DLA) is one of the seven most important kernels in
high-performance computing. The introduction of new machines from vendors
provides opportunities to optimize DLA libraries for those machines and
thus exploit their power. Unfortunately, the optimization phase is not
straightforward. The optimal code for a given Basic Linear Algebra
Subprograms (BLAS) kernel, the core of DLA algorithms, can differ between
two machines built with different semiconductor processes even if they
share the same instruction set architecture, memory hierarchy, and clock
speed. It has therefore become a tradition to optimize BLAS for each new
machine, and vendors maintain highly optimized BLAS libraries targeting
their CPUs. Unfortunately, the existing BLAS for GPUs are not highly
optimized for DLA algorithms. In my research, I have provided new
algorithms for several important BLAS kernels for different generations of
GPUs and introduced a pointer redirecting approach that makes BLAS run
faster for generic problem sizes. I have also presented an auto-tuning
approach that parameterizes the developed BLAS algorithms and selects the
best set of parameters for a given card.
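As an illustration only (the thesis kernels themselves are GEMM-family BLAS routines), the CUDA sketch below shows the general flavor of pointer redirecting: a kernel tuned for column counts that are multiples of a fixed blocking factor lets out-of-range threads re-read the last valid column instead of branching in the inner loop, and guards only the final store. The kernel name, the blocking factor BLK, and the column-scaling operation are all assumptions made for this example.

    // Hedged sketch of the pointer redirecting idea, not the thesis code.
    #include <cstdio>
    #include <cuda_runtime.h>

    #define BLK 64   // kernel is written for column counts that are multiples of BLK

    __global__ void scale_columns(const double *A, double *B,
                                  double alpha, int m, int n)
    {
        int j = blockIdx.x * BLK + threadIdx.x;   // column handled by this thread
        // Redirect out-of-range threads to the last valid column so every
        // load stays in bounds and the inner loop stays branch-free.
        int jr = (j < n) ? j : n - 1;
        for (int i = 0; i < m; ++i) {
            double v = alpha * A[i + (size_t)jr * m];   // column-major load
            if (j < n)                                   // guard only the store
                B[i + (size_t)j * m] = v;
        }
    }

    int main(void)
    {
        int m = 100, n = 1000;                 // n need not be a multiple of BLK
        size_t bytes = (size_t)m * n * sizeof(double);
        double *dA, *dB;
        cudaMalloc(&dA, bytes);                // data initialization omitted
        cudaMalloc(&dB, bytes);
        dim3 grid((n + BLK - 1) / BLK), block(BLK);
        scale_columns<<<grid, block>>>(dA, dB, 2.0, m, n);
        cudaDeviceSynchronize();
        cudaFree(dA);
        cudaFree(dB);
        return 0;
    }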
These same hardware trends have also created the need to update existing
legacy DLA software packages, such as the sequential LAPACK. To take
advantage of the new computational environment, successors of LAPACK must
incorporate algorithms with three main characteristics: high parallelism,
reduced communication, and heterogeneity-awareness. The Parallel Linear
Algebra Software for Multicore Architectures (PLASMA) library has been
developed to meet these challenges on multicore architectures. At the
other extreme, the Matrix Algebra on GPU and Multicore Architectures
(MAGMA) library has demonstrated a hybridization approach that streamlines
the development of high-performance DLA for multicores with GPU
accelerators. The performance of both libraries depends on the right
choice of parameters for a given problem size and a given number of cores
and/or GPUs. This work addresses the problem of automatically tuning these
two libraries. A pruning-based empirical auto-tuning method is proposed
for PLASMA, and part of that method is then adapted to tune the hybrid
MAGMA library.
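The sketch below is likewise only an illustration of the prune-then-measure idea, not the thesis autotuner: candidate thread-block sizes that fail a simple feasibility check are discarded, the survivors are timed with CUDA events, and the fastest is kept. PLASMA and MAGMA tune parameters such as tile sizes over a much richer, pruned search space; the toy copy kernel and the candidate list here are assumptions made for the example.

    // Hedged sketch of pruning-based empirical auto-tuning, not the thesis autotuner.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void copy_kernel(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];          // trivial kernel standing in for a tunable code variant
    }

    int main(void)
    {
        const int n = 1 << 24;
        float *din, *dout;
        cudaMalloc(&din, n * sizeof(float));
        cudaMalloc(&dout, n * sizeof(float));

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);

        int candidates[] = {32, 64, 128, 256, 512, 1024, 2048};
        int best_bs = 0;
        float best_ms = 1e30f;

        cudaEvent_t t0, t1;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);

        for (int bs : candidates) {
            if (bs > prop.maxThreadsPerBlock) continue;   // prune infeasible candidates
            int grid = (n + bs - 1) / bs;
            copy_kernel<<<grid, bs>>>(din, dout, n);      // warm-up run
            cudaEventRecord(t0);
            copy_kernel<<<grid, bs>>>(din, dout, n);      // timed run
            cudaEventRecord(t1);
            cudaEventSynchronize(t1);
            float ms;
            cudaEventElapsedTime(&ms, t0, t1);
            printf("block size %4d: %.3f ms\n", bs, ms);
            if (ms < best_ms) { best_ms = ms; best_bs = bs; }
        }
        printf("selected block size: %d\n", best_bs);

        cudaEventDestroy(t0); cudaEventDestroy(t1);
        cudaFree(din); cudaFree(dout);
        return 0;
    }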

Identifier: oai:union.ndltd.org:UTENN/oai:trace.tennessee.edu:utk_gradthes-1794
Date: 01 August 2010
Creators: Nath, Rajib Kumar
Publisher: Trace: Tennessee Research and Creative Exchange
Source Sets: University of Tennessee Libraries
Detected Language: English
Type: text
Format: application/pdf
Source: Masters Theses
