Return to search

High Productivity Programming of Dense Linear Algebra on Heterogeneous NUMA Architectures

High-end multicore systems with GPU-based accelerators are now ubiquitous in the hardware landscape. Besides dealing with the nontrivial heterogeneous environ- ment, end users should often take into consideration the underlying memory architec- ture to decrease the overhead of data motion, especially when running on non-uniform memory access (NUMA) platforms. We propose the OmpSs parallel programming model approach using its Nanos++ dynamic runtime system to solve the two challeng- ing problems aforementioned, through 1) an innovative NUMA node-aware scheduling policy to reduce data movement between NUMA nodes and 2) a nested parallelism feature to concurrently exploit the resources available from the GPU devices as well as the CPU host, without compromising the overall performance. Our approach fea- tures separation of concerns by abstracting the complexity of the hardware from the end users so that high productivity can be achieved. The Cholesky factorization is used as a benchmark representative of dense numerical linear algebra algorithms. Superior performance is also demonstrated on the symmetric matrix inversion based on Cholesky factorization, commonly used in co-variance computations in statistics. Performance on a NUMA system with Kepler-based GPUs exceeds that of existing implementations, while the OmpSs-enabled code remains very similar to its original sequential version.

Identiferoai:union.ndltd.org:kaust.edu.sa/oai:repository.kaust.edu.sa:10754/297194
Date07 1900
CreatorsAlomairy, Rabab M.
ContributorsKeyes, David E., Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Ltaief, Hatem, Moshkov, Mikhail, Shihada, Basem
Source SetsKing Abdullah University of Science and Technology
LanguageEnglish
Detected LanguageEnglish
TypeThesis
Rights2014-07-30, At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis became available to the public after the expiration of the embargo on 2014-07-30.

Page generated in 0.0028 seconds