Return to search

Analysis and Implementation Considerations of Krylov Subspace Methods on Modern Heterogeneous Computing Architectures

Krylov subspace methods are the state-of-the-art iterative algorithms for solving large, sparse systems of equations, which are ubiquitous throughout scientific computing. Even with Krylov methods, these problems are often infeasible to solve on standard workstation computers and must be solved instead on supercomputers. Most modern supercomputers fall into the category of “heterogeneous architectures”, typically meaning a combination of CPU and GPU processors. Thus, development and analysis of Krylov subspace methods on these heterogeneous architectures is of fundamental importance to modern scientific computing.
This dissertation focuses on how this relates to several specific problems. Thefirst analyzes the performance of block GMRES (BGMRES) compared to GMRES for linear systems with multiple right hand sides (RHS) on both CPUs and GPUs, and modelling when BGMRES is most advantageous over GMRES on the
GPU. On CPUs, the current paradigm is that if one wishes to solve a system of equations with multiple RHS, BGMRES can indeed outperform GMRES, but not always. Our original goal was to see if there are some cases for which BGMRES
is slower in execution time on the CPU than GMRES on the CPU, while on the GPU, the reverse holds. This is true, and we generally observe much faster execution times and larger improvements in the case of BGMRES on the GPU. We
also observe that for any fixed matrix, when the number of RHS increase, there is a point in which the improvements start to decrease and eventually any advantage of the (unrestarted) block method is lost. We present a new computational model which helps us explain why this is so. The significance of this analysis is that it first demonstrates increased potential of block Krylov methods on heterogeneous architectures than on previously studied CPU-only machines. Moreover, the theoretical runtime model can be used to identify an optimal partitioning strategy of the RHS
for solving systems with many RHS.
The second problem studies the s-step GMRES method, which is an implementation of GMRES that attains high performance on modern heterogeneous machines by generating s Krylov basis vectors per iteration, and then orthogonalizing the vectors in a block-wise fashion. The use of s-step GMRES is currently limited because the algorithm is prone to numerical instabilities, partially due to breakdowns in a tall-and-skinny QR subroutine. Further, a conservatively small step size must be used in practice, limiting the algorithm’s performance. To address these issues, first a novel randomized tall-and-skinny QR factorization is presented that is significantly more stable than the current practical algorithms without sacrificing performance on GPUs. Then, a novel two-stage block orthogonalization scheme is introduced that significantly improves the performance of the s-step GMRES algorithm when small step sizes are used. These contributions help make s-step GMRES a more practical method in heterogeneous, and therefore exascale, environments. / Mathematics

Identiferoai:union.ndltd.org:TEMPLE/oai:scholarshare.temple.edu:20.500.12613/10221
Date05 1900
CreatorsHiggins, Andrew, 0009-0007-5527-9263
ContributorsSzyld, Daniel, Seibold, Benjamin, Queisser, Gillian, Boman, Erik G.
PublisherTemple University. Libraries
Source SetsTemple University
LanguageEnglish
Detected LanguageEnglish
TypeThesis/Dissertation, Text
Format193 pages
RightsIN COPYRIGHT- This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available., http://rightsstatements.org/vocab/InC/1.0/
Relationhttp://dx.doi.org/10.34944/dspace/10183, Theses and Dissertations

Page generated in 0.0022 seconds