Unstructured Computations on Emerging Architectures

This dissertation describes the detailed performance engineering and optimization of an unstructured computational aerodynamics code with irregular memory accesses on a range of emerging multi- and many-core high-performance computing architectures. These architectures are expected to be the building blocks of energy-austere exascale systems, and both algorithmic and architecture-oriented optimizations are essential for achieving good performance on them.
We investigate several state-of-the-practice shared-memory optimization techniques applied to key kernels for the important problem class of unstructured meshes. We demonstrate these techniques on a broad spectrum of emerging microprocessor architectures that are representative of the compute units in contemporary leading supercomputers, identifying and addressing performance challenges without compromising the floating-point numerics of the original code. The linear algebraic kernels are bottlenecked by memory bandwidth for even modest numbers of hardware cores sharing a common address space. In contrast, the edge-based loop kernels are compute-intensive and effectively exploit contemporary multi- and many-core processing hardware; these kernels arise in the control volume discretization of the conservation law residuals and in the formation of the preconditioner for the Jacobian by finite-differencing those residuals. We therefore employ low- and high-level algorithmic and architecture-specific code optimizations and tuning that target thread- and data-level parallelism, with a focus on strong thread scaling at the node level. Our approaches are based upon novel multi-level hierarchical mechanisms for distributing workload and data across the different compute units (from the address space down to the registers) within every hardware core. We analyze the aerodynamics application on specific computing architectures to develop performance metrics and models that characterize the upper and lower bounds of its performance. We present significant full-application speedups relative to the baseline code on a succession of many-core processor architectures, namely Intel Xeon Phi Knights Corner (5.0x) and Knights Landing (2.9x). In addition, Knights Landing outperforms Intel Xeon Skylake by nearly a factor of two, at significantly lower power consumption.
These optimizations are expected to be of value for many other unstructured-mesh, partial differential equation-based scientific applications as multi- and many-core architectures evolve.
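
The edge-based loop kernels mentioned in the abstract can be pictured with a minimal sketch. It is illustrative only and is not taken from the dissertation's code: the names `edge_loop` and `toy_flux`, the tiny mesh, and the averaging flux are all assumptions made for the example. It shows the access pattern that makes such kernels irregular: each edge reads the state at its two endpoint vertices through an index pair and scatters a contribution back to both.

```python
# Illustrative sketch of an edge-based residual loop on an unstructured mesh.
# All names and the flux formula are hypothetical, chosen only for this example.

def toy_flux(u_left, u_right):
    """Stand-in for a conservation-law flux across one edge (simple average)."""
    return 0.5 * (u_left + u_right)


def edge_loop(edges, u):
    """Accumulate per-vertex residuals by visiting every edge once.

    edges: list of (i, j) vertex index pairs; u: list of vertex states.
    The indirect indexing u[i], u[j], residual[i], residual[j] is the source
    of irregular memory access; two edges that share a vertex also write to
    the same residual entry, so a threaded version must avoid that conflict
    (for example by partitioning or coloring edges, or by atomic updates).
    """
    residual = [0.0] * len(u)
    for i, j in edges:
        f = toy_flux(u[i], u[j])
        residual[i] += f  # flux leaves the control volume of vertex i
        residual[j] -= f  # and enters the control volume of vertex j
    return residual


# A four-vertex mesh with five edges (assumed data).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
u = [1.0, 2.0, 3.0, 4.0]
residual = edge_loop(edges, u)
```

Because every edge adds to one vertex exactly what it subtracts from the other, the residuals sum to zero, which is the discrete conservation property that an optimized version of such a loop has to preserve.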

Performance Optimizations

Thread-level parallelism

Data-level parallelism

Unstructured Grids

Computational Aerodynamics

Intel Xeon Phi

Identifier	oai:union.ndltd.org:kaust.edu.sa/oai:repository.kaust.edu.sa:10754/644902
Date	05 May 2019
Creators	Al Farhan, Mohammed
Contributors	Keyes, David E., Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Moshkov, Mikhail, Hadwiger, Markus, Bagci, Hakan, Chow, Edmond
Source Sets	King Abdullah University of Science and Technology
Language	English
Detected Language	English
Type	Dissertation
