In an ideal world, scientific applications would be expressed as high-level compositions of abstractions that encapsulate parallelism and deliver near-optimal performance at low maintenance cost. The alternative, where such abstractions are unavailable, is for application programmers to control execution directly using an appropriate explicitly parallel programming model. In this thesis we explore both approaches, represented by the Firedrake framework and the OpenMP programming model respectively, and we examine how OpenMP can support high-level abstractions such as Firedrake.

Firedrake is designed as a composition of domain-specific abstractions for solving partial differential equations via the finite element method. We extend Firedrake with support for the extruded meshes frequently used in geophysical simulations, introducing algorithms for numbering and iterating over any discretisation supported by an extruded mesh.

Starting with version 4.0, OpenMP computations, previously restricted to the CPU, can be offloaded to accelerators and coprocessors. We introduce code generation schemes for offloading single and nested OpenMP parallel constructs in the Clang/LLVM toolchain. The schemes map OpenMP directives to the hardware model of the accelerator, enabling the programmer to use OpenMP in a prescriptive way.

Performance is evaluated on the extruded mesh extensions to Firedrake as well as on LULESH, a widely ported proxy application intended to be representative of an important portion of the Department of Energy's scientific codes. For Firedrake, performance is shown to reach significant fractions of theoretical hardware limits. For OpenMP, the runtime is compared against hand-optimised implementations employing the accelerator-specific CUDA C/C++ language extensions.
The additions to the Firedrake framework combine both approaches into a single toolchain through a newly introduced OpenMP 4.0 Firedrake backend with functionality equivalent to that of all existing Firedrake backends. OpenMP 4.0 serves as a single representation for both CPU and GPU platforms, simplifying the application of target-specific optimisations. The OpenMP 4.0 backend improves maintainability through code reuse and will deliver further portability gains as offloading support in Clang advances.
Contributors: Kelly, Paul; Ham, David
Publisher: Imperial College London
Source Sets: EThOS UK
Type: Electronic Thesis or Dissertation