Global ETD Search

1	Multi-level Parallelism with MPI and OpenACC for CFD Applications McCall, Andrew James 14 June 2017 (has links) High-level parallel programming approaches, such as OpenACC, have recently become popular in complex fluid dynamics research since they are cross-platform and easy to implement. OpenACC is a directive-based programming model that, unlike low-level programming models, abstracts the details of implementation on the GPU. Although OpenACC generally limits the performance of the GPU, this model significantly reduces the work required to port an existing code to any accelerator platform, including GPUs. The purpose of this research is twofold: to investigate the effectiveness of OpenACC in developing a portable and maintainable GPU-accelerated code, and to determine the capability of OpenACC to accelerate large, complex programs on the GPU. In both of these studies, the OpenACC implementation is optimized and extended to a multi-GPU implementation while maintaining a unified code base. OpenACC is shown as a viable option for GPU computing with CFD problems. In the first study, a CFD code that solves incompressible cavity flows is accelerated using OpenACC. Overlapping communication with computation improves performance for the multi-GPU implementation by up to 21%, achieving up to 400 times faster performance than a single CPU and 99% weak scalability efficiency with 32 GPUs. The second study ports the execution of a more complex CFD research code to the GPU using OpenACC. Challenges using OpenACC with modern Fortran are discussed. Three test cases are used to evaluate performance and scalability. The multi-GPU performance using 27 GPUs is up to 100 times faster than a single CPU and maintains a weak scalability efficiency of 95%. / Master of Science / The research and analysis performed in scientific computing today produces an ever-increasing demand for faster and more energy efficient performance. Parallel computing with supercomputers that use many central processing units (CPUs) is the current standard for satisfying these demands. The use of graphics processing units (GPUs) for scientific computing applications is an emerging technology that has gained a lot of popularity in the past decade. A single GPU can distribute the computations required by a program over thousands of processing units. This research investigates the effectiveness of a relatively new standard, called OpenACC, for offloading execution of a program to the GPU. The most widely used standards today are highly complex and require low-level, detailed knowledge of the GPU’s architecture. These issues significantly reduce the maintainability and portability of a program. OpenACC does not require rewriting a program for the GPU. Instead, the developer annotates regions of code to run on the GPU and only has to denote high-level information about how to parallelize the code. The results of this research found that even for a complex program that models air flows, using OpenACC to run the program on 27 GPUs increases performance by a factor of 100 over a single CPU and by a factor of 4 over 27 CPUs. Although higher performance is expected with other GPU programming standards, these results were accomplished with minimal change to the original program. Therefore, these results demonstrate the ability of OpenACC to improve performance while keeping the program maintainable and portable. Graphics processing unit Directive-based programming OpenACC Lid-driven cavity Multi-GPU Parallel computing
2	Evaluating the OpenACC API for Parallelization of CFD Applications Pickering, Brent Phillip 06 September 2014 (has links) Directive-based programming of graphics processing units (GPUs) has recently appeared as a viable alternative to using specialized low-level languages such as CUDA C and OpenCL for general-purpose GPU programming. This technique, which uses directive or pragma statements to annotate source codes written in traditional high-level languages, is designed to permit a unified code base to serve multiple computational platforms and to simplify the transition of legacy codes to new architectures. This work analyzes the popular OpenACC programming standard, as implemented by the PGI compiler suite, in order to evaluate its utility and performance potential in computational fluid dynamics (CFD) applications. Of particular interest is the handling of stencil algorithms, which are an important component of finite-difference and finite-volume numerical methods. To this end, the process of applying the OpenACC Fortran API to a preexisting finite-difference CFD code is examined in detail, and all modifications that must be made to the original source in order to run efficiently on the GPU are noted. Optimization techniques for OpenACC are also explored, and it is demonstrated that tuning the code for a particular accelerator architecture can result in performance increases of over 30%. There are also some limitations and programming restrictions imposed by the API: it is observed that certain useful features of modern Fortran (2003/8) are effectively disabled within OpenACC regions. Finally, a combination of OpenACC and OpenMP directives is used to create a truly cross-platform Fortran code that can be compiled for either CPU or GPU hardware. The performance of the OpenACC code is measured on several contemporary NVIDIA GPU architectures, and a comparison is made between double and single precision arithmetic showing that if reduced precision can be tolerated, it can lead to significant speedups. To assess the performance gains relative to a typical CPU implementation, the execution time for a standard benchmark case (lid-driven cavity) is used as a reference. The OpenACC version is compared against the identical Fortran code recompiled to use OpenMP on multicore CPUs, as well as a highly-optimized C++ version of the code that utilizes hardware aware programming techniques to attain higher performance on the Intel Xeon platforms being tested. Low-level optimizations specific to these architectures are analyzed and it is observed that the stencil access pattern required by the structured-grid CFD code sometimes leads to performance degrading conflict misses in the hardware managed CPU caches. The GPU code, which primarily uses software managed caching, is found to be free from these issues. Overall, it is observed that the OpenACC GPU code compares favorably against even the best optimized CPU version: using a single NVIDIA K20x GPU, the Fortran+OpenACC code is seen to outperform the optimized C++ version by 20% and the Fortran+OpenMP version by more than 100% with both CPU codes running on a 16-core Xeon workstation. / Master of Science graphics processing unit (GPU) computational fluid dynamics (CFD) directive-based programming parallel programming OpenACC Fortran 2003 stencil code finite-volume method

Search results

Multi-level Parallelism with MPI and OpenACC for CFD Applications

Evaluating the OpenACC API for Parallelization of CFD Applications