1 |
Translating C/C++ applications to a task-based representation
Li, Lu January 2011
GPU-based heterogeneous architectures have received much attention recently. Getting optimal performance out of these architectures with affordable programming effort remains a complex challenge. The PEPPHER framework is one possible solution. Within the PEPPHER framework, the StarPU runtime system is used to reduce programming effort while ensuring near-optimal performance through efficient scheduling across such architectures. However, adapting an ordinary C/C++ application to the StarPU runtime system requires additional programming effort. This thesis implements and tests a composition tool that automatically adapts ordinary C/C++ applications with PEPPHER components to StarPU. The tool requires XML annotations for the application and several trivial changes to the application itself, both of which take limited effort. Results from three test cases (vector scale, sorting, and matrix multiplication) show that automatic adaptation works well on the different platforms that StarPU supports. They also show that, besides StarPU’s dynamic composition, the tool facilitates static composition to improve the performance portability of ordinary C/C++ applications.
|
2 |
Automated Runtime Analysis and Adaptation for Scalable Heterogeneous Computing
Helal, Ahmed Elmohamadi Mohamed 29 January 2020
In the last decade, there have been tectonic shifts in computer hardware as sequential CPU performance reached its physical limits. As a consequence, current high-performance computing (HPC) systems integrate a wide variety of compute resources with different capabilities and execution models, ranging from multi-core CPUs to many-core accelerators. While such heterogeneous systems can enable dramatic acceleration of user applications, extracting optimal performance via manual analysis and optimization is a complicated and time-consuming process.
This dissertation presents graph-structured program representations to reason about the performance bottlenecks on modern HPC systems and to guide novel automation frameworks for performance analysis and modeling and for runtime adaptation. The proposed program representations exploit domain knowledge and capture the inherent computation and communication patterns in user applications, at multiple levels of computational granularity, via compiler analysis and dynamic instrumentation. The empirical results demonstrate that the introduced modeling frameworks accurately estimate the realizable parallel performance and scalability of a given sequential code when ported to heterogeneous HPC systems. As a result, these frameworks enable efficient workload distribution schemes that utilize all the available compute resources in a performance-proportional way. In addition, the proposed runtime adaptation frameworks significantly improve the end-to-end performance of important real-world applications that suffer from limited parallelism and fine-grained data dependencies. Specifically, compared to state-of-the-art methods, such adaptive parallel execution achieves up to an order-of-magnitude speedup on the target HPC systems while preserving the inherent data dependencies of user applications. / Doctor of Philosophy / Current supercomputers integrate a massive number of heterogeneous compute units with varying speed, computational throughput, memory bandwidth, and memory access latency. This trend represents a major challenge to end users, as their applications have been designed from the ground up to primarily exploit homogeneous CPUs. While heterogeneous systems can deliver several orders of magnitude speedup compared to traditional CPU-based systems, end users need extensive software and hardware expertise as well as significant time and effort to efficiently utilize all the available compute resources.
To streamline such a daunting process, this dissertation presents automated frameworks for analyzing and modeling performance on parallel architectures and for transforming the execution of user applications at runtime. The proposed frameworks incorporate domain knowledge and adapt to the input data and the underlying hardware using novel static and dynamic analyses. The experimental results show the efficacy of the introduced frameworks across many important application domains, such as computational fluid dynamics (CFD) and computer-aided design (CAD). In particular, the adaptive execution approach on heterogeneous systems achieves up to an order-of-magnitude speedup over the optimized parallel implementations.
|