Modern Graphics Processing Units (GPUs) provide breakthrough performance for numerical computing at the cost of increased programming complexity. Current programming models for GPUs require the programmer to manually manage data transfers between the CPU and the GPU. This thesis proposes a simpler programming model and introduces a new compilation framework that enables Python applications containing numerical computations to be executed on GPUs and multi-core CPUs.
The new programming model minimally extends Python to include type and parallel-loop annotations. Our compiler framework then automatically identifies the data to be transferred between the main memory and the GPU for a particular class of affine array accesses. The compiler also automatically performs loop transformations to improve performance on GPUs.
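As a rough illustration of the idea of deriving transfer regions from affine array accesses, the sketch below computes the range of indices a loop touches when it accesses an array at `coeff * i + offset`. It is not the thesis's algorithm or annotation syntax: the function names `affine_footprint` and `scale_shifted` are hypothetical, and the plain Python loop only stands in for a loop that would carry a parallel-loop annotation in the proposed model.

```python
def affine_footprint(coeff, offset, start, stop):
    """Inclusive (lo, hi) index bounds touched by A[coeff * i + offset]
    for i in range(start, stop). Returns None for an empty loop.

    Hypothetical helper: a one-dimensional sketch of the kind of bounds
    a compiler could derive to decide which array slice to transfer.
    """
    if stop <= start:
        return None
    first = coeff * start + offset
    last = coeff * (stop - 1) + offset
    # A negative coefficient walks the array backwards, so order the ends.
    return (min(first, last), max(first, last))


def scale_shifted(a, b, n):
    # Stand-in for an annotated parallel loop (assumption): iterations are
    # independent, each reads a[i + 1] and writes b[i].
    for i in range(n):
        b[i] = 2.0 * a[i + 1]


n = 4
read_bounds = affine_footprint(1, 1, 0, n)    # access a[i + 1]
write_bounds = affine_footprint(1, 0, 0, n)   # access b[i]

a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [0.0] * n
scale_shifted(a, b, n)
```

Under these assumptions, only the slice of `a` inside `read_bounds` would need to be copied to the GPU before the loop runs, and only the slice of `b` inside `write_bounds` would need to be copied back afterwards.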
For kernels with a regular loop structure and simple memory access patterns, the GPU code generated by the compiler achieves a significant performance improvement over multi-core CPU code.
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:AEU.10048/762 |
Date | 11 1900 |
Creators | Garg, Rahul |
Contributors | Amaral, Jose Nelson (Computing Science), Lu, Paul (Computing Science), Cockburn, Bruce (Electrical and Computer Engineering) |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Detected Language | English |
Type | Thesis |
Format | 1862977 bytes, application/pdf |