Programmers invest extensive development effort in optimizing GPU programs to achieve peak performance. Achieving this requires efficient use of global memory and avoiding underutilization of memory bandwidth. The OpenACC programming model was introduced to tackle the complexity of programming accelerators. However, this model’s coarse-grained control over a program can make memory bandwidth utilization even worse than in a version written in a native GPU language such as CUDA. We propose an extension to OpenACC that reduces traffic on the memory interconnection network by compressing floating-point numbers. We evaluate our method on six case studies and achieve up to 1.36X speedup. / Graduate / 0544 / 0984 / ebads67@uvic.ca
Identifier | oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/7284 |
Date | 04 May 2016 |
Creators | Salehi, Ebad |
Contributors | Baniasadi, Amirali |
Source Sets | University of Victoria |
Language | English |
Detected Language | English |
Type | Thesis |
Rights | Available to the World Wide Web |