Global ETD Search

Return to search

Techniques for Managing Irregular Control Flow on GPUs

GPGPU is a highly multithreaded throughput architecture that can deliver high speed-up for regular applications while remaining energy efficient. In recent years, there has been much focus on tuning irregular applications and/or the GPU architecture to achieve similar benefits for irregular applications as well as efforts to extract data parallelism from task parallel applications. In this work we tackle both problems.The first part of this work tackles the problem of Control divergence in GPUs. GPGPUs’ SIMT execution model is ineffective for workloads with irregular control-flow because GPGPUs serialize the execution of divergent paths which lead to thread-level parallelism (TLP) loss. Previous works focused on creating new warps based on the control path threads follow, or created different warps for the different paths, or ran multiple narrower warps in parallel. While all previous solutions showed speedup for irregular workloads, they imposed some performance loss on regular workloads. In this work we propose a more fine-grained approach to exploit intra-warpconvergence: rather than threads executing the same code path, opcode-convergent threadsexecute the same instruction, but with potentially different operands. Based on this new definition we find that divergent control blocks within a warp exhibit substantial opcode convergence. We build a compiler that analyzes divergent blocks and identifies the common streams of opcodes. We modify the GPU architecture so that these common instructions are executed as convergent instructions. Using software simulation, we achieve a 17% speedup over baseline GPGPU for irregular workloads and do not incur any performance loss on regular workloads.In the second part we suggest techniques for extracting data parallelism from irregular, task parallel applications in order to take advantage of the massive parallelism provided by the GPU. Our technique involves dividing each task into multiple sub-tasks each performing less work and touching a smaller memory footprint. Our framework performs a locality-aware scheduling that works on minimizing the memory footprint of each warp (a set of threads performing in lock-step). We evaluate our framework with 3 task-parallel benchmarks and show that we can achieve significant speedups over optimized GPU code.

10.25394/pgs.8041376.v1

Computer Engineering

Computer System Architecture

GPU

compilers

GPGPU

Identifer	oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/8041376
Date	25 June 2020
Creators	Jad Hbeika (5929730)
Source Sets	Purdue University
Detected Language	English
Type	Text, Thesis
Rights	CC BY 4.0
Relation	https://figshare.com/articles/thesis/Techniques_for_Managing_Irregular_Control_Flow_on_GPUs/8041376

Page generated in 0.0021 seconds

Techniques for Managing Irregular Control Flow on GPUs

Description

Links & Downloads

Tags

Additional Fields