Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices

Computing systems have become heterogeneous with the increasing prevalence of multi-core CPUs, Graphics Processing Units (GPU) and other accelerators in them. OpenCL has emerged as an attractive programming framework for heterogeneous systems. However, utilizing multiple devices in OpenCL is a challenge as it requires the programmer to explicitly map data and computation to each device. Utilizing multiple devices simultaneously to speed up execution of a kernel is even more complex, as the relative execution time of the kernel on different devices can vary significantly. Also, after each kernel execution, a coherent version of the data needs to be established. This means that, in order to utilize all devices effectively, the programmer has to spend considerable time and effort to distribute work across all devices, keep track of modified data in these devices and correctly perform a merging step to put the data together. Further, the relative performance of a program may vary across different inputs, which means a statically determined work distribution may not work well.
In this work, we present FluidiCL, an OpenCL runtime that takes a program written for a single device and uses multiple heterogeneous devices to execute each kernel. The runtime performs dynamic work distribution and cooperatively executes each kernel on all available devices. Since we consider a setup with devices having discrete address spaces, our solution ensures that execution of OpenCL work-groups on devices is adjusted by taking into account the overheads for data management. The data transfers and data merging needed to ensure coherence are handled transparently without requiring any effort from the programmer. FluidiCL also does not require prior training or profiling and is completely portable across different machines. Because it is dynamic, the runtime is able to adapt to system load. We have developed several optimizations for improving the performance of FluidiCL. We evaluate the runtime across different sets of devices. On a machine with an Intel quad-core processor and an NVidia Fermi GPU, FluidiCL shows a geomean speedup of nearly 64% over the GPU, 88% over the CPU and 14% over the best of the two devices in each benchmark. In all benchmarks, performance of our runtime comes to within 13% of the best of the two devices. FluidiCL shows similar results on a machine with a quad-core CPU and an NVidia Kepler GPU, with up to 26% speedup over the best of the two. We also present results considering an Intel Xeon Phi accelerator and a CPU and find that FluidiCL performs up to 45% faster than the best of the two devices. We extend FluidiCL from a CPU–GPU scenario to a three-device setup having a quad-core CPU, an NVidia Kepler GPU and an Intel Xeon Phi accelerator and find that FluidiCL obtains a geomean improvement of 6% in kernel execution time over the best of the three devices considered in each case.
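
The abstract does not spell out how the runtime distributes work-groups or merges results, so the following is only an illustrative sketch of the general idea and not the FluidiCL algorithm. It simulates two devices with separate output buffers (standing in for discrete address spaces) that claim work-groups of one kernel launch from opposite ends of the index range, then merges the groups finished by the second device into the first device's buffer. The function names, the two-ended claiming policy, and the `gpu_rate` / `cpu_rate` parameters are assumptions made for this example.

```python
from typing import Callable, List, Tuple

Kernel = Callable[[int, int, List[float], List[float]], None]


def square_group(group_id: int, group_size: int,
                 src: List[float], dst: List[float]) -> None:
    """Hypothetical kernel body: squares every element of one work-group."""
    start = group_id * group_size
    for i in range(start, start + group_size):
        dst[i] = src[i] * src[i]


def cooperative_run(kernel: Kernel, data_in: List[float], num_groups: int,
                    group_size: int, gpu_rate: int,
                    cpu_rate: int) -> Tuple[List[float], int, int]:
    """Simulate one kernel launch shared by two devices.

    Assumed model: per time step the "GPU" claims up to gpu_rate groups
    from the low end and the "CPU" up to cpu_rate groups from the high
    end, until no unclaimed groups remain. The split is therefore decided
    while the kernel runs rather than fixed in advance.
    """
    if gpu_rate < 0 or cpu_rate < 0 or gpu_rate + cpu_rate == 0:
        raise ValueError("at least one device must make progress")
    n = num_groups * group_size
    gpu_out = [0.0] * n  # separate buffers model discrete address spaces
    cpu_out = [0.0] * n
    lo, hi = 0, num_groups  # unclaimed groups are the half-open range [lo, hi)
    gpu_done = 0
    cpu_groups: List[int] = []
    while lo < hi:
        take = min(gpu_rate, hi - lo)
        for g in range(lo, lo + take):
            kernel(g, group_size, data_in, gpu_out)
        lo += take
        gpu_done += take
        take = min(cpu_rate, hi - lo)
        for g in range(hi - take, hi):
            kernel(g, group_size, data_in, cpu_out)
            cpu_groups.append(g)
        hi -= take
    # Merge step: copy only the groups the CPU completed into the GPU copy,
    # so that a single coherent result exists after the kernel.
    merged = list(gpu_out)
    for g in cpu_groups:
        start = g * group_size
        merged[start:start + group_size] = cpu_out[start:start + group_size]
    return merged, gpu_done, len(cpu_groups)
```

Calling `cooperative_run(square_group, data, 8, 2, gpu_rate=3, cpu_rate=1)` on 16 input values lets the faster simulated device finish six of the eight groups and the slower one two. A real runtime would additionally have to account for transfer costs between devices, which this sketch ignores.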

Heterogeneous Computers

Open Computing Language

FluidiCL

Fluidic Kernels

OpenCL Application Programming Interface

Graphics Processing Unit (GPU)

Central Processing Unit (CPU)

Computer Architecture

FluidiCL Runtime

Heterogeneous OpenCL Runtime

OpenCL Programs

CPU–GPU Systems

Computer Engineering

Identifier	oai:union.ndltd.org:IISc/oai:etd.iisc.ernet.in:2005/3468
Date	January 2013
Creators	Pandit, Prasanna Vasant
Contributors	Govindarajan, R
Source Sets	Indian Institute of Science
Language	en_US
Detected Language	English
Type	Thesis
Relation	G25888
