
Comparative study of parallel programming models for multicore computing

Shared-memory multi-core processor technology has developed rapidly, with faster and more numerous processors per chip. This new architecture challenges programmers to write code that scales across these many cores to exploit the full computational power of these machines. Shared-memory parallel programming paradigms such as OpenMP and Intel Threading Building Blocks (TBB) are two recognized models that offer a higher level of abstraction, shield programmers from the low-level details of thread management, and scale computation over all available resources. At the same time, the need for high-performance, power-efficient computing is compelling developers to exploit GPGPU computing because of the GPU's massive computational power and comparatively faster multi-core growth. This trend leads to systems with heterogeneous architectures containing multicore CPUs and one or more programmable accelerators such as programmable GPUs. Different programming models exist for these architectures, and code written for one architecture is often not portable to another. OpenCL is a relatively new industry-standard framework, defined by the Khronos Group, that addresses the portability issue. It offers a portable interface to exploit the computational power of a heterogeneous set of processors such as CPUs, GPUs, DSPs, and other accelerators.

In this work, we evaluate the effectiveness of OpenCL for programming multi-core CPUs in a comparative case study with two CPU-specific stable frameworks, OpenMP and Intel TBB, for five benchmark applications, namely matrix multiply, LU decomposition, image convolution, Pi value approximation, and image histogram generation. The evaluation includes a performance comparison of the three frameworks and a study of the relative effects of applying compiler optimizations on performance numbers. OpenCL performance on two vendor-dependent platforms, Intel and AMD, is also evaluated. Then the same OpenCL code is ported to a modern GPU, and its code correctness and performance portability are investigated. Finally, the usability experience of coding with the three multi-core frameworks is presented.

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-94296
Date January 2013
Creators Ali, Akhtar
Publisher Linköpings universitet, Institutionen för datavetenskap, Linköpings universitet, Tekniska högskolan
Source Sets DiVA Archive at Upsalla University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
