Global ETD Search

1	Providing Support for the Movidius Myriad1 Platform in the SkePU Skeleton Programming Framework Cuello, Rosandra January 2014 (has links) The Movidius Myriad1 Platform is a multicore embedded platform primed to offer high performance and power efficiency for computer vision applications in mobile devices. The challenges of programming multicore environments are well known and skeleton programming offers a high-level programming alternative for parallel computing, intended to hide the complexities of the system from the programmer. The SkePU Skeleton Programming Framework includes backend implementations for CPU and GPU systems and it has the capacity to support more platforms by extending its backend implementations. With this master thesis project we aim to extend the SkePU Skeleton Programming Framework to provide support for execution in the Movidius Myriad1 embedded platform. Our SkePU backend for Myriad1 consists on a set of macros and functions to compose the different elements of a Myriad1 application, data communication structures to exchange data between the host systems and Myriad1, and a helper script and auxiliary files to generate a Myriad1 application.Evaluation and testing demonstrate that our backend is usable, however further optimizations are needed to obtain good performance that would make it practical to use in real life applications, particularly when it comes to data communication. As part of this project, we have outlined some improvements that could be applied to obtain better performance overall in the future, addressing the issues found with the methods of data communication. skeleton programming parallel programming multicore programming skepu
2	Adaptation of algorithms for underwater sonar data processing to GPU-based systems Sundin, Patricia January 2013 (has links) In this master thesis, algorithms for acoustic simulations in underwater environments are ported for GPU processing. The GPU parallel computing platforms used are CUDA, OpenCL and SkePU. The purpose of this master thesis is to adapt and evaluate the ported algorithms' performance on two modern NVIDIA GPUs, Tesla K20 and Quadro K5000. Several optimizations, described in existing literature for GPU processing (e.g. usage of shared memory, coalesced memory accesses), are implemented and multiple versions of each algorithm are created to study their trade-offs. Evaluation on two GPUs showed that different versions of the same algorithm have different performance characteristic and execution with the best performing version can give better performance than the original algorithm executing on 8 CPUs. A performance comparison between CUDA, OpenCL and SkePU versions of one algorithm is also made. GPU processing CUDA OpenCL SkePU Performance optimization Underwater sonar data processing Computer Engineering Datorteknik
3	Skeleton Programming for Heterogeneous GPU-based Systems Dastgeer, Usman January 2011 (has links) In this thesis, we address issues associated with programming modern heterogeneous systems while focusing on a special kind of heterogeneous systems that include multicore CPUs and one or more GPUs, called GPU-based systems.We consider the skeleton programming approach to achieve high level abstraction for efficient and portable programming of these GPU-based systemsand present our work on SkePU library which is a skeleton library for these systems. We extend the existing SkePU library with a two-dimensional (2D) data type and skeleton operations and implement several new applications using newly made skeletons. Furthermore, we consider the algorithmic choice present in SkePU and implement support to specify and automatically optimize the algorithmic choice for a skeleton call, on a given platform. To show how to achieve performance, we provide a case-study on optimized GPU-based skeleton implementation for 2D stencil computations and introduce two metrics to maximize resource utilization on a GPU. By devising a mechanism to automatically calculate these two metrics, performance can be retained while porting an application from one GPU architecture to another. Another contribution of this thesis is implementation of the runtime support for the SkePU skeleton library. This is achieved with the help of the StarPUruntime system. By this implementation,support for dynamic scheduling and load balancing for the SkePU skeleton programs is achieved. Furthermore, a capability to do hybrid executionby parallel execution on all available CPUs and GPUs in a system, even for a single skeleton invocation, is developed. SkePU initially supported only data-parallel skeletons. The first task-parallel skeleton (farm) in SkePU is implemented with support for performance-aware scheduling and hierarchical parallel execution by enabling all data parallel skeletons to be usable as tasks inside the farm construct. Experimental evaluations are carried out and presented for algorithmic selection, performance portability, dynamic scheduling and hybrid execution aspects of our work. Skeleton programming GPU programming SkePU performance portability Computer Sciences Datavetenskap (datalogi)
4	Modernizing and Evaluating the Autotuning Framework of SkePU 3 Nsralla, Basel January 2022 (has links) Autotuning is a method which enables a program to automatically choose the most suitable parameters that optimizes it for a certain goal e.g. speed, cost, etc. In this work autotuning is implemented in the context of the SkePU framework, in order to choose the best backend (CUDA, CPU, OpenCL, Hybrid) that would optimize a skeleton execution in terms of performance. SkePU is a framework that provides different algorithmic skeletons with implementations for the different backends (OpenCL, CUDA, OpenMP, CPU). Skeletons are parameterised with a user-provided per-element function which will run in parallel. This thesis shows how the autotuning of SkePU’s automatic backend selection for skeleton calls is implemented with respect to all the different parameters that a SkePU skeleton could have. The autotuning in this thesis is built upon the sampling technique, which is implemented by applying different combinations of sizes for the vector and matrix parameters to eventually generate an execution plan, which will be used as a lookup table when running the skeleton on all different backends. The execution plan will estimate the best performing backend for the sample. This work finally evaluates the implementation by comparing the results of running the autotuning on the different SkePU programs, to running the same programs without the autotuning. SkePU Autotuning Parallel Computing Multicore OpenCL OpenMP Computer Sciences Datavetenskap (datalogi)
5	Design and Implementation of a Performance Visualization Tool for the High-Level Parallel Programming Framework SkePU / Design och implementation av ett prestandavisualiseringsverktyg för parallellprogrammeringsramverket SkePU Frankell, Elin January 2024 (has links) The rise of parallel programming languages, as a result of processors' flattening clock frequencies, has lead to further use of high-level parallel pattern-based programming framework such as SkePU. SkePU provides a sequential high-level interface connected to different back-ends, or a hybrid of several, which reduces the time required to learn several new programming languages. Although this high-level interface introduces a new problem, an abstraction layer from the user to what code is being executed. To make SkePU programs easier to analyze this thesis implements a performance visualization tool for SkePU. The conducted literature study found that there currently exists very few performance visualization tools that cater specifically towards skeleton programming languages. This thesis evaluates usability of the implemented tool by conducting a survey and a user study with participants whom are very familiar with SkePU. The choice of evaluation method is in itself critically evaluated, as is the design choices made, and results are presented with an accompanying discussion as to how those results were derived. SkePU Performance Visualization Tool Skeleton programming Parallel programming languages Computer Sciences Datavetenskap (datalogi)
6	Using Task Parallelism for Distributed Parallel Skeleton Programming : Implementing a StarPU Back-End to SkePU 2 / Distribuerade parallellprogrammeringsskelett genom uppgiftsparallellism : Implementation av en StarPU-baserad SkePU 2 backend Henrik, Henriksson January 2024 (has links) We extended the parallel skeleton programming framework SkePU 2 with a new back-end utilizing StarPU, a task programming framework for hybrid and distributed architectures. The aim was to allow SkePU to run on distributed clusters, using MPI through StarPU. The implemented back-end distributes data and work across participating ranks. While we did not implement the full SkePU API, the Map and Reduce1D skeletons were successfully implemented. During the implementation, we discovered some differences in API design between SkePU and StarPU. We combine the type-safe templates used in the SkePU API with the C-style void*-heavy API of StarPU. This requires the implementation to use more complex templates than normally desired. While we could preserve most of the SkePU 2 API when moving to a distributed memory situation, some parts had to change. In particular, we needed to change the semantics of SkePU 2 containers with regards to iterators and random access. We benchmarked the performance of the implemented back-end against an MPI+OpenMP reference implementation on two problems, n-body and a simple reduction. While the n-body problem demonstrates promising scaling properties, reductions do not scale well to larger number of ranks. A performance comparison against the MPI+OpenMP reference implementation reveals that, aside from the higher communication overhead, there may also be some overhead in the work performed between communications, potentially performing at below 60-70% of the reference. In most cases, the new back-end to SkePU exhibits significantly lower performance than the reference. Extending the implemented solution to cover the full API and improving performance could provide a high level interface to distributed programming for application programmers. Indeed, subsequent developments of SkePU 3 extend and improve our StarPU back-end. HPC StarPU SkePU parallel porgramming skeleton programming distributed systems MPI Computer Engineering Datorteknik
7	Extension of the SkePU Skeleton ProgrammingFramework for Multi-core CPU and Multi-GPU Systems for MPI-based Clusters Mangaraj, Swadhin K January 2013 (has links) SkePU (Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems) is a parallel computing framework developed by Johan Enmyren and Christoph Kessler at Linköpings Universitet. This C++ template library provides a simple and unified interface for specifying data-parallel computations with the help of skeletons and is targeted to multiple backends e.g. for a sequential CPU, parallel CPUs using MPI and OpenMP or GPUs using CUDA and OpenCL. SkePU is comprised of seven data-parallel skeletons and one task-parallel skeleton and these skeletons use two types of containers: vector and matrix to model real-life parallel applications. In this thesis, we address the extension of the SkePU framework by extending the matrix container (which stores 2-D data values) that can efficiently use the existing skeletons to develop parallel scientific applications on large-scale clusters using MPI. This piece of work focuses on the distribution of the matrix among the participating processes which after receiving their share of data can execute the application in parallel. This work covers all of the seven data-parallel skeletons. Each skeleton has been tested with a small application program. In addition to measurement of performance improvement from the application program’s execution time, we have also done a communication cost analysis for all skeletons with MPI using the LogGP model. In order to evaluate and test the operational efficiency of the extension, we have considered a PDE solver application. Through this application, we have demonstrated the performance gain and scalability of the extended framework. The performance improvement was more when computational load dominates the memory I/O operations. The results show that using the extension can serve as a viable approach while implementing real-life parallel applications on large-scale clusters. SkePU Skeleton Programming Framework C++ template library Matrix container
8	Auto-tuning Hybrid CPU-GPU Execution of Algorithmic Skeletons in SkePU Öhberg, Tomas January 2018 (has links) The trend in computer architectures has for several years been heterogeneous systems consisting of a regular CPU and at least one additional, specialized processing unit, such as a GPU.The different characteristics of the processing units and the requirement of multiple tools and programming languages makes programming of such systems a challenging task. Although there exist tools for programming each processing unit, utilizing the full potential of a heterogeneous computer still requires specialized implementations involving multiple frameworks and hand-tuning of parameters.To fully exploit the performance of heterogeneous systems for a single computation, hybrid execution is needed, i.e. execution where the workload is distributed between multiple, heterogeneous processing units, working simultaneously on the computation. This thesis presents the implementation of a new hybrid execution backend in the algorithmic skeleton framework SkePU. The skeleton framework already gives programmers a user-friendly interface to algorithmic templates, executable on different hardware using OpenMP, CUDA and OpenCL. With this extension it is now also possible to divide the computational work of the skeletons between multiple processing units, such as between a CPU and a GPU. The results show an improvement in execution time with the hybrid execution implementation for all skeletons in SkePU. It is also shown that the new implementation results in a lower and more predictable execution time compared to a dynamic scheduling approach based on an earlier implementation of hybrid execution in SkePU. Heterogeneous computing Hybrid execution Skeleton programming Workload partitioning SkePU Skeleton CPU GPU Accelerator Computer Sciences Datavetenskap (datalogi)
9	Sparse-Matrix support for the SkePU library for portable CPU/GPU programming Sharma, Vishist January 2016 (has links) In this thesis work we have extended the SkePU framework by designing a new container data structure for the representation of generic two dimensional sparse matrices. Computation on matrices is an integral part of many scientific and engineering problems. Sometimes it is unnecessary to perform costly operations on zero entries of the matrix. If the number of zeroes is relatively large then a requirement for more efficient data structure arises. Beyond the sparse matrix representation, we propose an algorithm to judge the condition where computation on sparse matrices is more beneficial in terms of execution time for an ongoing computation and to adapt a matrix's state accordingly, which is the main concern of this thesis work. We present and implement an approach to switch automatically between two data container types dynamically inside the SkePU framework for a multi-core GPU-based heterogeneous system. The new sparse matrix data container supports all SkePU skeletons and nearly all SkePU operations. We provide compression and decompression algorithms from dense matrix to sparse matrix and vice versa on CPU and GPUs using SkePU data parallel skeletons. We have also implemented a context aware switching mechanism in order to switch between two data container types on the CPU or the GPU. A multi-state matrix representation, and selection on demand is also made possible. In order to evaluate and test effectiveness and efficiency of our extension to the SkePU framework, we have considered Matrix-Vector Multiplication as our benchmark program because iterative solvers like Conjugate Gradient and Generalized Minimum Residual use Sparse Matrix-Vector Multiplication as their basic operation. Through our benchmark program we have demonstrated adaptive switching between two data container types, implementation selection between CUDA and OpenMP, and converting the data structure depending on the density of non-zeroes in a matrix. Our experiments on GPU-based architectures show that our automatic switching mechanism adapts with the fastest SkePU implementation variant, and has a limited training cost. Sparse Matrix SkePU auto tuning CPU/GPU programming conversion function profile guided composition Computer Sciences Datavetenskap (datalogi)
10	SkePU 2: Language Embedding and Compiler Support for Flexible and Type-Safe Skeleton Programming Ernstsson, August January 2016 (has links) This thesis presents SkePU 2, the next generation of the SkePU C++ framework for programming of heterogeneous parallel systems using the skeleton programming concept. SkePU 2 is presented after a thorough study of the state of parallel programming models, frameworks and tools, including other skeleton programming systems. The advancements in SkePU 2 include a modern C++11 foundation, a native syntax for skeleton parameterization with user functions, and an entirely new source-to-source translator based on Clang compiler front-end libraries. SkePU 2 extends the functionality of SkePU 1 by embracing metaprogramming techniques and C++11 features, such as variadic templates and lambda expressions. The results are improved programmability and performance in many situations, as shown in both a usability survey and performance evaluations on high-performance computing hardware. SkePU’s skeleton programming model is also extended with a new construct, Call, unique in the sense that it does not impose any predefined skeleton structure and can encapsulate arbitrary user-defined multi-backend computations. We conclude that SkePU 2 is a promising new direction for the SkePU project, and a solid basis for future work, for example in performance optimization. Skeleton programming SkePU Source-to-source transformation C++11 Heterogeneous parallel systems Portability Computer Sciences Datavetenskap (datalogi)

Search results