• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 10
  • Tagged with
  • 10
  • 8
  • 8
  • 7
  • 7
  • 7
  • 6
  • 5
  • 4
  • 3
  • 3
  • 3
  • 2
  • 2
  • 2
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Providing Support for the Movidius Myriad1 Platform in the SkePU Skeleton Programming Framework

Cuello, Rosandra January 2014 (has links)
The Movidius Myriad1 Platform is a multicore embedded platform primed to offer high performance and power efficiency for computer vision applications in mobile devices. The challenges of programming multicore environments are well known and skeleton programming offers a high-level programming alternative for parallel computing, intended to hide the complexities of the system from the programmer. The SkePU Skeleton Programming Framework includes backend implementations for CPU and GPU systems and it has the capacity to support more platforms by extending its backend implementations. With this master thesis project we aim to extend the SkePU Skeleton Programming Framework to provide support for execution in the Movidius Myriad1 embedded platform. Our SkePU backend for Myriad1 consists on a set of macros and functions to compose the different elements of a Myriad1 application, data communication structures to exchange data between the host systems and Myriad1, and a helper script and auxiliary files to generate a Myriad1 application.Evaluation and testing demonstrate that our backend is usable, however further optimizations are needed to obtain good performance that would make it practical to use in real life applications, particularly when it comes to data communication. As part of this project, we have outlined some improvements that could be applied to obtain better performance overall in the future, addressing the issues found with the methods of data communication.
2

Adaptation of algorithms for underwater sonar data processing to GPU-based systems

Sundin, Patricia January 2013 (has links)
In this master thesis, algorithms for acoustic simulations in underwater environments are ported for GPU processing. The GPU parallel computing platforms used are CUDA, OpenCL and SkePU. The purpose of this master thesis is to adapt and evaluate the ported algorithms' performance on two modern NVIDIA GPUs, Tesla K20 and Quadro K5000. Several optimizations, described in existing literature for GPU processing (e.g. usage of shared memory, coalesced memory accesses), are implemented and multiple versions of each algorithm are created to study their trade-offs. Evaluation on two GPUs showed that different versions of the same algorithm have different performance characteristic and execution with the best performing version can give better performance than the original algorithm executing on 8 CPUs. A performance comparison between CUDA, OpenCL and SkePU versions of one algorithm is also made.
3

Skeleton Programming for Heterogeneous GPU-based Systems

Dastgeer, Usman January 2011 (has links)
In this thesis, we address issues associated with programming modern heterogeneous systems while focusing on a special kind of heterogeneous systems that include multicore CPUs and one or more GPUs, called GPU-based systems.We consider the skeleton programming approach to achieve high level abstraction for efficient and portable programming of these GPU-based systemsand present our work on SkePU library which is a skeleton library for these systems. We extend the existing SkePU library with a two-dimensional (2D) data type and skeleton operations and implement several new applications using newly made skeletons. Furthermore, we consider the algorithmic choice present in SkePU and implement support to specify and automatically optimize the algorithmic choice for a skeleton call, on a given platform. To show how to achieve performance, we provide a case-study on optimized GPU-based skeleton implementation for 2D stencil computations and introduce two metrics to maximize resource utilization on a GPU. By devising a mechanism to automatically calculate these two metrics, performance can be retained while porting an application from one GPU architecture to another. Another contribution of this thesis is implementation of the runtime support for the SkePU skeleton library. This is achieved with the help of the StarPUruntime system. By this implementation,support for dynamic scheduling and load balancing for the SkePU skeleton programs is achieved. Furthermore, a capability to do hybrid executionby parallel execution on all available CPUs and GPUs in a system, even for a single skeleton invocation, is developed. SkePU initially supported only data-parallel skeletons. The first task-parallel skeleton (farm) in SkePU is implemented with support for performance-aware scheduling and hierarchical parallel execution by enabling all data parallel skeletons to be usable as tasks inside the farm construct. Experimental evaluations are carried out and presented for algorithmic selection, performance portability, dynamic scheduling and hybrid execution aspects of our work.
4

Modernizing and Evaluating the Autotuning Framework of SkePU 3

Nsralla, Basel January 2022 (has links)
Autotuning is a method which enables a program to automatically choose the most suitable parameters that optimizes it for a certain goal e.g. speed, cost, etc. In this work autotuning is implemented in the context of the SkePU framework, in order to choose the best backend (CUDA, CPU, OpenCL, Hybrid) that would optimize a skeleton execution in terms of performance. SkePU is a framework that provides different algorithmic skeletons with implementations for the different backends (OpenCL, CUDA, OpenMP, CPU). Skeletons are parameterised with a user-provided per-element function which will run in parallel. This thesis shows how the autotuning of SkePU’s automatic backend selection for skeleton calls is implemented with respect to all the different parameters that a SkePU skeleton could have. The autotuning in this thesis is built upon the sampling technique, which is implemented by applying different combinations of sizes for the vector and matrix parameters to eventually generate an execution plan, which will be used as a lookup table when running the skeleton on all different backends. The execution plan will estimate the best performing backend for the sample. This work finally evaluates the implementation by comparing the results of running the autotuning on the different SkePU programs, to running the same programs without the autotuning.
5

Extension of the SkePU Skeleton ProgrammingFramework for Multi-core CPU and Multi-GPU Systems for MPI-based Clusters

Mangaraj, Swadhin K January 2013 (has links)
SkePU (Skeleton Programming Framework for Multi-core CPU and Multi-GPU Systems) is a parallel computing framework developed by Johan Enmyren and Christoph Kessler at Linköpings Universitet. This C++ template library provides a simple and unified interface for specifying data-parallel computations with the help of skeletons and is targeted to multiple backends e.g. for a sequential CPU, parallel CPUs using MPI and OpenMP or GPUs using CUDA and OpenCL. SkePU is comprised of seven data-parallel skeletons and one task-parallel skeleton and these skeletons use two types of containers: vector and matrix to model real-life parallel applications. In this thesis, we address the extension of the SkePU framework by extending the matrix container (which stores 2-D data values) that can efficiently use the existing skeletons to develop parallel scientific applications on large-scale clusters using MPI. This piece of work focuses on the distribution of the matrix among the participating processes which after receiving their share of data can execute the application in parallel. This work covers all of the seven data-parallel skeletons. Each skeleton has been tested with a small application program. In addition to measurement of performance improvement from the application program’s execution time, we have also done a communication cost analysis for all skeletons with MPI using the LogGP model. In order to evaluate and test the operational efficiency of the extension, we have considered a PDE solver application. Through this application, we have demonstrated the performance gain and scalability of the extended framework. The performance improvement was more when computational load dominates the memory I/O operations. The results show that using the extension can serve as a viable approach while implementing real-life parallel applications on large-scale clusters.
6

Auto-tuning Hybrid CPU-GPU Execution of Algorithmic Skeletons in SkePU

Öhberg, Tomas January 2018 (has links)
The trend in computer architectures has for several years been heterogeneous systems consisting of a regular CPU and at least one additional, specialized processing unit, such as a GPU.The different characteristics of the processing units and the requirement of multiple tools and programming languages makes programming of such systems a challenging task. Although there exist tools for programming each processing unit, utilizing the full potential of a heterogeneous computer still requires specialized implementations involving multiple frameworks and hand-tuning of parameters.To fully exploit the performance of heterogeneous systems for a single computation, hybrid execution is needed, i.e. execution where the workload is distributed between multiple, heterogeneous processing units, working simultaneously on the computation. This thesis presents the implementation of a new hybrid execution backend in the algorithmic skeleton framework SkePU. The skeleton framework already gives programmers a user-friendly interface to algorithmic templates, executable on different hardware using OpenMP, CUDA and OpenCL. With this extension it is now also possible to divide the computational work of the skeletons between multiple processing units, such as between a CPU and a GPU. The results show an improvement in execution time with the hybrid execution implementation for all skeletons in SkePU. It is also shown that the new implementation results in a lower and more predictable execution time compared to a dynamic scheduling approach based on an earlier implementation of hybrid execution in SkePU.
7

Sparse-Matrix support for the SkePU library for portable CPU/GPU programming

Sharma, Vishist January 2016 (has links)
In this thesis work we have extended the SkePU framework by designing a new container data structure for the representation of generic two dimensional sparse matrices. Computation on matrices is an integral part of many scientific and engineering problems. Sometimes it is unnecessary to perform costly operations on zero entries of the matrix. If the number of zeroes is relatively large then a requirement for more efficient data structure arises. Beyond the sparse matrix representation, we propose an algorithm to judge the condition where computation on sparse matrices is more beneficial in terms of execution time for an ongoing computation and to adapt a matrix's state accordingly, which is the main concern of this thesis work. We present and implement an approach to switch automatically between two data container types dynamically inside the SkePU framework for a multi-core GPU-based heterogeneous system. The new sparse matrix data container supports all SkePU skeletons and nearly all SkePU operations. We provide compression and decompression algorithms from dense matrix to sparse matrix and vice versa on CPU and GPUs using SkePU data parallel skeletons. We have also implemented a context aware switching mechanism in order to switch between two data container types on the CPU or the GPU. A multi-state matrix representation, and selection on demand is also made possible. In order to evaluate and test effectiveness and efficiency of our extension to the SkePU framework, we have considered Matrix-Vector Multiplication as our benchmark program because iterative solvers like Conjugate Gradient and Generalized Minimum Residual use Sparse Matrix-Vector Multiplication as their basic operation. Through our benchmark program we have demonstrated adaptive switching between two data container types, implementation selection between CUDA and OpenMP, and converting the data structure depending on the density of non-zeroes in a matrix. Our experiments on GPU-based architectures show that our automatic switching mechanism adapts with the fastest SkePU implementation variant, and has a limited training cost.
8

SkePU 2: Language Embedding and Compiler Support for Flexible and Type-Safe Skeleton Programming

Ernstsson, August January 2016 (has links)
This thesis presents SkePU 2, the next generation of the SkePU C++ framework for programming of heterogeneous parallel systems using the skeleton programming concept. SkePU 2 is presented after a thorough study of the state of parallel programming models, frameworks and tools, including other skeleton programming systems. The advancements in SkePU 2 include a modern C++11 foundation, a native syntax for skeleton parameterization with user functions, and an entirely new source-to-source translator based on Clang compiler front-end libraries. SkePU 2 extends the functionality of SkePU 1 by embracing metaprogramming techniques and C++11 features, such as variadic templates and lambda expressions. The results are improved programmability and performance in many situations, as shown in both a usability survey and performance evaluations on high-performance computing hardware. SkePU’s skeleton programming model is also extended with a new construct, Call, unique in the sense that it does not impose any predefined skeleton structure and can encapsulate arbitrary user-defined multi-backend computations. We conclude that SkePU 2 is a promising new direction for the SkePU project, and a solid basis for future work, for example in performance optimization.
9

Design and evaluation of a plain MPI-based cluster execution backend for the SkePU 3 skeleton programming framework

Zeijlon, Alexander January 2023 (has links)
SkePU 3 is a framework for parallel program execution that uses higher order functions called skeletons, which provide a layer of abstraction between user code and the parallel implementation it provides through its backends. The backend that enables SkePU to run on an HPC cluster has a slowdown of a factor two. This reduces the viability of SkePU as an alternative for HPC, and as such, warrants an investigation. Programs written in SkePU are sequential-looking, single-source C++ programs where skeleton calls can transparently execute on multiple different types of processing units, such as CPU cores, GPUs and clusters, using different backends. In this thesis, a strategy for improving the performance of SkePU on clusters is presented, and with it, the design and implementation of a new cluster backend that is simpler and more closely integrated with the non-cluster SkePU code base. Runtime measurements are made, which show that the new cluster backend sees a relative speedup of about a factor of two, which effectively eliminates the slowdown.
10

Integrating SkePU's algorithmic skeletons with GPI on a cluster / Integrering av SkePUs algoritmiska skelett med GPI på ett cluster

Almqvist, Joel January 2022 (has links)
As processors' clock-speed flattened out in the early 2000s, multi-core processors became more prevalent and so did parallel programming. However this programming paradigm introduces additional complexities, and to combat this, the SkePU framework was created. SkePU does this by offering a single-threaded interface which executes the user's code in parallel in accordance to a chosen computational pattern. Furthermore it allows the user themselves to decide which parallel backend should perform the execution, be it OpenMP, CUDA or OpenCL. This modular approach of SkePU thus allows for different hardware to be used without changing the code, and it currently supports CPUs, GPUs and clusters. This thesis presents a new so-called SkePU-backend made for clusters, using the communication library GPI. It demonstrates that the new backend is able to scale better and handle workload imbalances better than the existing SkePU-cluster-backend. This is achieved despite it performing worse at low node amounts, indicating that it requires less scaling overhead. Its weaknesses are also analyzed, partially from a design point of view, and clear solutions are presented, combined with a discussion as to why they arose in the first place.

Page generated in 0.0268 seconds