  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Kaijsers algoritm för beräkning av Kantorovichavstånd parallelliserad i CUDA / Kaijser's algorithm for computing the Kantorovich distance, parallelized in CUDA

Engvall, Sebastian January 2013
This thesis describes the development of CPU and GPU implementations of Thomas Kaijser's algorithm for calculating the Kantorovich distance and compares the performance of the two. It begins with a rundown of the algorithm, which calculates the Kantorovich distance between two images. It then goes through the CPU implementation, followed by the GPGPU implementation written in CUDA. The results are then presented. Finally, the results are analyzed and possible improvements are discussed for future applications.
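To make the quantity being computed concrete, here is a minimal sketch of the Kantorovich distance in the much simpler one-dimensional case. It is not Kaijser's algorithm, which handles two-dimensional images. It assumes two histograms with equal total mass and a ground cost of |i - j| between bins, in which case the distance reduces to a sum over running surpluses. The function name `kantorovich_1d` and the sample rows are illustrative only.

```python
from itertools import accumulate


def kantorovich_1d(p, q):
    """Kantorovich (Wasserstein-1) distance between two 1-D histograms.

    Assumes equal total mass and ground cost |i - j| between bins.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    # Running surplus: mass that still has to cross the boundary
    # between bin i and bin i + 1 (in either direction).
    surplus = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(s) for s in surplus)


# Moving 4 units of mass two bins to the right costs 4 * 2.
row_a = [0, 4, 0, 0]
row_b = [0, 0, 0, 4]
distance = kantorovich_1d(row_a, row_b)
```

In two dimensions no such closed form exists, which is why a dedicated transport algorithm is needed for images.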
2

GPU parallelization of the Mishchenko method for solving Fredholm equations of the first kind

Nordström, Johan January 2015
Fredholm integral equations of the first kind are known to be ill-posed and may be impossible to solve analytically. A. S. Mishchenko et al. have developed a method for generating numerical solutions to Fredholm equations that occur in physics. Mishchenko's method is a Monte Carlo method that can run in parallel. The purpose of this project was to investigate how a parallel version of the Mishchenko method can be implemented on a Graphics Processing Unit (GPU). The developed program uses the CUDA platform for GPU programming. The conclusion of the project is that it is definitely possible to implement the Mishchenko method on a GPU. However, some properties of the algorithm are not optimal for the GPU. A more thorough analysis of the implementation is needed to fully understand its performance and bottlenecks.
3

GPU based IP forwarding

Blomquist, Linus, Engström, Hampus January 2015
This thesis investigates whether it is feasible to implement an IP-forwarding data plane on a GPU. A GPU is energy efficient compared to other, more powerful processors on the market today and should in theory be efficient to use for routing purposes. An IP-forwarding data plane consists of several components, and we focused on a subset of them. We have implemented IP-forwarding lookup operations, packet header changes, prioritization between different packets, and a traffic shaper to restrict the packet throughput. To test these concepts we implemented a prototype in CUDA on a Tegra platform and evaluated its performance. We are able to forward 28 Mpackets/second with a best-case latency of 27 µs, given locally simulated packets. The conclusion we draw from this thesis work is that using a GPU for IP forwarding seems like an energy-efficient solution compared to other routers on the market today. In the thesis we also tried the concept of launching the GPU kernel only once and letting it run indefinitely, which shows promising results for future work.
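The lookup operation at the heart of IP forwarding is longest-prefix matching. The following is a minimal CPU-side sketch of that idea for IPv4 only. The abstract does not say which data structure the thesis uses on the GPU, so the per-length hash tables, the function names `build_table` and `lookup`, and the sample routes are all assumptions made for illustration.

```python
import ipaddress


def build_table(routes):
    """Group IPv4 routes by prefix length: {length: {network_int: next_hop}}."""
    table = {}
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        table.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop
    return table


def lookup(table, address):
    """Return the next hop of the longest matching prefix, or None."""
    addr = int(ipaddress.ip_address(address))
    # Try the most specific prefixes first.
    for length in sorted(table, reverse=True):
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        hop = table[length].get(addr & mask)
        if hop is not None:
            return hop
    return None


# Hypothetical routing table: a default route and two nested prefixes.
table = build_table([
    ("0.0.0.0/0", "gateway"),
    ("10.0.0.0/8", "port-a"),
    ("10.1.0.0/16", "port-b"),
])
hop = lookup(table, "10.1.2.3")  # matches all three, /16 wins
```

Each packet's lookup is independent of every other packet's, which is what makes the operation a plausible candidate for one-thread-per-packet execution on a GPU.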
4

Um estudo do uso eficiente de programas em placas gráficas / A case study on the efficient use of programs on GPUs

Patricia Akemi Ikeda 20 September 2011
Initially designed for graphics processing, graphics cards (GPUs) have evolved into high-performance, general-purpose parallel coprocessors. Given the huge potential that graphics cards offer to many research and commercial areas, NVIDIA stands out as a pioneer for launching the CUDA architecture (compatible with several of its cards), an environment that takes advantage of this computational power while making programming easier. To make use of the full capacity of the GPU, some practices must be followed. One of them is to keep the hardware as busy as possible. This work proposes a practical and extensible tool that helps the programmer choose the best configuration to achieve this goal.
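Keeping the hardware busy is commonly measured as occupancy: the ratio of resident warps on a multiprocessor to the maximum it supports. The sketch below is a deliberately simplified model of that calculation, not the tool proposed in the thesis. The hardware limits in `SMLimits` are hypothetical placeholder values, and real devices apply allocation granularities that this model ignores.

```python
from dataclasses import dataclass


@dataclass
class SMLimits:
    """Hypothetical per-multiprocessor limits (placeholders, not a real device)."""
    max_warps: int = 48
    max_blocks: int = 8
    registers: int = 32768
    shared_mem: int = 49152  # bytes
    warp_size: int = 32


def occupancy(threads_per_block, regs_per_thread, smem_per_block, sm=SMLimits()):
    """Fraction of the warp slots that are filled for a launch configuration."""
    warps_per_block = -(-threads_per_block // sm.warp_size)  # ceiling division
    # Each resource caps the number of resident blocks; the tightest cap wins.
    limits = [sm.max_blocks, sm.max_warps // warps_per_block]
    if regs_per_thread:
        regs_per_block = regs_per_thread * warps_per_block * sm.warp_size
        limits.append(sm.registers // regs_per_block)
    if smem_per_block:
        limits.append(sm.shared_mem // smem_per_block)
    return min(limits) * warps_per_block / sm.max_warps


def best_block_size(regs_per_thread, smem_per_block,
                    candidates=(64, 128, 256, 512, 1024)):
    """Pick the candidate block size with the highest modeled occupancy."""
    return max(candidates,
               key=lambda t: occupancy(t, regs_per_thread, smem_per_block))


choice = best_block_size(regs_per_thread=20, smem_per_block=0)
```

Under these placeholder limits, small blocks are capped by the block-count limit and very large blocks by the warp limit, so a mid-sized block fills the multiprocessor best. High occupancy is a heuristic rather than a guarantee of high performance.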
5

Evaluation of Computer Vision Algorithms Optimized for Embedded GPU:s. / Utvärdering av bildbehandlingsalgoritmer optimerade för inbyggda GPU:er

Nilsson, Mattias January 2014
Interest in using GPUs as general processing units for heavy computations (GPGPU) has increased over the last couple of years. Manufacturers such as Nvidia and AMD make GPUs powerful enough to outperform CPUs by an order of magnitude for suitable algorithms. For embedded systems, GPUs are not yet as popular. The embedded GPUs available on the market have often not been able to justify hardware changes from the current systems (CPUs and FPGAs) to systems using embedded GPUs. They have been too hard to obtain, too energy consuming, and not suitable for some algorithms. At SICK IVP, advanced computer vision algorithms run on FPGAs. This master's thesis optimizes two such algorithms for embedded GPUs and evaluates the result. It also evaluates the status of the embedded GPUs on the market today. The results indicate that embedded GPUs perform well enough to run the evaluated algorithms as fast as needed. The implementations are also easy to understand compared to implementations for FPGAs, which are the competing hardware.
6

GPU-Based Acceleration on ACEnet for FDTD Method of Electromagnetic Field Analysis

Sun, Dachuan 21 November 2013
Graphics Processing Unit (GPU) programming techniques have been applied to a range of scientific and engineering computations. In computational electromagnetics, use of GPU techniques has increased dramatically since the release of NVIDIA's Compute Unified Device Architecture (CUDA), a powerful and simple-to-use programming environment that makes GPU computing easily accessible to developers not specialized in computer graphics. The focus of recent research has been on problems concerning the Finite-Difference Time-Domain (FDTD) simulation of electromagnetic (EM) fields. Traditional FDTD methods sometimes run slowly due to large memory and CPU requirements for modeling electrically large structures. Acceleration methods such as parallel programming are then needed. The FDTD algorithm is suitable for multi-threaded parallel computation on a GPU. For complex structures and procedures, high-performance GPU calculation algorithms will be crucial. In this work, we present the implementation of GPU programming for acceleration of computations for EM engineering problems. The speed-up is demonstrated through a few simulations with inexpensive GPUs and ACEnet, and the attainable efficiency is illustrated with numerical results. Using C, CUDA C, Matlab GPU, and ACEnet, we make comparisons between serial and parallel algorithms and among computations with and without GPU and CUDA, different types of GPUs, and personal computers and ACEnet. A maximum speed-up of 26.77 times is achieved, which could be further boosted with the development of new hardware in the future. The acceleration in run time will make many investigations possible and will pave the way for studies of large-scale computational electromagnetic problems that were previously impractical. This is a field that definitely invites more in-depth studies. / This is the thesis of my Master of Applied Science work at Dalhousie University.
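The reason FDTD maps well onto many threads can be seen in a one-dimensional version of the update loop. The following is a minimal serial sketch in normalized units, not the thesis code. The grid size, Gaussian source, and Courant number of 0.5 are illustrative assumptions, and the grid ends act as simple reflecting boundaries.

```python
import math


def fdtd_1d(cells=200, steps=150, source_cell=100, courant=0.5):
    """Leapfrog update of normalized E and H fields on a 1-D grid."""
    ez = [0.0] * cells
    hy = [0.0] * cells
    for n in range(steps):
        # Each E value depends only on its two neighbouring H values,
        # so every iteration of this loop could run in its own thread.
        for k in range(1, cells):
            ez[k] += courant * (hy[k - 1] - hy[k])
        # Additive (soft) Gaussian pulse injected at one cell.
        ez[source_cell] += math.exp(-0.5 * ((n - 40) / 12.0) ** 2)
        # Likewise, each H value depends only on two neighbouring E values.
        for k in range(cells - 1):
            hy[k] += courant * (ez[k] - ez[k + 1])
    return ez, hy


ez, hy = fdtd_1d()
```

Within a half step no cell reads a value written by another cell in the same loop, so on a GPU each inner loop would typically become one kernel launch with one thread per cell. The time loop itself stays sequential.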
7

Design and Implementation of C Programming Language Extension for Parallel GPU Computing

Yang, Yu-Wei 27 July 2010
In 2006, NVIDIA introduced CUDA (Compute Unified Device Architecture), a technology for executing general-purpose programs on the GPU. The CUDA programming model lets the same group of instructions execute on many threads simultaneously, which can significantly reduce the execution time of parallel programs. Although CUDA provides a series of C-like APIs (Application Programming Interfaces) so that programmers can use it easily, becoming familiar with CUDA development still takes some effort. In this thesis, we propose a tool that automatically translates C programs into corresponding CUDA programs, which effectively reduces program development time.
8

Towards Algorithm Transformation for Temporal Data Mining on GPU

Ponce, Sean Philip 18 August 2009
Data mining allows one to analyze large amounts of data. With increasing amounts of data being collected, more computing power is needed to mine these ever larger volumes of data. The GPU is an excellent piece of hardware with a compelling price-to-performance ratio and has rapidly risen in popularity. However, this increase in speed comes at a cost. The GPU's architecture executes non-data-parallel code with either marginal speedup or even slowdown. The type of data mining we examine, temporal data mining, uses a finite state machine (FSM), which is non-data-parallel. We contribute the concept of algorithm transformation for increasing the data parallelism of an algorithm. We apply the algorithm transformation process to the problem of temporal data mining, producing an algorithm that solves the same problem as the FSM-based algorithm but is data parallel. The new GPU implementation shows a 6x speedup over the best CPU implementation and an 11x speedup over a previous GPU implementation. / Master of Science
9

ANALYZING GENERAL-PURPOSE COMPUTING PERFORMANCE ON GPU

Meng, Fanfu 01 December 2015
The Graphics Processing Unit (GPU) has become one of the most important components in modern computer systems. GPUs have evolved from single-purpose graphics rendering hardware into powerful processors capable of handling many different kinds of computing tasks. However, GPUs don't perform well on every application, and it takes a lot of design effort to get good performance on a GPU. This thesis aims to investigate the relative performance of a GPU vs. a CPU. Design effort is held to a minimum for both the CPU and the GPU implementations. Matrix multiplication, the Advanced Encryption Standard (AES), and the 32-bit Cyclic Redundancy Check (CRC32) are implemented on both a CPU and a GPU. Input data size is varied to test the performance of the CPU and the GPU. The GPU generally has better performance than the CPU for matrix multiplication and AES because of the applications' good instruction and data parallelism. CRC has very poor parallelism, so the CPU performs better. For very small data inputs, the CPU generally outperformed the GPU because of GPU memory transfer overhead.
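The poor parallelism of CRC in its straightforward form can be seen directly in a bitwise implementation. The sketch below is an illustration, not the thesis code. It uses the standard reflected CRC-32 polynomial and checks itself against Python's `zlib.crc32`. The function name `crc32_bitwise` is an assumption.

```python
import zlib

POLY = 0xEDB88320  # reflected form of the standard CRC-32 polynomial


def crc32_bitwise(data: bytes) -> int:
    """Compute CRC-32 one bit at a time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # The next state is a function of the current state, so
            # this loop carries a dependency through every single bit.
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


message = b"123456789"
checksum = crc32_bitwise(message)
matches_zlib = checksum == zlib.crc32(message)
```

Compare this with matrix multiplication, where every output element can be computed independently. Here each step needs the result of the previous one, so without restructuring the algorithm there is little work to hand out to thousands of GPU threads.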
10

SAT Solver akcelerovaný pomocí GPU / GPU Accelerated SAT Solver

Izrael, Petr January 2013
This thesis is concerned with the design and implementation of a complete SAT solver accelerated on a GPU. The architecture of modern graphics cards is described, as well as the CUDA platform and a list of common algorithms used for solving the Boolean satisfiability problem (the SAT problem). The presented solution is based on the 3-SAT DC algorithm, which belongs to the family of well-known DPLL-based algorithms. This work describes problems encountered during the design and implementation. The resulting application was then analyzed and optimized. The presented solver cannot compete with state-of-the-art solvers, but it is shown to be up to 21x faster than an equivalent sequential version. Unfortunately, the current implementation can only handle formulas of a limited size. Suggestions for further improvements are given in the final sections.
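For readers unfamiliar with the DPLL family mentioned above, here is a minimal sequential sketch of the basic scheme: unit propagation followed by branching. It is not the 3-SAT DC algorithm used in the thesis and has none of its GPU-specific design. Clauses are lists of nonzero integers in the usual DIMACS style, where `-3` means "not x3". The function names are illustrative.

```python
def simplify(clauses, literal):
    """Assign `literal` true: drop satisfied clauses, shrink the rest."""
    return [[l for l in clause if l != -literal]
            for clause in clauses if literal not in clause]


def dpll(clauses, assignment=()):
    """Return a tuple of true literals satisfying `clauses`, or None."""
    if not clauses:
        return assignment  # every clause satisfied
    if any(not clause for clause in clauses):
        return None  # an empty clause means a conflict
    # Unit propagation: a one-literal clause forces that literal.
    for clause in clauses:
        if len(clause) == 1:
            lit = clause[0]
            return dpll(simplify(clauses, lit), assignment + (lit,))
    # Branch on the first literal, then on its negation.
    lit = clauses[0][0]
    for choice in (lit, -lit):
        model = dpll(simplify(clauses, choice), assignment + (choice,))
        if model is not None:
            return model
    return None


# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
formula = [[1, 2], [-1, 3], [-2, -3]]
model = dpll(formula)
```

The two branches explored in the final loop are independent subproblems, which is one natural place where a parallel solver can split the search.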
