• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 166
  • 65
  • 52
  • 12
  • 10
  • 9
  • 7
  • 6
  • 4
  • 3
  • 3
  • 2
  • 2
  • 2
  • 2
  • Tagged with
  • 400
  • 204
  • 118
  • 107
  • 80
  • 73
  • 71
  • 54
  • 42
  • 41
  • 38
  • 36
  • 36
  • 32
  • 32
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

GPU parallelization of the Mishchenko method for solving Fredholm equations of the first kind

Nordström, Johan January 2015 (has links)
Fredholm integral equations of the first kind are known to be ill-posed and may be impossible to solve analytically. A. S. Mishchenko et al. have developed a method to generate numerical solutions to Fredholm equations which occurs in physics. Mischenko's method is a Monte Carlo method which can run in parallel. The purpose of this project was to investigate how a parallel version of the Mishchenko method can be implemented on a Graphics Processing Unit (GPU). The developed program uses the CUDA platform for GPU programming. The conclusion of the project is that it is definitely possible to implement the Mishchenko method on a GPU. However, some properties of the algorithm are not optimal for the GPU. A more thorough analysis of the implementation is needed to get a complete understanding of the performance and the bottlenecks.
2

GPU based IP forwarding

Blomquist, Linus, Engström, Hampus January 2015 (has links)
This thesis was about investigating if it is feasible to implement an IP-forwarding data plane on a GPU. A GPU is energy efficient compared to other more powerful processors on the market today and should in theory be efficient to use for routing purposes. An IP-forwarding data plane consist of several things where we focused on some of the concepts. We have implemented IP-forwarding lookup operations, packet header changes, prioritization between different packets and a traffic shaper to restrict the packet throughput. To test these concepts we implemented a prototype, on a Tegra platform, in CUDA and evaluated its performance. We are able to forward 28 Mpackets/second with a best case latency of 27 µS given local simulated packets. The conclusions we can draw of this thesis work is that using a GPU for IP-forwarding purposes seems like an energy efficient solution compared to other routers on the market today. In the thesis we also tried the concept of only launching the GPU kernel once and let it be infinite which shows promising results for future work.
3

Kaijsers algoritm för beräkning av Kantorovichavstånd parallelliserad i CUDA

Engvall, Sebastian January 2013 (has links)
This thesis processes the work of developing CPU code and GPU code for Thomas Kaijsers algorithm for calculating the kantorovich distance and the performance between the two is compared. Initially there is a rundown of the algorithm which calculates the kantorovich distance between two images. Thereafter we go through the CPU implementation followed by GPGPU written in CUDA. Then the results are presented. Lastly, an analysis about the results and a discussion with possible improvements is presented for possible future applications.
4

Designing RDMA-based efficient Communication for GPU Remoting

Bhandare, Shreya Amit 24 August 2023 (has links)
The use of General Purpose Graphics Processing Units (GPGPUs) has become crucial for accelerating high-performance applications. However, the procurement, setup, and maintenance of GPUs can be costly, and their continuous energy consumption poses additional challenges. Moreover, many applications exhibit suboptimal GPU utilization. To address these concerns, GPU virtualization techniques have been proposed. Among them, GPU Remoting stands out as a promising technology that enables applications to transparently harness the computational capabilities of GPUs remotely. GVirtuS, a GPU Remoting software, facilitates transparent and hypervisor-independent access to GPGPUs within virtual machines. This research focuses on the middleware communication layer implemented in GVirtuS and presents a comprehensive redesign that leverages the power of Remote Direct Memory Access (RDMA) technology. Experimental evaluations, conducted using a matrix multiplication application, demonstrate that the newly proposed protocol achieves approximately 50% reduced execution time for data sizes ranging from 1 to 16MB, and around 12% decreased execution time for sizes ranging from 500 to upto 1GB. These findings highlight the significant performance improvements attained through the redesign of the communication layer in GVirtuS, showcasing its potential for enhancing GPU Remoting efficiency. / Master of Science / General Purpose Graphics Processing Units (GPGPUs) have become essential tools for accelerating high-performance applications. However, the acquisition and maintenance of GPUs can be expensive, and their continuous energy consumption adds to the overall costs. Additionally, many applications often underutilize the full potential of GPUs. To tackle these challenges, researchers have proposed GPU virtualization techniques. One such promising approach is GPU Remoting, which enables applications to seamlessly utilize GPUs remotely. GVirtuS, a GPU Remoting software, allows virtual machines to access GPGPUs in a transparent and independent manner from the underlying system. This study focuses on enhancing the communication layer in GVirtuS, which facilitates efficient interaction between virtual machines and GPUs. By leveraging advanced technology called Remote Direct Memory Access (RDMA), we achieved significant improvements in performance. Evaluations using a matrix multiplication application showed a reduction of approximately 50% in execution time for small data sizes (1-16MB) and around 12% for larger sizes (500-800MB). These findings highlight the potential of our redesign to enhance GPU virtualization, leading to better performance and cost-efficiency in various applications.
5

Um estudo do uso eficiente de programas em placas gráficas / A case study on the efficient use of programs on GPUs

Ikeda, Patricia Akemi 20 September 2011 (has links)
Inicialmente projetadas para processamento de gráficos, as placas gráficas (GPUs) evoluíram para um coprocessador paralelo de propósito geral de alto desempenho. Devido ao enorme potencial que oferecem para as diversas áreas de pesquisa e comerciais, a fabricante NVIDIA destaca-se pelo pioneirismo ao lançar a arquitetura CUDA (compatível com várias de suas placas), um ambiente capaz de tirar proveito do poder computacional aliado à maior facilidade de programação. Na tentativa de aproveitar toda a capacidade da GPU, algumas práticas devem ser seguidas. Uma delas consiste em manter o hardware o mais ocupado possível. Este trabalho propõe uma ferramenta prática e extensível que auxilie o programador a escolher a melhor configuração para que este objetivo seja alcançado. / Initially designed for graphical processing, the graphic cards (GPUs) evolved to a high performance general purpose parallel coprocessor. Due to huge potencial that graphic cards offer to several research and commercial areas, NVIDIA was the pioneer lauching of CUDA architecture (compatible with their several cards), an environment that take advantage of computacional power combined with an easier programming. In an attempt to make use of all capacity of GPU, some practices must be followed. One of them is to maximizes hardware utilization. This work proposes a practical and extensible tool that helps the programmer to choose the best configuration and achieve this goal.
6

Um estudo do uso eficiente de programas em placas gráficas / A case study on the efficient use of programs on GPUs

Patricia Akemi Ikeda 20 September 2011 (has links)
Inicialmente projetadas para processamento de gráficos, as placas gráficas (GPUs) evoluíram para um coprocessador paralelo de propósito geral de alto desempenho. Devido ao enorme potencial que oferecem para as diversas áreas de pesquisa e comerciais, a fabricante NVIDIA destaca-se pelo pioneirismo ao lançar a arquitetura CUDA (compatível com várias de suas placas), um ambiente capaz de tirar proveito do poder computacional aliado à maior facilidade de programação. Na tentativa de aproveitar toda a capacidade da GPU, algumas práticas devem ser seguidas. Uma delas consiste em manter o hardware o mais ocupado possível. Este trabalho propõe uma ferramenta prática e extensível que auxilie o programador a escolher a melhor configuração para que este objetivo seja alcançado. / Initially designed for graphical processing, the graphic cards (GPUs) evolved to a high performance general purpose parallel coprocessor. Due to huge potencial that graphic cards offer to several research and commercial areas, NVIDIA was the pioneer lauching of CUDA architecture (compatible with their several cards), an environment that take advantage of computacional power combined with an easier programming. In an attempt to make use of all capacity of GPU, some practices must be followed. One of them is to maximizes hardware utilization. This work proposes a practical and extensible tool that helps the programmer to choose the best configuration and achieve this goal.
7

GPU-Based Acceleration on ACEnet for FDTD Method of Electromagnetic Field Analysis

Sun, Dachuan 21 November 2013 (has links)
Graphics Processing Unit (GPU) programming techniques have been applied to a range of scientific and engineering computations. In computational electromagnetics, uses of the GPU technique have dramatically increased since the release of NVIDIA’s Compute Unified Device Architecture (CUDA), a powerful and simple-to-use programmer environment that renders GPU computing easy accessibility to developers not specialized in computer graphics. The focus of recent research has been on problems concerning the Finite-Difference Time-Domain (FDTD) simulation of electromagnetic (EM) fields. Traditional FDTD methods sometimes run slowly due to large memory and CPU requirements for modeling electrically large structures. Acceleration methods such as parallel programming are then needed. FDTD algorithm is suitable for multi-thread parallel computation with GPU. For complex structures and procedures, high performance GPU calculation algorithms will be crucial. In this work, we present the implementation of GPU programming for acceleration of computations for EM engineering problems. The speed-up is demonstrated through a few simulations with inexpensive GPUs and ACEnet, and the attainable efficiency is illustrated with numerical results. Using C, CUDA C, Matlab GPU, and ACEnet, we make comparisons between serial and parallel algorithms and among computations with and without GPU and CUDA, different types of GPUs, and personal computers and ACEnet. A maximum of 26.77 times of speed-up is achieved, which could be further boosted with development of new hardware in the future. The acceleration in run time will make many investigations possible and will pave the way for studies of large-scale computational electromagnetic problems that were previously impractical. This is a field that definitely invites more in-depth studies. / This is the thesis of my Master of Applied Science work at Dalhousie University.
8

Design and Implementation of C Programming Language Extension for Parallel GPU Computing

Yang, Yu-Wei 27 July 2010 (has links)
NVIDIA developed a technique of executing general program on GPU, named CUDA (Compute Unified Device Architecture), in 2006. The CUDA programming model allows a group of same instructions to execute on multi-thread simultaneously, which has advantage of parallel programs in reducing the execution time significantly. Although CUDA provides a series of C-like APIs (Application Programming Interface) so that programmers can easy use CUDA language, it still costs certain efforts to be familiar with the development. In this thesis, we propose a tool to automatically translate C programs into corresponding CUDA programs which reduce program development time effectively.
9

Evaluation of Computer Vision Algorithms Optimized for Embedded GPU:s. / Utvärdering av bildbehandlingsalgoritmer optimerade för inbyggda GPU:er

Nilsson, Mattias January 2014 (has links)
The interest of using GPU:s as general processing units for heavy computations (GPGPU) has increased in the last couple of years. Manufacturers such as Nvidia and AMD make GPU:s powerful enough to outrun CPU:s in one order of magnitude, for suitable algorithms. For embedded systems, GPU:s are not as popular yet. The embedded GPU:s available on the market have often not been able to justify hardware changes from the current systems (CPU:s and FPGA:s) to systems using embedded GPU:s. They have been too hard to get, too energy consuming and not suitable for some algorithms. At SICK IVP, advanced computer vision algorithms run on FPGA:s. This master thesis optimizes two such algorithms for embedded GPU:s and evaluates the result. It also evaluates the status of the embedded GPU:s on the market today. The results indicates that embedded GPU:s perform well enough to run the evaluatedd algorithms as fast as needed. The implementations are also easy to understand compared to implementations for FPGA:s which are competing hardware.
10

SAT Solver akcelerovaný pomocí GPU / GPU Accelerated SAT Solver

Izrael, Petr January 2013 (has links)
This thesis is concerned with design and implementation of a complete SAT solver accelerated on GPU. The achitecture of modern graphics cards is described as well as the CUDA platform and a list of common algorithms used for solving the boolean satisfiability problem (the SAT problem). The presented solution is based on the 3-SAT DC alogirthm, which belongs to the family of well-known DPLL based algorithms. This work describes problems encountered during the design and implementation. The resulting application was then analyzed and optimized. The presented solver cannot compete with state of the art solvers, but proves that it can be up to 21x faster than an equivalent sequential version. Unfortunately, the current implementation can only handle formulas of a limited size. Suggestions on further improvements are given in final sections.

Page generated in 0.0269 seconds