1 |
A compiler for parallel execution of numerical Python programs on graphics processing units
Garg, Rahul 11 1900
Modern Graphics Processing Units (GPUs) are providing breakthrough performance for numerical computing at the cost of increased programming complexity. Current programming models for GPUs require that the programmer manually manage the data transfer between CPU and GPU. This thesis proposes a simpler programming model and introduces a new compilation framework to enable Python applications containing numerical computations to be executed on GPUs and multi-core CPUs.
The new programming model minimally extends Python to include type and parallel-loop annotations. Our compiler framework then automatically identifies the data to be transferred between the main memory and the GPU for a particular class of affine array accesses. The compiler also automatically performs loop transformations to improve performance on GPUs.
For kernels with regular loop structure and simple memory access patterns, the GPU code generated by the compiler achieves significant performance improvement over multi-core CPU codes.
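As a rough illustration of the kind of reasoning that affine accesses make possible, the sketch below computes which slice of an array a loop touches when every access has the form `a[coeff * i + offset]`. It is not the thesis's analysis; the function name and parameters are hypothetical, and it handles only a single one-dimensional access with a single loop variable.

```python
def affine_footprint(coeff, offset, lo, hi):
    """Bounding index range touched by a[coeff * i + offset] for i in range(lo, hi).

    Hypothetical helper for illustration only. Returns (first, last), both
    inclusive, or None when the loop body never runs.
    """
    if hi <= lo:
        return None
    # An affine function of i is monotonic, so its extremes over the loop
    # are reached at the first and last iterations.
    ends = (coeff * lo + offset, coeff * (hi - 1) + offset)
    return (min(ends), max(ends))
```

For a loop over `range(0, 10)` that reads `a[2 * i + 1]`, the range is `(1, 19)`, so a transfer could in principle be limited to `a[1:20]` rather than the whole array. The range is a conservative bound: with a stride of 2 it also covers elements the loop never reads.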
|
3 |
Parallelizing Simulated Annealing Placement for GPGPU
Choong, Alexander 17 December 2010
Field Programmable Gate Array (FPGA) devices are increasing in capacity at an exponential rate, and thus there is an increasingly strong demand to accelerate simulated annealing placement. Graphics Processing Units (GPUs) offer a unique opportunity to accelerate simulated annealing placement on a manycore architecture using only commodity hardware. GPUs are optimized for applications that can tolerate single-thread latency, which lets them provide high throughput across many threads. However, simulated annealing is not embarrassingly parallel, so single-thread latency should be minimized to improve run time. It is therefore questionable whether GPUs can achieve any speedup over a sequential implementation. In this thesis, a novel subset-based simulated annealing placement framework is proposed that specifically targets the GPU architecture. A highly optimized framework is implemented which, on average, achieves an order of magnitude speedup with less than 1% degradation in wirelength and no loss in timing quality on realistic architectures.
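For readers unfamiliar with the sequential baseline the abstract refers to, the following is a minimal sketch of swap-based simulated annealing placement. It is not the subset-based GPU framework from the thesis. The half-perimeter wirelength cost, the geometric cooling schedule, and all names and parameter values are assumptions chosen for illustration.

```python
import math
import random

def hpwl(placement, nets):
    """Half-perimeter wirelength: sum of bounding-box half-perimeters over all nets."""
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal(placement, nets, t_start=5.0, t_end=0.01, alpha=0.9,
           moves_per_t=200, seed=0):
    """Sequential annealer; placement[i] is the (x, y) location of cell i."""
    rng = random.Random(seed)
    current = list(placement)          # work on a copy of the input
    cost = hpwl(current, nets)
    best, best_cost = list(current), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            # Propose swapping the locations of two distinct cells.
            a, b = rng.sample(range(len(current)), 2)
            current[a], current[b] = current[b], current[a]
            new_cost = hpwl(current, nets)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = list(current), cost
            else:
                current[a], current[b] = current[b], current[a]  # undo the swap
        t *= alpha                     # geometric cooling (assumed schedule)
    return best, best_cost
```

Each proposed move depends on the placement left behind by the previous one, which is the serial dependency that makes the algorithm hard to spread across many GPU threads.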
|
5 |
Path Integral Approaches and Graphics Processing Unit Tools for Quantum Molecular Dynamics Simulations
Constable, Stephen Joel January 2012
This thesis details both the technical and theoretical aspects of performing path integrals through classical Molecular Dynamics (MD) simulations. In particular, Graphics Processing Unit (GPU) computing is used to augment the Path Integral Molecular Dynamics (PIMD) portion of the widely available Molecular Modelling Tool Kit (MMTK) library. This same PIMD code is also extended in a different direction: a novel method for nuclear ground state property prediction is introduced that closely mimics existing code in functional form.
To add GPU computing capabilities to the existing MMTK codebase, the open-source Open Molecular Mechanics (OpenMM) library was used. OpenMM provides high-performance implementations of a variety of commonly used MD algorithms, with the goal of supporting current and future specialized hardware. Due to the object-oriented nature of both codes, and the use of SI units in each, the development process was rather painless. The integration of OpenMM with MMTK is seamless, and arbitrary systems are supported without the user even needing to know that GPU acceleration is being used. The hybrid OpenMM-MMTK code is benchmarked against the vanilla MMTK code in terms of speed and accuracy, and the results show that GPU computing is the obvious choice for PIMD simulations.
Starting with a desire to apply the highly efficient Path Integral Langevin Equation (PILE) thermostat to the Path Integral Ground State (PIGS) problem, a new hybrid PILE-PIGS, or LE-PIGS, method was developed. This thesis describes the theoretical justification for this method, including the introduction of a modified normal mode representation based on the Discrete Cosine Transform (DCT). It is shown that in DCT space, the equations of motion of a PIGS system are virtually identical to the equations of motion of a PIMD system in Fourier space. This leads to direct reuse of existing PILE code in MMTK, and options to extend this ground state problem to OpenMM for the purpose of GPU acceleration. The method is applied to a series of model systems, and in each case convergence to the exact ground state energy is observed.
|
6 |
The Application of GPGPU in Network Packet Processing
Su, Chun-cheng 26 July 2010
Advances in technology have created many demands for high-performance computing, such as satellite imaging, genetic engineering, global weather forecasting, and nuclear explosion simulation, and the amount of data involved often reaches terabytes or even petabytes. In addition, everyday applications such as games, 3D displays, and high-definition video depend on practical image processing. These requirements pose a serious challenge to current devices.
The performance of the GPU (Graphics Processing Unit) has grown rapidly in recent years. GPU computing power doubles every year, far outpacing the growth of CPU (Central Processing Unit) performance under Moore's Law. Today, the single-precision floating-point computing power of a GPU is ten times that of a CPU. Furthermore, CUDA (Compute Unified Device Architecture) is a parallel computing architecture introduced by NVIDIA in 2007, and it is the first C-like software development environment that does not require a graphics API.
In this research, we use a GPU to help network devices filter packets from ever-growing network traffic. As networks have become widespread, people are paying more attention to the various types of network attacks and security problems. It is therefore important to separate malicious packets from normal ones without degrading network performance.
|
7 |
Kaijsers algoritm för beräkning av Kantorovichavstånd parallelliserad i CUDA [Kaijser's algorithm for computing the Kantorovich distance, parallelized in CUDA]
Engvall, Sebastian January 2013
This thesis describes the development of CPU and GPU code for Thomas Kaijser's algorithm for calculating the Kantorovich distance, and compares the performance of the two implementations. It begins with a rundown of the algorithm, which calculates the Kantorovich distance between two images. The CPU implementation is described next, followed by the GPGPU implementation written in CUDA. The results are then presented. Finally, the results are analyzed and possible improvements are discussed for future applications.
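To give a feel for the quantity being computed, the sketch below evaluates the Kantorovich distance in the much simpler one-dimensional case, where it reduces to a sum of absolute cumulative differences. This is not Kaijser's algorithm, which handles two-dimensional images. The function name is hypothetical, and the sketch assumes two histograms of equal total mass on the same bins, with unit distance between neighbouring bins.

```python
from itertools import accumulate

def kantorovich_1d(p, q):
    """Kantorovich (earth mover's) distance between two 1D histograms.

    Illustrative special case only: same bins, equal total mass, and a
    ground distance of |i - j| between bins i and j.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    # The running surplus after each bin is the mass that must cross to the
    # next bin; each unit crossing one bin boundary costs one unit of distance.
    surplus = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(s) for s in surplus)
```

Moving one unit of mass from the first bin of a three-bin histogram to the last crosses two bin boundaries, so `kantorovich_1d([1, 0, 0], [0, 0, 1])` evaluates to 2.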
|
8 |
The Low-Frequency Multi-Level Fast Multipole Method on Graphics Processors
Cwikla, Martin 14 September 2009
The Fast Multipole Method (FMM) allows for rapid evaluation of the fundamental solution of the Helmholtz equation, known as Green's function. Evaluation times are reduced from O(N^2), using the direct approach, down to O(N log N), with an accuracy specified by the user. The Helmholtz equation, and variations thereof, including the Laplace and wave equations, are used to describe physical phenomena in electromagnetics, acoustics, heat dissipation, and many other applications. This thesis studies the acceleration of the low-frequency FMM, where the product of the wave number and the translation distance of expansion coefficients is relatively low. A general-purpose graphics processing unit (GPGPU), with native support of double-precision arithmetic, was used in the implementation of the LF FMM, with a resulting speedup of 4-22X over a conventional central processing unit (CPU), running in a single-threaded manner, for various simulations involving hundreds of thousands to millions of sources.
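The O(N^2) direct approach mentioned above can be sketched as a double loop over sources. The sketch assumes the three-dimensional free-space Green's function exp(ikr) / (4 pi r), which the abstract does not state explicitly, and the function and parameter names are hypothetical. It shows only the baseline that the FMM replaces, not the FMM itself.

```python
import cmath
import math

def direct_sum(points, strengths, k):
    """Evaluate the field at every source due to all other sources, in O(N^2).

    Assumes the 3D free-space Helmholtz Green's function
    G(r) = exp(i*k*r) / (4*pi*r); points are (x, y, z) tuples and k is the
    wave number.
    """
    n = len(points)
    phi = [0j] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip the singular self-interaction
            r = math.dist(points[i], points[j])
            phi[i] += strengths[j] * cmath.exp(1j * k * r) / (4 * math.pi * r)
    return phi
```

Every pair of sources is visited once per target, so doubling N quadruples the work. That growth is what makes the direct approach impractical at the problem sizes of hundreds of thousands to millions of sources cited in the abstract.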
|