1 |
Accelerating SRD Simulation on GPU
Chen, Zhilu, 17 April 2013 (has links)
Stochastic Rotation Dynamics (SRD) is a particle-based simulation method for modeling complex fluids in two or three dimensions, and it is widely used in biology and physics. Although SRD is computationally efficient compared to other simulation methods, runs still take a long time when the model is large, e.g. when a large array of particles is used to simulate dense polymers. In some cases, a simulation can take months to produce results. This research therefore focuses on accelerating SRD simulation with a GPU, which can reduce simulation time by orders of magnitude. It is also cost-effective, because a GPU costs significantly less than a computer cluster. Compute Unified Device Architecture (CUDA) programming makes it possible to parallelize the program across hundreds or thousands of thread processors on the GPU. The program is divided into many concurrent threads, and several kernel functions are used for data synchronization. The speedup from GPU acceleration varies with the parameters of the simulation, such as the size of the model, the density of the particles, the formation of polymers, and above all the complexity of the algorithm itself. Compared to the CPU version, the GPU achieves about a 10x speedup for the particle simulation and up to a 50x speedup for polymers. Further performance improvement can be achieved by using multiple GPUs and by code optimization.
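The streaming-plus-rotation update that SRD performs in each collision cell can be sketched in a few lines; this is a minimal 2D CPU version for illustration (the thesis's implementation is CUDA, typically with one thread per particle), with box and cell sizes chosen as assumptions:

```python
import numpy as np

def srd_step(pos, vel, box=10.0, cell=1.0, angle=np.pi / 2, dt=0.1, rng=None):
    """One SRD step in 2D: stream particles, then rotate velocities
    about the mean velocity of each collision cell."""
    rng = np.random.default_rng() if rng is None else rng
    pos = (pos + vel * dt) % box                     # streaming, periodic box
    cells = (pos // cell).astype(int)
    keys = cells[:, 0] * int(box / cell) + cells[:, 1]
    new_vel = vel.copy()
    for k in np.unique(keys):
        idx = np.where(keys == k)[0]
        u = vel[idx].mean(axis=0)                    # cell mean velocity
        a = angle if rng.random() < 0.5 else -angle  # random rotation sign
        c, s = np.cos(a), np.sin(a)
        R = np.array([[c, -s], [s, c]])
        new_vel[idx] = u + (vel[idx] - u) @ R.T      # rotate relative velocities
    return pos, new_vel
```

Because each cell's rotation is applied about the cell's own mean velocity, total momentum is conserved exactly, which is a useful sanity check for any GPU port.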
|
2 |
Enhancing the capabilities of computational chemistry using GPU technology
Needham, Perri, January 2013 (has links)
Three key enhancements were made to a semiempirical molecular orbital program to develop a fast, accurate method of calculating chemical properties of large (> 1000 atom) molecular systems through quantum theory. The key enhancements presented in this thesis are: the implementation of a divide-and-conquer approach to the self-consistent field procedure, to improve capability; the use of GPU technology to parallelize the divide-and-conquer self-consistent field procedure, to improve speed; and the implementation of a newly developed semiempirical model, the Polarized Molecular Orbital model, to improve accuracy. The divide-and-conquer SCF (DC-SCF) procedure (enhancement 1) was developed using saturated hydrocarbon chains, whereby the chain is partitioned into small overlapping subsystems and the Roothaan equations are solved for each subsystem. An investigation was carried out to find the optimal partitioning scheme for saturated hydrocarbon chains, in order to minimize the energy error introduced by neglecting some of the interactions in the system whilst maintaining near-linear scaling with system size. The DC-SCF procedure was shown to be accurate to 10^-3 kcal mol^-1 per atom whilst calculating the SCF energy nearly 6 times faster than the standard SCF procedure for a 698-atom system. The parallel DC-SCF procedure and Cartesian forces calculation for the GPU (enhancement 2) resulted in a hybrid CPU/GPU DC-SCF implementation that calculated the energy of a 1997-atom saturated hydrocarbon chain 21 times faster than the standard serial SCF implementation, and an accelerated Cartesian forces calculation that performed 7 times faster for a 1205-atom saturated hydrocarbon chain, when accelerated on an NVIDIA Tesla C2050 GPU.
The hybrid CPU/GPU algorithm made use of the commercial GPU-accelerated linear algebra libraries CULA and CUBLAS. A comparison between CULA's accelerated eigensolver routine and the accelerated DC-eigensolver developed in this research found that, for saturated hydrocarbon chains of > 350 atoms, the accelerated DC-eigensolver performed around twice as fast as the accelerated CULA eigensolver. The implementation of the Polarized Molecular Orbital (PMO) model (enhancement 3) was validated against published isomerization energies and benchmarked against the non-nitrogen-containing complexes in the S66 database. The benchmark complexes were categorized according to their dominant intermolecular interactions: hydrogen bonding, dispersion interactions, and mixed interactions. The PMO model was found to predict interaction energies of complexes with a mixture of dispersive and electrostatic interactions to the highest accuracy (0.69 kcal mol^-1 with respect to CCSD(T)/CBS). The dispersion correction within the PMO model was found to 'overcorrect' the dispersive contribution for most complexes tested. The outcome of this research is a semiempirical molecular orbital program that calculates the energy of a closed-shell saturated hydrocarbon chain of ~2000 atoms in under 4 minutes instead of 1.5 hours when using a PM3 Hamiltonian, and that can calculate interaction energies of systems exhibiting a mixture of electrostatic and dispersive interactions to within 1 kcal mol^-1 of high-level quantum methods. To demonstrate a suitable application of the enhanced SE-MO program, interaction energies of a series of PAHs with water, phenol and methanol were investigated. The resultant program is suitable for calculating the energy and forces of large material and (in future) biological systems by a fast and accurate method that would be impractical or impossible without these enhancements.
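The divide-and-conquer idea above (overlapping subsystems whose buffer regions capture nearby interactions) can be sketched as a partitioning helper; the core and buffer sizes here are illustrative assumptions, not the thesis's tuned scheme:

```python
def partition_chain(n_atoms, core=60, buffer=15):
    """Split a chain of n_atoms into overlapping subsystems.

    Each subsystem has a 'core' region whose results are kept, plus a
    'buffer' of neighbouring atoms on each side that is included only to
    capture nearby interactions (its results are discarded). The sizes
    are illustrative guesses for a hydrocarbon chain.
    """
    subsystems = []
    for start in range(0, n_atoms, core):
        core_range = (start, min(start + core, n_atoms))
        lo = max(0, start - buffer)
        hi = min(n_atoms, core_range[1] + buffer)
        subsystems.append({"atoms": (lo, hi), "core": core_range})
    return subsystems
```

Solving the Roothaan equations independently for each subsystem and keeping only the core results is what yields the near-linear scaling the abstract describes.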
|
3 |
GPU Accelerated Framework for Cryogenic Electron Tomography using Proximal Algorithms
Rey Ramirez, Julio A., 04 1900 (has links)
Cryogenic electron tomography provides visualization of cellular complexes in situ, allowing a deeper understanding of cellular function. However, the projection images from this technique have a meager signal-to-noise ratio due to the limited electron dose, and the lack of projections at high tilt angles produces the 'missing-wedge' problem in the Fourier domain. These limitations in the projection data prevent traditional reconstruction techniques from achieving good reconstructions. Multiple strategies have been proposed to deal with the noise and the artifacts arising from the 'missing-wedge' problem: manually selecting subtomograms of identical structures and averaging them (subtomogram averaging), data-driven approaches that perform subtomogram averaging automatically, and various methods for denoising tilt-series before reconstruction or denoising the volumes after reconstruction. Most of these approaches are additional pre-processing or post-processing steps independent of the reconstruction method, and the consistency of the resulting tomograms with the original projection data is lost after the modifications. We propose a GPU-accelerated optimization-based reconstruction framework using proximal algorithms. Our framework integrates denoising into the reconstruction process by alternating between reconstruction and denoising, relieving users of the need to select additional denoising algorithms and preserving the consistency between the final tomograms and the projection data. Thanks to the flexibility provided by proximal algorithms, the proximal operator for each task can be interchanged, e.g. among various algebraic reconstruction methods and denoising techniques. We evaluate our approach qualitatively by comparison with current reconstruction and denoising approaches, showing excellent denoising capabilities and superior visual quality of the reconstructed tomograms. We also evaluate the methods quantitatively on a recently proposed synthetic dataset for scanning transmission electron microscopy, achieving superior reconstruction quality on this noisy and angle-limited dataset.
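The alternation between a data-consistency step and a denoising proximal step that the framework describes follows the general proximal-gradient pattern. A minimal sketch on a toy least-squares problem, with soft-thresholding standing in for the denoising operator (the framework's actual reconstruction and denoising operators differ):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t*||x||_1; stands in for a denoiser."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_gradient(A, b, lam=0.1, step=None, iters=200):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by alternating a
    data-consistency gradient step with a denoising prox step."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x - step * A.T @ (A @ x - b)         # reconstruction (data fidelity)
        x = soft_threshold(x, step * lam)        # denoising (proximal operator)
    return x
```

Because the denoiser enters only through its proximal operator, it can be swapped without touching the data-consistency step, which is the interchangeability the abstract highlights.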
|
4 |
Improving the performance of GPU-accelerated spatial joins
Hrstic, Dusan Viktor, January 2017 (has links)
Data collisions have been widely studied in various fields of science and industry. Combining the CPU and GPU for processing spatial joins has been broadly accepted due to the increased speed of computation. This should redirect efforts in GPGPU research from straightforward porting of applications to establishing principles and strategies that allow efficient mapping of computation to graphics hardware. Since threads execute instructions while using the hardware resources available, this report analyzes and examines the impact of different thread organizations and their effect on spatial join performance. New perspectives and solutions to the problem of thread organization and warp scheduling may encourage more developers to program on the GPU side. The aim of this project is to examine the impact of different thread organizations on spatial join processing. The relationship between the items inside the datasets is examined by counting the number of collisions their join produces, in order to understand how different approaches influence performance. Performance benchmarking, analysis and measurement of different approaches to thread organization are investigated in this report in order to find the most time-efficient solution, which is the purpose of the conducted work. The report presents the results obtained from utilizing different thread techniques to optimize the computational speed of the spatial join algorithms. Two algorithms run on the GPU: one implementing the thread techniques and one non-optimized solution. The GPU times are compared with the execution times on the CPU, and the GPU implementations are verified by observing that their collision counters match all of the collision counters from the CPU counterpart. In the analysis part of this report the implementations are discussed and compared to each other. The algorithm implementing the thread techniques turned out to be around 80% faster than the non-optimized one, and around 56 times faster than the spatial joins on the CPU.
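The collision counting at the heart of the spatial join can be sketched as a brute-force pairwise test; the helper names are hypothetical, and the real implementations rely on GPU thread organization rather than this nested loop:

```python
def overlaps(a, b):
    """Axis-aligned bounding-box overlap test, boxes as (xmin, ymin, xmax, ymax)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def spatial_join_count(set_a, set_b):
    """Count colliding pairs between two datasets. Each (a, b) pair is
    tested independently, which is what makes the join easy to map onto
    GPU threads: one thread (or one warp) per candidate pair."""
    return sum(overlaps(a, b) for a in set_a for b in set_b)
```

Matching this collision count between the CPU and GPU versions is exactly the verification strategy the report uses.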
|
5 |
Advanced optimization and sampling techniques for biomolecules using a polarizable force field
Litman, Jacob Mordechai, 01 May 2019 (has links)
Biophysical simulation can be an excellent complement to experimental techniques, but practical constraints on simulation remain unresolved. While computers have continued to improve, the scale of the systems we wish to study has continued to increase. This has driven the use of approximate energy functions (force fields), compensating for relatively short simulations via careful structure preparation and accelerated sampling techniques. To address structure preparation, we developed the many-body dead-end elimination (MB-DEE) optimizer. We first validated the MB-DEE algorithm on a set of PCNA crystal structures, then accelerated it on GPUs to optimize 472 homology models of proteins implicated in inherited deafness. Advanced physics has clearly been shown to help optimize structures, and with GPU acceleration this becomes possible for large numbers of structures. We also present the novel “simultaneous bookending” algorithm, a new approach to indirect free energy (IFE) methods. These methods first perform simulations under a cheaper “reference” potential, then correct the thermodynamics to a more sophisticated “target” potential, combining the speed of the reference potential with the accuracy of the target potential. Simultaneous bookending is shown to be a valid IFE approach, and methods to realize speedups over the direct path are discussed. Finally, we are developing the Monte Carlo Orthogonal Space Random Walk (MC-OSRW) algorithm for high-performance alchemical free energy simulations, bypassing some of the difficulty in OSRW methods. This work helps prevent inaccuracies caused by simpler electrostatic models by making advanced polarizable force fields more accessible for routine simulation.
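The dead-end elimination idea underlying the MB-DEE optimizer can be illustrated with the classic single-position Goldstein criterion; this is a textbook sketch with toy energies, not the many-body variant developed in the thesis:

```python
def goldstein_dee(self_e, pair_e):
    """Goldstein dead-end elimination for one design position.

    self_e[r]: self-energy of rotamer r at the position of interest.
    pair_e[r][j][s]: pair energy between rotamer r and rotamer s at
    neighbouring position j. Rotamer r is provably absent from the
    global minimum if some competitor t beats it even in r's most
    favourable environment.
    """
    n = len(self_e)
    eliminated = set()
    for r in range(n):
        for t in range(n):
            if t == r:
                continue
            gap = self_e[r] - self_e[t]
            gap += sum(min(pair_e[r][j][s] - pair_e[t][j][s]
                           for s in range(len(pair_e[r][j])))
                       for j in range(len(pair_e[r])))
            if gap > 0:          # r is always beaten by t: a dead end
                eliminated.add(r)
                break
    return eliminated
```

Each (r, t) comparison is independent, which is why the elimination sweep parallelizes well on GPUs, as in the optimization of the 472 homology models.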
|
6 |
Performance Metrics Analysis of GamingAnywhere with GPU accelerated NVIDIA CUDA
Sreenibha Reddy, Byreddy, January 2018 (has links)
The modern world has opened the gates to many advancements in cloud computing, particularly in the field of cloud gaming. The most recent development in this area is the open-source cloud gaming system GamingAnywhere. The relationship between the CPU and the GPU is the main focus of this thesis. Graphics Processing Unit (GPU) performance plays a vital role in analyzing and enhancing the playing experience of GamingAnywhere. This paper concentrates on virtualization of the GPU and suggests that accelerating it with NVIDIA CUDA is the key to better performance when using GamingAnywhere. After extensive research, gVirtuS was chosen as the technique for providing NVIDIA CUDA support. An experimental study is conducted to evaluate the feasibility and performance of VMware GPU solutions in the cloud gaming scenarios given by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter and frame rate. Different game resolutions are considered in our empirical research, and our results show that frame rate and bitrate increase across resolutions with the use of the NVIDIA CUDA-enhanced GPU.
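The metrics named above can be computed from packet traces; the sketch below uses simplified, assumed definitions (fractional loss from sequence-number gaps, jitter as the mean deviation of inter-arrival times from the nominal interval), not the exact formulas used in the study:

```python
def stream_metrics(seq, arrival, expected_interval):
    """Packet loss and jitter from sequence numbers and arrival times.

    loss: fraction of sequence numbers missing between first and last.
    jitter: mean absolute deviation of inter-arrival times from the
    nominal sending interval. Both are simplified illustrations.
    """
    expected = seq[-1] - seq[0] + 1
    loss = 1.0 - len(seq) / expected
    gaps = [t2 - t1 for t1, t2 in zip(arrival, arrival[1:])]
    jitter = sum(abs(g - expected_interval) for g in gaps) / len(gaps)
    return loss, jitter
```

For example, a trace that drops one of five packets and arrives with slightly uneven spacing yields a 20% loss figure and a small positive jitter.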
|
8 |
Performance Metrics Analysis of GamingAnywhere with GPU accelerated NVIDIA CUDA using gVirtuS
Zaahid, Mohammed, January 2018 (has links)
The modern world has opened the gates to many advancements in cloud computing, particularly in the field of cloud gaming. The most recent development in this area is the open-source cloud gaming system GamingAnywhere. The relationship between the CPU and the GPU is the main focus of this thesis. Graphics Processing Unit (GPU) performance plays a vital role in analyzing and enhancing the playing experience of GamingAnywhere. This paper concentrates on virtualization of the GPU and suggests that accelerating it with NVIDIA CUDA is the key to better performance when using GamingAnywhere. After extensive research, gVirtuS was chosen as the technique for providing NVIDIA CUDA support. An experimental study is conducted to evaluate the feasibility and performance of VMware GPU solutions in the cloud gaming scenarios given by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter and frame rate. Different game resolutions are considered in our empirical research, and our results show that frame rate and bitrate increase across resolutions with the use of the NVIDIA CUDA-enhanced GPU.
|
9 |
A Unified Approach to GPU-Accelerated Aerial Video Enhancement Techniques
Cluff, Stephen Thayn, 12 February 2009 (has links) (PDF)
Video from aerial surveillance can provide a rich source of data for analysts. From the time-critical perspective of wilderness search and rescue operations, information extracted from aerial videos can mean the difference between a successful search and an unsuccessful search. When using low-cost, payload-limited mini-UAVs, as opposed to more expensive platforms, several challenges arise, including jittery video, narrow fields of view, low resolution, and limited time on screen for key features. These challenges make it difficult for analysts to extract key information in a timely manner. Traditional approaches may address some of these issues, but no existing system effectively addresses all of them in a unified and efficient manner. Building upon a hierarchical dense image correspondence technique, we create a unifying framework for reducing jitter, enhancing resolution, and expanding the field of view while lengthening the time that features remain on screen. It also provides for easy extraction of moving objects in the scene. Our method incorporates locally adaptive warps, which allow for robust image alignment even in the presence of parallax and without the aid of internal or external camera parameters. We accelerate the image registration process using commodity Graphics Processing Units (GPUs) to accomplish all of these tasks in near real-time with no external telemetry data.
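Frame-to-frame registration of the kind described can be illustrated with global phase correlation, a simple stand-in for the thesis's hierarchical dense correspondence and locally adaptive warps (which also handle parallax, unlike this global-translation sketch):

```python
import numpy as np

def estimate_shift(a, b):
    """Estimate the integer translation aligning frame b back onto frame a
    via phase correlation (whitened cross-power spectrum)."""
    F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    F /= np.abs(F) + 1e-12                  # keep phase only
    corr = np.fft.ifft2(F).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    if dy > a.shape[0] // 2:                # wrap to signed offsets
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return int(dy), int(dx)
```

Both the FFTs and the argmax reduction here map naturally onto GPU primitives, which is one reason registration pipelines like this reach near real-time rates on commodity hardware.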
|
10 |
GPU-Accelerated Point-Based Color Bleeding
Schmitt, Ryan Daniel, 01 June 2012 (links) (PDF)
Traditional global illumination lighting techniques like Radiosity and Monte Carlo sampling are computationally expensive. This has prompted the development of the Point-Based Color Bleeding (PBCB) algorithm by Pixar in order to approximate complex indirect illumination while meeting the demands of movie production; namely, reduced memory usage, surface shading independent run time, and faster renders than the aforementioned lighting techniques.
The PBCB algorithm works by discretizing a scene’s directly illuminated geometry into a point cloud (surfel) representation. When computing the indirect illumination at a point, the surfels are rasterized onto cube faces surrounding that point, and the constituent pixels are combined into the final, approximate, indirect lighting value.
In this thesis we present a performance enhancement to the Point-Based Color Bleeding algorithm through hardware acceleration; our contribution incorporates GPU-accelerated rasterization into the cube-face raster phase. The goal is to leverage the powerful rasterization capabilities of modern graphics processors in order to speed up the PBCB algorithm over standard software rasterization. Additionally, we contribute a preprocess that generates triangular surfels that are suited for fast rasterization by the GPU, and show that new heterogeneous architecture chips (e.g. Sandy Bridge from Intel) simplify the code required to leverage the power of the GPU. Our algorithm reproduces the output of the traditional Monte Carlo technique with a speedup of 41.65x, and additionally achieves a 3.12x speedup over software-rasterized PBCB.
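The final gather that combines a cube face's rasterized pixels into one indirect-lighting value can be sketched with a simple direction-based weighting; this is an assumption-laden simplification, not Pixar's or the thesis's exact weighting:

```python
import numpy as np

def combine_face_pixels(radiance):
    """Weighted combination of one cube face's rasterized pixels into a
    single indirect-lighting value. Pixels near the face center subtend
    more solid angle (and align better with the face axis) than corner
    pixels, so each is weighted by 1/(1 + u^2 + v^2)^2 before averaging.
    A sketch of the gather step only.
    """
    res = radiance.shape[0]
    c = (np.arange(res) + 0.5) / res * 2.0 - 1.0    # pixel centers in [-1, 1]
    u, v = np.meshgrid(c, c)
    w = 1.0 / (1.0 + u * u + v * v) ** 2            # solid-angle and cosine falloff
    return float(np.sum(radiance * w) / np.sum(w))
```

In the full algorithm this gather runs once per cube face per shading point, after the surfels have been rasterized onto the faces, which is the phase the thesis moves onto the GPU.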
|