51 |
Ray-traced radiative transfer on massively threaded architectures
Thomson, Samuel Paul January 2018
In this thesis, I apply techniques from the field of computer graphics to ray tracing in astrophysical simulations, and introduce the grace software library. This is combined with an extant radiative transfer solver to produce a new package, taranis. It allows for fully-parallel particle updates via per-particle accumulation of rates, followed by a forward Euler integration step, and is manifestly photon-conserving. To my knowledge, taranis is the first ray-traced radiative transfer code to run on graphics processing units and target cosmological-scale smooth particle hydrodynamics (SPH) datasets. A significant optimization effort is undertaken in developing grace. Contrary to typical results in computer graphics, it is found that the bounding volume hierarchies (BVHs) used to accelerate the ray tracing procedure need not be of high quality; as a result, extremely fast BVH construction times are possible (< 0.02 microseconds per particle in an SPH dataset). I show that this exceeds the performance researchers might expect from CPU codes by at least an order of magnitude, and compares favourably to a state-of-the-art ray tracing solution. Similar results are found for the ray tracing itself, where again techniques from computer graphics are examined for effectiveness with SPH datasets, and new optimizations proposed. For high per-source ray counts (≳ 10^4), grace can reduce ray tracing run times by up to two orders of magnitude compared to extant CPU solutions developed within the astrophysics community, and by a factor of a few compared to a state-of-the-art solution. taranis is shown to produce expected results in a suite of de facto cosmological radiative transfer test cases. For some cases, it currently out-performs a serial, CPU-based alternative by a factor of a few. Unfortunately, for the most realistic test its performance is extremely poor, making the current taranis code unsuitable for cosmological radiative transfer.
The primary reason for this failing is found to be a small minority of particles which always dominate the timestep criteria. Several plausible routes to mitigate this problem, while retaining parallelism, are put forward.
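The update pattern named in the abstract (accumulate a rate per particle, then take one forward Euler step) can be illustrated with a minimal Python sketch. This is not code from taranis; the function name `euler_step` and the `(particle index, rate)` contribution format are assumptions chosen for illustration.

```python
def euler_step(values, contributions, dt):
    """Advance per-particle values by one forward Euler step.

    values: current per-particle quantity (hypothetical, e.g. an ionised fraction).
    contributions: iterable of (particle_index, rate) pairs, e.g. one per ray hit.
    dt: timestep shared by all particles.
    """
    # Accumulate every contribution into its particle's total rate.
    rates = [0.0] * len(values)
    for index, rate in contributions:
        rates[index] += rate
    # Once the rates are known, each particle updates independently,
    # which is the part that maps naturally onto one thread per particle.
    return [v + dt * r for v, r in zip(values, rates)]
```

Because the step uses a single shared `dt`, one particle with a very large rate forces a small timestep on every other particle, which is consistent with the bottleneck the abstract reports.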
|
52 |
GPU Implementation of the Particle Filter / GPU implementation av partikelfiltret
Gebart, Joakim January 2013
This thesis analyses the obstacles faced when adapting the particle filtering algorithm to run on massively parallel compute architectures. Graphics processing units are one example of massively parallel compute architectures, which allow the developer to distribute computational load over hundreds or thousands of processor cores. This thesis studies an implementation written for NVIDIA GeForce GPUs, yielding varying speedups, up to 3000% in some cases, when compared to the equivalent algorithm performed on a CPU. The particle filter, also known in the literature as sequential Monte Carlo methods, is an algorithm used for signal processing when the system generating the signals has highly nonlinear behaviour or non-Gaussian noise distributions, where a Kalman filter and its extended variants are not effective. The particle filter was chosen as a good candidate for parallelisation because of its inherently parallel nature. There are, however, several steps of the classic formulation where computations depend on other computations in the same step, which requires them to be run in sequence instead of in parallel. To avoid these difficulties, alternative ways of computing the results must be used, such as parallel scan operations and scatter/gather methods. Another area where parallel programming is still not widespread is pseudo-random number generation. Pseudo-random numbers are required by the algorithm to simulate the process noise as well as to avoid the particle depletion problem using a resampling step. In this thesis a recently published counter-based pseudo-random number generator is used.
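To illustrate how a scan followed by a gather removes the sequential dependency in resampling, here is a minimal Python sketch of systematic resampling. It is one common formulation and not necessarily the one used in the thesis; the function name and the choice to pass the uniform draw `u0` as an argument (for determinism) are assumptions.

```python
from bisect import bisect_left
from itertools import accumulate


def systematic_resample(weights, u0):
    """Return ancestor indices chosen by systematic resampling.

    weights: normalised particle weights (assumed to sum to 1).
    u0: one uniform draw in [0, 1), supplied by the caller.
    """
    n = len(weights)
    # Inclusive prefix sum of the weights; on a GPU this is the parallel scan.
    cdf = list(accumulate(weights))
    indices = []
    for i in range(n):
        # Each output slot has its own evenly spaced threshold and looks up its
        # ancestor independently, so the loop body could run as one thread per slot.
        threshold = (u0 + i) / n
        indices.append(min(bisect_left(cdf, threshold), n - 1))
    return indices
```

For weights `[0.1, 0.2, 0.3, 0.4]` and `u0 = 0.5` the heaviest particle is selected twice and the lightest is dropped, which is the intended effect of resampling.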
|
53 |
Real-Time Systems with Radiation-Hardened Processors : A GPU-based Framework to Explore Tradeoffs
Alhowaidi, Mohammad January 2012
Radiation-hardened processors are designed to be resilient against soft errors, but such processors are slower than Commercial Off-The-Shelf (COTS) processors as well as significantly costlier. In order to mitigate the high costs, software techniques such as task re-executions must be deployed together with adequately hardened processors to provide reliability. This leads to a huge design space comprising the hardening level of the processors and the number of re-executions of each task in the system. Each configuration in this design space represents a tradeoff between processor load, reliability and costs. The reliability comes at the price of higher costs due to higher levels of hardening, and of performance degradation due to hardening or due to re-executions. Thus, the tradeoffs between performance, reliability and costs must be carefully studied. Pertinent questions that arise in such a design scenario are (i) how many times a task must be re-executed and (ii) what the hardening level should be, such that the system reliability requirement is satisfied. In order to evaluate such tradeoffs efficiently, in this thesis we propose a novel framework that harnesses the computational power of Graphics Processing Units (GPUs). Our framework is based on a system failure probability analysis that connects the probability of failure of tasks to the overall system reliability. Based on characteristics of this probabilistic analysis as well as real-time deadlines, we derive bounds on the design space to prune infeasible solutions. Finally, we illustrate the benefits of our proposed framework with several experiments.
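The link between per-task failure probabilities and system reliability can be sketched under strong simplifying assumptions that are not taken from the thesis: task executions fail independently, a task with k re-executions fails only if all k + 1 attempts fail, and the system fails if any task fails. The function name below is hypothetical, and real-time deadlines are ignored.

```python
from math import prod


def system_failure_probability(p_fail, reexecutions):
    """Probability that at least one task fails all of its attempts.

    p_fail: per-execution failure probability of each task (assumed independent).
    reexecutions: number of re-executions allowed for each task.
    """
    # A task with k re-executions survives unless all k + 1 attempts fail.
    task_survival = (1.0 - p ** (k + 1) for p, k in zip(p_fail, reexecutions))
    # The system survives only if every task survives.
    return 1.0 - prod(task_survival)
```

Every configuration in the design space can be scored independently with a function like this, which is the kind of workload that spreads well over many GPU threads.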
|
54 |
Optical Flow Computation on Compute Unified Device Architecture / Optiskt flödeberäkning med CUDA
Ringaby, Erik January 2008
Graphics processors have progressed rapidly in recent years, largely because of the demands that computer games place on speed and image quality. Because of its special architecture, the graphics processor is much faster at solving parallel problems than a conventional processor. Thanks to its increasing programmability, it can also be used for tasks other than those it was originally designed for.
Even though graphics processors have been programmable for some time, it has been quite difficult to learn how to use them. CUDA enables the programmer to use C code, with a few extensions, to program NVIDIA's graphics processors and completely skip the traditional programming models. This thesis investigates whether the graphics processor can be used for calculations without knowledge of how the hardware mechanisms work. An image processing algorithm that calculates optical flow has been implemented. The results show that it is rather easy to implement programs using CUDA, but some knowledge of how the graphics processor works is required to achieve high performance.
|
55 |
An Application Developed for Simulation of Electrical Excitation and Conduction in a 3D Human Heart
Yu, Di 01 January 2013
This thesis first reviews the history of general-purpose computing on graphics processing units (GPGPU) and then introduces the fundamental problems that are suitable for GPGPU algorithms. The GPGPU architecture is compared against modern CPU architecture, and the fundamental differences are outlined. The programming challenges faced in GPGPU and the techniques used to overcome them are evaluated and discussed.
The second part of the thesis presents an application developed with GPGPU technology to simulate electrical excitation and conduction in a 3D human heart model based on a cellular automata model. The algorithm and implementation are discussed in detail, and the performance of the GPU is compared against that of the CPU.
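As a rough illustration of how a cellular automaton can model excitation and conduction, here is a minimal one-dimensional Python sketch with three states. The thesis works on a 3D heart model and its actual rule set is not given in the abstract; the states, the nearest-neighbour rule, and the function name are assumptions in the style of a Greenberg-Hastings automaton.

```python
REST, EXCITED, REFRACTORY = 0, 1, 2


def step(cells):
    """Advance a 1D excitable-medium automaton by one time step."""
    nxt = []
    for i, state in enumerate(cells):
        if state == EXCITED:
            # An excited cell becomes refractory and cannot fire again yet.
            nxt.append(REFRACTORY)
        elif state == REFRACTORY:
            # A refractory cell recovers to rest.
            nxt.append(REST)
        else:
            # A resting cell fires if either neighbour is currently excited.
            left = cells[i - 1] if i > 0 else REST
            right = cells[i + 1] if i < len(cells) - 1 else REST
            nxt.append(EXCITED if EXCITED in (left, right) else REST)
    return nxt
```

Each cell's next state depends only on the previous generation, so all cells can be updated at once, which is what makes this class of model a natural fit for a GPU.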
|
56 |
A Case Study of Parallel Bilateral Filtering on the GPU
Larsson, Jonas January 2015
Smoothing and noise reduction of images is often an important first step in image processing applications. Simple image smoothing algorithms like the Gaussian filter have the unfortunate side effect of blurring the image, which could obscure important information and have a negative impact on subsequent processing steps. The bilateral filter is a widely used non-linear smoothing algorithm that seeks to preserve edges and contours while removing noise. The bilateral filter comes at a heavy cost in computational speed, especially when used on larger images, since the algorithm does a greater amount of work for each pixel in the image than some simpler smoothing algorithms. In applications where timing is important, this may be enough to encourage certain developers to choose a simpler filter, at the cost of quality. However, the time cost of the bilateral filter can be greatly reduced through parallelization, as the work for each pixel can theoretically be done simultaneously. This work uses Nvidia’s Compute Unified Device Architecture (CUDA) to implement and evaluate some of the most common and effective methods for parallelizing the bilateral filter on a graphics processing unit (GPU). This includes use of the constant and shared memories, and a technique called 1 x N tiling. These techniques are evaluated on newer hardware and the results are compared to a sequential version and to a naive parallel version that does not use advanced techniques. This report also intends to give a detailed and comprehensible explanation of these techniques, in the hope that the reader may be able to use the information put forth to implement them on their own. The greatest speedup is achieved in the initial parallelizing step, where the algorithm is simply converted to run in parallel on a GPU. Storing some data in the constant memory provides a slight but reliable speedup for a small amount of work. Additional time can be gained by using shared memory.
However, memory transactions did not account for as much of the execution time as was expected, and therefore the memory optimizations only yielded small improvements. Test results showed 1 x N tiling to be mostly non-beneficial for the hardware that was used in this work, but there might have been problems with the implementation.
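The per-pixel work that makes the bilateral filter both expensive and easy to parallelise can be shown with a minimal sequential Python sketch. It is not the thesis's CUDA code; the function names and the parameters `radius`, `sigma_s` (spatial) and `sigma_r` (range) are assumptions for illustration.

```python
from math import exp


def bilateral_pixel(img, y, x, radius, sigma_s, sigma_r):
    """Filter one pixel of a greyscale image stored as a list of rows."""
    height, width = len(img), len(img[0])
    center = img[y][x]
    total = 0.0
    norm = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                value = img[ny][nx]
                # Spatial weight falls off with distance; range weight falls off
                # with intensity difference, which is what preserves edges.
                spatial = exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
                rng = exp(-((value - center) ** 2) / (2.0 * sigma_r ** 2))
                total += spatial * rng * value
                norm += spatial * rng
    return total / norm


def bilateral_filter(img, radius, sigma_s, sigma_r):
    """Apply the filter to every pixel; each call is independent of the others."""
    return [[bilateral_pixel(img, y, x, radius, sigma_s, sigma_r)
             for x in range(len(img[0]))] for y in range(len(img))]
```

Since `bilateral_pixel` reads only the input image, a naive GPU version can assign one thread per output pixel; the memory optimisations discussed in the abstract then target the repeated neighbourhood reads.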
|
57 |
IMPLEMENTATION OF FILTERING BEAMFORMING ALGORITHMS FOR SONAR DEVICES USING GPU
Kamali, Shahrokh 27 June 2013
Beamforming is a signal processing technique used in sensor arrays to direct signal transmission or reception. A beamformer combines the input signals in the array to achieve constructive interference at particular angles (beams) and destructive interference at other angles.
The work is motivated by the following facts:
1- Beamforming can be computationally intensive, so real-time sonar beamforming algorithms in sonar devices are important.
2- Parallel computing has become a critical component of computing technology of the 1990s, and it is likely to have as much impact over the next 20 years as microprocessors have had over the past 20 [5].
3- The high-performance computing community has been developing parallel programs for decades. These programs run on large-scale, expensive computers. Only a few elite applications can justify the use of these expensive computers [2].
4- GPU computing offers parallel computing capability and can be available on personal computers.
The objective of this thesis is to use a Graphics Processing Unit (GPU) as a real-time digital beamformer to accelerate the intensive signal processing.
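The idea of combining array signals so that they add constructively for one direction can be illustrated with a minimal time-domain delay-and-sum sketch in Python. The abstract does not say which beamforming algorithm the thesis implements; the function name and the use of whole-sample delays are assumptions.

```python
def delay_and_sum(channels, delays):
    """Align each sensor channel by its delay (in samples) and average them.

    channels: one list of samples per sensor.
    delays: non-negative integer arrival delay of the target signal per sensor.
    """
    # Only keep output samples for which every shifted channel has data.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    count = len(channels)
    # Each output sample is independent of the others, so on a GPU this loop
    # could be mapped to one thread per output sample (and per beam).
    return [sum(ch[t + d] for ch, d in zip(channels, delays)) / count
            for t in range(length)]
```

With delays that match the arrival pattern the pulse is recovered at full amplitude; with mismatched delays the same pulse is spread out and attenuated, which is how the array discriminates between angles.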
|
58 |
Smoke Simulation on the GPU : Using GPGPU to Simulate Smoke (Simulering av rök på GPU : Användning av GPGPU för att simulera rök)
Jalsborn, Erik January 2008
This thesis examines an existing technique for simulating smoke with a particle system. The technique is developed and implemented so that the particles' new positions are computed on both a CPU and a GPU. The work evaluates time efficiency and shows that the smoke simulation runs faster when the particles' new positions are computed on the GPU instead of the CPU.
|
59 |
Particle Systems : A Comparison Between Octree-based and Screen Space Particle Collision
Kåvemark, Nils, Miloradovic, Stevan January 2018
Background. Real-time applications like video games use particle systems for special effects to add visual aesthetics and realism. However, problems arise when complex behaviour that requires interaction between particles and geometry is desired. The developer has to choose between consistent, precise collisions and higher performance. This thesis goes over two particle collision implementations that try to solve these problems.
Objectives. The objective of this thesis is to create an application that supports two collision methods and to compare them on performance and accuracy, in order to decide which one is more suitable for real-time applications.
Methods. To answer the proposed research questions, the implementation methodology was used; as a result, a 3D application was created that renders with the graphics API OpenGL. Two simple GPGPU implementations were made for each method, to make the comparison fairer. To measure performance, the application logs the frame time every frame. A fixed time step was added to the main loop so that users can stop the application at a certain time and capture images of the scene, which are then used for pixel comparison to measure accuracy.
Results. Screen space particle collision is almost three times faster than the octree-based method. The two methods behaved differently both in the real-time simulation and at specific time steps, resulting in a loss of accuracy for the screen space particle collision.
Conclusions. The tests allowed the authors to show that screen space particle collision is faster and scales better than the octree-based method. However, it lacked precision, as shown by the comparison of the images taken from the test. For particle simulations that require consistent and accurate collision checks, the octree-based method is better, because screen space particle collision can result in false collision checks and has problems with hidden geometry.
|
60 |
Using GPU Computing for Traffic Sign Recognition (Využití GPU výpočtů pro rozpoznání dopravních značek)
Zídek, Karel January 2015
The thesis deals with GPU acceleration of algorithms for traffic sign recognition. The theoretical part of the thesis outlines methods for object detection, with emphasis on the traffic sign detection problem. It also compares two well-known tools for GPU programming: CUDA and OpenCL. On the basis of this review, the architecture of a custom solution is proposed. Finally, the thesis describes the implementation and evaluates the results.
|