61 |
Optimizing two-dimesional shallow water based flood hydrological model with stream architectures
Sarates Junior, Adiel Seffrin, January 2015
This study explores the difficulties and benefits of using stream architectures to simulate hydrological events based on the shallow water equations. To this end, it lays a foundation covering hydrological modeling and some classes of existing models, heterogeneous architectures, and, more specifically, the two-dimensional model used, which is based on the Saint-Venant equations. It then builds a timeline of the optimizations applied to the model, from the initial serial version to the version optimized for GPUs, presenting each step taken as an algorithm. These optimizations yielded a speedup of about 4 times for small areas and of 10 times, at medium resolution, for a large area with a high level of detail, compared with a 24-thread CPU version.
|
63 |
Performance Evaluation of Boids on the GPU and CPU
Lindqvist, Sebastian, January 2018
Context. Agent-based models are used to simulate complex systems by using multiple agents that follow a set of rules. One such model is the boid model, which is used to simulate the movements of synchronized groups of animals. Executing agent-based models partially or fully on the GPU has previously been shown to increase performance, opening up the possibility of larger simulations. However, few articles have compared a full GPU implementation of the boid model with a multi-threaded CPU implementation. Objectives. The objective of this thesis is to find how parallel execution of the boid model performs on the CPU and the GPU respectively, measured in frames per second and average boid computation time per frame. Methods. A performance benchmark experiment is set up in which three implementations of the boid model are implemented and tested. Results. The collected data is summarized in both tables and graphs, showing the result of the experiment for frames per second and average boid computation time per frame. Additionally, the average results are summarized in two tables. Conclusions. For the largest flock size the GPGPU implementation performs best, with an average FPS 42 times that of the single-core implementation, while the multi-core implementation achieves an average FPS 6 times that of the single-core implementation. For the smallest flock size the single-core implementation is the most efficient: the average update time of the GPGPU implementation is 1.6 times slower, and that of the multi-core implementation is 11 times slower, than the single-core implementation.
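To make "agents that follow a set of rules" concrete, here is a minimal single-threaded sketch of one boid update step. It assumes the three classic flocking rules (cohesion, alignment, separation); the abstract does not say which rules or parameters the thesis uses, so the function name `step_boids`, the neighbour radius, and all weights are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def step_boids(pos: List[Vec], vel: List[Vec], radius: float = 5.0,
               w_coh: float = 0.01, w_ali: float = 0.05, w_sep: float = 0.1,
               dt: float = 1.0) -> Tuple[List[Vec], List[Vec]]:
    """Advance all boids by one step (illustrative weights, not from the thesis)."""
    new_vel: List[Vec] = []
    for i, (pi, vi) in enumerate(zip(pos, vel)):
        cx = cy = ax = ay = sx = sy = 0.0
        n = 0
        # O(n) neighbour scan per boid; this per-boid loop is the part that
        # is independent across boids and could run in parallel.
        for j, (pj, vj) in enumerate(zip(pos, vel)):
            if i == j:
                continue
            dx, dy = pj[0] - pi[0], pj[1] - pi[1]
            if dx * dx + dy * dy <= radius * radius:
                n += 1
                cx += pj[0]; cy += pj[1]   # cohesion: sum of neighbour positions
                ax += vj[0]; ay += vj[1]   # alignment: sum of neighbour velocities
                sx -= dx; sy -= dy         # separation: push away from neighbours
        vx, vy = vi
        if n:
            vx += w_coh * (cx / n - pi[0]) + w_ali * (ax / n - vi[0]) + w_sep * sx
            vy += w_coh * (cy / n - pi[1]) + w_ali * (ay / n - vi[1]) + w_sep * sy
        new_vel.append((vx, vy))
    # Positions are updated only after every velocity has been computed from
    # the old state, so the result does not depend on iteration order.
    new_pos = [(p[0] + v[0] * dt, p[1] + v[1] * dt) for p, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

Because each boid reads only the previous state and writes only its own new velocity, the outer loop maps naturally onto one thread per boid, which is one plausible reason such models suit GPUs at large flock sizes.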
|
65 |
Neural Network on Compute Shader : Running and Training a Neural Network using GPGPU
Åström, Fredrik, January 2011
In this thesis I look into how one can train and run an artificial neural network using Compute Shader and what kind of performance can be expected. An artificial neural network is a computational model that is inspired by biological neural networks, e.g. a brain. The expected performance was determined by creating an implementation that uses Compute Shader and then comparing it to the FANN library, a fast artificial neural network library written in C. The conclusion is that you can improve performance by training an artificial neural network on the compute shader as long as you are using non-trivial datasets and neural network configurations.
|
66 |
GPGPU-LOD (General Purpose Graphics Processing Unit - Level Of Detail) : Grafikkortsdriven terräng-LOD-algoritm
Jansson, Karl, January 2009
Today's graphics cards are built from powerful multiprocessors, which makes them excellent for handling parallelizable problems that would take a long time to run on an ordinary processor, such as level-of-detail or ray tracing. This report presents a parallelizable level-of-detail algorithm for terrain height maps and implements it for use on graphics cards using Nvidia's CUDA API. The algorithm divides the full height map into sections, which are further divided into smaller blocks that are computed in parallel on the graphics card. The algorithm computes vertex positions, normals and texture coordinates for each block and sends the data to the application, which creates vertex and index buffers and renders the sections. The implementation's performance and its ability to reduce triangles are analyzed with two different culling methods: one that culls triangles at the section level and one that culls at the block level. The results show that it is very advantageous to let the graphics card handle level-of-detail computations in this way, even though memory copying over the graphics card bus is a problem, as it takes up approximately eighty-five percent of the total time needed to handle a section. The computations themselves take very little time, and there is plenty of room for further development to achieve as good a distribution of triangles over the terrain area as possible.
|
67 |
Accelerating the knapsack problem on GPUs
Suri, Bharath, January 2011
The knapsack problem manifests itself in many domains, such as cryptography, finance and bioinformatics. Knapsack problems also often sit inside optimization loops in system-level design and analysis of embedded systems. Given a set of items, each associated with a profit and a weight, the knapsack problem deals with how to choose a subset of items such that the profit is maximized and the total weight of the chosen items does not exceed the capacity of the knapsack. There exist several variants and extensions of this knapsack problem. In this thesis, we focus on the multiple-choice knapsack problem, where the items are grouped into disjoint classes. However, the multiple-choice knapsack problem is known to be NP-hard. While many different heuristics and approximation schemes have been proposed to solve the problem in polynomial time, such techniques do not return the optimal solution. A dynamic programming algorithm that solves the problem optimally is known, but it has a pseudo-polynomial running time. This leads to high running times for tools in various application domains where knapsack problems must be solved. Many system-level design tools in the embedded systems domain, in particular, would suffer from high running times when such a knapsack problem must be solved inside a larger optimization loop. To mitigate the high running times of such algorithms, we propose in this thesis a GPU-based technique to solve the multiple-choice knapsack problem. We study different approaches to mapping the dynamic programming algorithm onto the GPU and compare their performance in terms of running time. We employ GPU-specific methods, such as exploiting the GPU's on-chip shared memory, to further improve the running times. Apart from results on synthetic test cases, we also demonstrate the applicability of our technique in practice by considering a case study from system-level design: the problem of instruction-set selection for customizable processors.
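The dynamic programming approach mentioned above can be sketched as follows. This is a textbook CPU formulation, assuming the common variant in which exactly one item is chosen from each class and all weights are non-negative integers; it is not the thesis's GPU mapping, and the name `mckp_max_profit` is hypothetical.

```python
from typing import List, Tuple

Item = Tuple[int, int]  # (weight, profit); assumed non-negative integers


def mckp_max_profit(classes: List[List[Item]], capacity: int) -> int:
    """Best total profit picking exactly one item per class with total
    weight <= capacity. Returns -1 if no feasible selection exists."""
    UNREACHABLE = -1
    # best[c] = best profit over the classes processed so far with total
    # weight at most c. With no classes processed the profit is 0.
    best = [0] * (capacity + 1)
    for items in classes:
        new = [UNREACHABLE] * (capacity + 1)
        # Each entry new[c] depends only on the previous row `best`, so all
        # capacities c could be computed independently, e.g. one per thread.
        for c in range(capacity + 1):
            for weight, profit in items:
                if weight <= c and best[c - weight] != UNREACHABLE:
                    new[c] = max(new[c], best[c - weight] + profit)
        best = new
    return best[capacity]
```

The table has one row per class and one column per capacity value, so the running time is proportional to the capacity times the total number of items, which is what makes the algorithm pseudo-polynomial.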
|
68 |
Terramechanics based wheel-soil model in a computer game enviroment
Knutsson, Viktor, January 2016
This thesis aimed to develop a deformable virtual terrain which a vehicle can move in and interact with in a realistic manner. The theory used to calculate how the terrain influences the vehicle is based on terramechanics. The terrain is divided into two separate parts, one for visualization and one for physical collisions. Deformations of the graphical layer are calculated on the GPU using compute shader programming. The result of the thesis includes a tech demo with a small landscape where an alternate terrain vehicle can deform the terrain as it moves around. The method for deforming the graphical layer is designed so that the computational time does not increase as the size of the terrain does, making the method applicable to actual games.
|
69 |
Particle Systems; A GPGPU based approach
Grundel, Tobias, January 2015
Context. Particle systems are an important part of enhancing the player experience in games, simulating fire, smoke and water. The ever higher demands on modern games to provide visually stunning experiences put a strain on performance that the CPU is not able to handle. That is why ideas about moving data from the CPU over to the GPU have begun to be promoted. Objectives. In this paper we test how performance is affected when a particle system is moved from the CPU to reside completely on the GPU. Methods. This is measured by using time stamps to determine the execution time of the different steps needed to inject, update and render the particles in the particle system. Results. The tests showed that the particle system on the GPU performed significantly better than the one on the CPU in all areas except one, the injection step. Conclusions. This paper concludes that it is better to keep a particle system on the GPU at all times, if possible, when the particle system does not need to interact with the rest of the game world.
|
70 |
Using OpenCL to Implement Median Filtering and RSA Algorithms : Two GPGPU Application Case Studies / Att använda OpenCL för att implementera median filtrering och RSA algoritmer : Två tekniska fallstudier inom GPGPU
Gillsjö, Lukas, January 2015
Graphics Processing Units (GPUs) and their development tools have advanced recently, and industry has become more interested in using them. Among several development frameworks for GPUs, OpenCL provides a programming environment for writing portable code that can run in parallel. This report describes two case studies of algorithm implementations in OpenCL. The first algorithm is median filtering, a widely used image processing algorithm. The other is RSA, a popular algorithm used in encryption. The CPU and GPU implementations of these algorithms are compared in method and speed. The GPU implementations are also evaluated for efficiency, stability, scalability and portability. We find that the GPU implementations perform better overall, with some exceptions. We see that a pure GPU solution is not always the best and that a hybrid solution using both CPU and GPU may be preferable in some cases.
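For readers unfamiliar with median filtering, here is a minimal CPU reference sketch of a 3x3 median filter on a grayscale image stored as a list of rows. The window size, the clamp-to-edge border handling, and the name `median_filter_3x3` are assumptions made for illustration; the abstract does not describe how the thesis's OpenCL kernel handles these choices.

```python
from typing import List


def median_filter_3x3(img: List[List[int]]) -> List[List[int]]:
    """Return a new image where each pixel is the median of its 3x3
    neighbourhood. Out-of-range neighbours are clamped to the nearest
    edge pixel (an assumed border policy)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the nine neighbours, clamping coordinates at the borders.
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            window.sort()
            out[y][x] = window[4]  # middle of nine sorted values
    return out
```

Every output pixel is computed from the input image alone, so a GPU version would typically assign one work-item per pixel; the per-pixel sort of nine values is the part that kernel implementations usually optimize.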
|