161 |
Smoke Simulation on the GPU: Using GPGPU to Simulate Smoke / Simulering av rök på GPU : Användning av GPGPU för att simulera rök
Jalsborn, Erik, January 2008
This thesis examines an existing technique for simulating smoke with a particle system. The technique is developed and implemented so that the computation of the particles' new positions runs on both a CPU and a GPU. The work evaluates time efficiency and shows that the smoke simulation runs faster when the new particle positions are computed on the GPU rather than on the CPU.
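The per-particle position update that such a simulation moves to the GPU can be sketched as follows. This is a minimal illustration, not the thesis's method: the `Particle` type, the constant buoyancy, the linear drag and the Euler-style integration are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    x: float
    y: float
    vx: float
    vy: float


def step_particle(p, dt, buoyancy=1.5, drag=0.1):
    """Advance one smoke particle by dt (velocity first, then position)."""
    # Hypothetical forces: constant upward buoyancy and linear drag.
    ax = -drag * p.vx
    ay = buoyancy - drag * p.vy
    vx = p.vx + ax * dt
    vy = p.vy + ay * dt
    # The new position uses the freshly updated velocity.
    return Particle(p.x + vx * dt, p.y + vy * dt, vx, vy)


def step_all(particles, dt):
    """Update every particle independently of all the others."""
    # No particle reads another particle's state, so on a GPU each
    # iteration of this loop could run as its own thread.
    return [step_particle(p, dt) for p in particles]
```

Because each update depends only on the particle's own state, the loop in `step_all` is the part that would typically be replaced by a GPU kernel launch.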
|
162 |
GPGPU: Image Processing on the Graphics Card / GPGPU : Bildbehandling på grafikkort
Hedborg, Johan, January 2006
GPGPU is a collective term for research involving general computation on graphics cards. A modern graphics card typically provides more than ten times the computational power of an ordinary PC processor. This is a result of the high demands for speed and image quality in computer games. This thesis investigates the possibility of exploiting this computational power for image processing purposes. Three well-known methods were implemented on a graphics card: FFT (Fast Fourier Transform), KLT (Kanade-Lucas-Tomasi point tracking) and the generation of scale pyramids. All algorithms were successfully implemented, and they are three to ten times faster than the corresponding optimized CPU implementations.
|
163 |
Blum Blum Shub on the GPU
Olsson, Mikael; Gullberg, Niklas, January 2012
Context. The cryptographically secure pseudo-random number generator Blum Blum Shub (BBS) is a simple algorithm with a strong security proof; however, it requires very large numbers to be secure, which makes it computationally heavy. The Graphics Processing Unit (GPU) is a common vector processor originally dedicated to computer-game graphics that has since been adapted to perform general-purpose computing. The GPU has great potential for fast general-purpose parallel computing, but its architecture makes it difficult to adapt certain algorithms to use its full computational power. Objectives. The objective of this thesis was to investigate whether an implementation of the BBS pseudo-random number generator on the GPU would be faster than a CPU implementation. Methods. We modelled the performance of a multi-precision number system with different data types to decide which data type should be used for a multi-precision number system implementation on the GPU. The multi-precision number system design was based on a positional number system. Because multi-precision numbers were used, conventional methods for arithmetic were not efficient or practical. Therefore, addition was performed using Lazy Addition, which allows larger carry values in order to limit the amount of carry propagation required. Carry propagation was done using a technique derived from a Kogge-Stone carry look-ahead adder. Single-precision multiplication was done using Dekker splits, and multi-precision modular multiplication used Montgomery multiplication. Results. Our results showed that using the floating-point data type would yield greater performance for a multi-precision number system on the GPU than using the integer data type. Our GPU-bound BBS implementation was about 4 times slower than a CPU version implemented with the GNU Multiple Precision Arithmetic Library (GMP). Conclusions. The conclusion of this thesis is that our GPU-bound BBS implementation is not a suitable alternative to or replacement for the CPU-bound implementation.
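For orientation, the core of the BBS generator is a repeated modular squaring, x_{n+1} = x_n^2 mod M, where M = p * q and both primes are congruent to 3 modulo 4. The sketch below relies on Python's built-in arbitrary-precision integers rather than the multi-precision number system described in the abstract; the function name `bbs_bits` and the toy primes 11 and 23 are assumptions for illustration and are far too small to be secure.

```python
from math import gcd


def bbs_bits(seed, p, q, count):
    """Return `count` output bits of a Blum Blum Shub generator."""
    # Both primes must be congruent to 3 modulo 4. Primality itself is
    # not checked in this sketch.
    if p % 4 != 3 or q % 4 != 3:
        raise ValueError("p and q must both be congruent to 3 mod 4")
    m = p * q
    if gcd(seed, m) != 1:
        raise ValueError("seed must be coprime to p * q")
    x = (seed * seed) % m
    bits = []
    for _ in range(count):
        x = (x * x) % m      # modular squaring: the expensive step
        bits.append(x & 1)   # emit the least significant bit
    return bits


# Toy parameters (hypothetical, insecure): p = 11, q = 23, seed = 3.
print(bbs_bits(3, 11, 23, 6))
```

Each squaring depends on the previous state, so the sequence itself is serial; any GPU parallelism has to come from inside the large-number arithmetic, which is why the choice of multi-precision representation matters.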
|
164 |
A Data-Parallel Graphics Pipeline Implemented in OpenCL / En Data-Parallell Grafikpipeline Implementerad i OpenCL
Ek, Joel, January 2012
This report documents implementation details, results, benchmarks and technical discussions for the work carried out within a master’s thesis at Linköping University. The thesis explores the field of software rendering in the age of parallel computing. Using the Open Computing Language, a complete graphics pipeline was implemented for use on general processing units from different vendors. The pipeline is tile-based and fully configurable, and it provides means of rendering visually compelling images in real time. Further optimizations for parallel architectures are still needed, however, as uneven workloads drastically decrease the overall performance of the pipeline.
|
165 |
Advanced Real-time Post-Processing using GPGPU techniques
Lönroth, Per; Unger, Mattias, January 2008
Post-processing techniques are used to change a rendered image as a last step before presentation. They include, but are not limited to, operations such as changes of saturation or contrast, as well as more advanced effects like depth-of-field and tone mapping. Depth-of-field effects are created by changing the focus in an image; the parts close to the focus point are perfectly sharp while the rest of the image has a variable amount of blurriness. The effect is widely used in photography and movies as a depth cue and has in recent years also been introduced into computer games. Today’s graphics hardware offers new possibilities in terms of computational capacity. Shaders and GPGPU languages can be used to perform massively parallel operations on graphics hardware and are well suited for game developers. This thesis presents the theoretical background of some of the recent and most valuable depth-of-field algorithms and describes the implementation of various solutions both in the shader domain and using GPGPU techniques. The main objective is to analyze various depth-of-field approaches and examine their visual quality and how the methods scale performance-wise when using different techniques.
|
166 |
Modelica PARallel benchmark suite (MPAR) - a test suite for evaluating the performance of parallel simulations of Modelica models
Hemmati Moghadam, Afshin, January 2011
Using the object-oriented, equation-based modeling language Modelica, it is possible to model and simulate computationally intensive models. To reduce the simulation time, a desirable approach is to perform the simulations on parallel multi-core platforms. Several works have been carried out for this purpose so far; the most recent one includes language enhancements with explicit parallel programming language constructs in the algorithmic parts of the Modelica language. This extension automatically generates parallel simulation code for execution on OpenCL-enabled platforms, and it has been implemented in the open-source OpenModelica environment. However, to ensure that this extension, as well as future developments regarding parallel simulations of Modelica models, is feasible, systematic benchmarking against a set of appropriate Modelica models is essential, and this is the main focus of this thesis. A benchmark test suite containing computationally intensive Modelica models that are relevant for parallel simulations is presented. The suite is used to evaluate the feasibility and measure the performance of the generated OpenCL code when using the new Modelica language extension. In addition, several considerations and suggestions are given on how the modeler can efficiently parallelize sequential models to achieve better performance on OpenCL-enabled GPUs and multi-core CPUs. The measurements have been made for both sequential and parallel implementations of the benchmark suite, using the code generated by the OpenModelica compiler on different hardware configurations, including single and multi-core CPUs as well as GPUs. The results obtained in this thesis show that simulating Modelica models using OpenCL as a target language is very feasible. In addition, it is concluded that for models with large data sizes and a high level of parallelism, it is possible to achieve considerable speedup on GPUs compared to single and multi-core CPUs.
|
167 |
Microwave Simulation with Real-Time Lighting: Real-Time Ray Tracing in CUDA / Mikrovågssimulering med realtidsljus : Realtids-ray tracing i CUDA
Haggren, Simon, January 2010
This work investigates the possibilities of simulating microwaves in a closed system. The system is implemented with an existing technique called ray tracing. Ray tracing is a lighting technique that simulates the movement of photons between a light source and a viewer in the environment to be lit, and then illuminates the areas that are hit in order to render an image. Photons and microwaves have similar properties, since both are electromagnetic radiation at different wavelengths. Ray tracing is a demanding algorithm, because many calculations must be performed for each photon on every update. The algorithm has therefore been implemented with CUDA, a library from Nvidia that makes it possible to use the GPU as a general-purpose computing system. This suits this type of problem because the GPU's architecture is designed for multiple parallel calculations.
|
168 |
Implementing Cauchy Reed-Solomon Utilizing OpenCL / Cauchy Reed-Solomon implementerat med OpenCL
Karlsson, Tim, January 2013
In this paper, the performance of executing Cauchy Reed-Solomon (CRS) coding on the GPU is evaluated and compared with execution on the CPU. Three different prototypes are developed. One is sequential and developed in C++; the other two are developed using C++ and OpenCL. The measurements compare the execution time for data blocks of different sizes, ranging from 16 KB up to 256 MB, with two different encoding ratios, 9/12 and 10/16. The CPU used in the measurements is an Intel CPU with 4 cores, and the graphics cards used are the graphics card integrated in the CPU and a dedicated AMD graphics card. The OpenCL prototypes are executed on three different targets: the CPU (multithreaded), the integrated Intel graphics card and the dedicated AMD graphics card. The sequential prototype is executed on the same CPU, but on a single core. The results show that the GPUs are faster than the CPU for larger data blocks. The highest measured throughput is achieved with the multithreaded CPU prototype (OpenCL executed on the CPU) for block sizes around 1 MB.
|
169 |
Accelerating IISPH : A Parallel GPGPU Solution Using CUDA
Eliasson, André; Franzén, Pontus, January 2015
Context. Simulating realistic fluid behavior in incompressible fluids for computer graphics has been pioneered with the implicit incompressible smoothed particle hydrodynamics (IISPH) solver. The algorithm converges faster than other incompressible SPH solvers, but real-time performance (from the perspective of video games, 30 frames per second) is still an issue when the particle count increases. Objectives. This thesis aims to improve the performance of the IISPH solver by proposing a parallel solution that runs on the GPU using CUDA. The solution should not compromise the physical accuracy of the original solution. The aspects investigated are execution time, memory usage and physical accuracy. Methods. The proposed implementation uses a fine-grained approach in which each particle is calculated on a separate thread. It is compared to a sequential implementation and a parallel OpenMP implementation running on the CPU. Results and Conclusions. It is shown that the parallel CUDA solution allows for real-time performance with approximately 19 times as many particles as the sequential implementation. At approximately 175 000 particles the simulation runs at the limit of real-time performance; with more particles it is still considered interactive. The visual result of the proposed implementation deviated slightly from that of the CPU implementations.
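The fine-grained, one-thread-per-particle layout can be illustrated with the density summation that SPH solvers perform for every particle. The sketch below is not the IISPH solver itself: it omits the pressure solve, uses a brute-force neighbor loop, and assumes the poly6 smoothing kernel, none of which is confirmed by the abstract. The function names are hypothetical.

```python
import math


def poly6(r, h):
    """Poly6 smoothing kernel in 3D for distance r and support radius h."""
    if r > h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r * r) ** 3


def density(i, positions, mass, h):
    """Density at particle i: sum of kernel-weighted masses of all particles."""
    total = 0.0
    for pj in positions:
        # Brute-force loop over every particle (assumption: no spatial grid).
        total += mass * poly6(math.dist(positions[i], pj), h)
    return total


def densities(positions, mass, h):
    # Each call to density() only reads shared data and writes one result,
    # so on a GPU every index i could be handled by its own thread.
    return [density(i, positions, mass, h) for i in range(len(positions))]
```

In a CUDA version, the list comprehension in `densities` would correspond to the kernel launch and the body of `density` to the work done by a single thread.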
|
170 |
Simulating Partial Differential Equations using the Explicit Parallelism of ParModelica
Thorslund, Gustaf, January 2015
The Modelica language is a modelling and programming language for modelling cyber-physical systems using equations and algorithms. This thesis covers two suggested extensions of the Modelica language: Partial Differential Equations (PDEs) and explicit parallelism in algorithmic code. While PDEs are not yet supported by the Modelica language, this thesis presents a framework for solving PDEs using the algorithmic part of the Modelica language, including parallel extensions. Different numerical solvers have been implemented using the explicit parallel constructs suggested for Modelica by the ParModelica language extensions, which are implemented as part of OpenModelica. The solvers have been evaluated using different models, and the evaluation shows that larger models are well suited to a parallel solver. The intention has been to write a framework suitable for modelling and parallel simulation of PDEs. This work can, however, also be seen as a case study of how to write a custom solver using parallel algorithmic Modelica and how to evaluate the performance of a parallel solver.
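To show why explicit PDE solvers lend themselves to this kind of parallelism, here is a minimal sketch of one explicit finite-difference step for the 1D heat equation. It is written in Python rather than ParModelica, and the choice of equation, the fixed boundary values and the name `heat_step` are assumptions for illustration; the abstract does not state which PDEs or schemes the thesis uses.

```python
def heat_step(u, alpha, dx, dt):
    """One forward-time, centered-space step of u_t = alpha * u_xx."""
    r = alpha * dt / (dx * dx)
    # This explicit scheme is only stable for r <= 0.5.
    if r > 0.5:
        raise ValueError("time step too large for a stable explicit update")
    new = list(u)  # end points keep their values (fixed boundary values)
    for i in range(1, len(u) - 1):
        # Each interior point reads only old values, so all points can be
        # updated independently, for example one per parallel work item.
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new
```

Since every interior point is computed from the previous time level only, the amount of independent work grows with the grid size, which is consistent with larger models benefiting more from a parallel solver.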
|