  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Simulation of Modelica Models on the CUDA Architecture

Östlund, Per January 2009
Simulations are very important for many reasons, and finding ways of accelerating them is therefore interesting. In this thesis the feasibility of automatically generating simulation code for a limited set of Modelica models that can be executed on NVIDIA's CUDA architecture is studied. The OpenModelica compiler, an open-source Modelica compiler, was for this purpose extended to generate CUDA code.

This thesis presents an overview of the CUDA architecture and looks at the problems that need to be solved to generate efficient simulation code for this architecture. Methods of finding parallelism in models that can be exploited on the highly parallel CUDA architecture are shown, and methods of efficiently using the available memory spaces on the architecture are also presented.

This thesis shows that it is possible to generate CUDA simulation code for the set of Modelica models that were chosen. It also shows that for models with a large amount of parallelism it is possible to get significant speedups compared with simulation on a normal processor; a speedup of 4.6 was reached for one of the models used in the thesis. Several suggestions on how the CUDA architecture can be used even more efficiently for Modelica simulations are also given.
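As a rough illustration of what GPU-side simulation code of this kind can look like, the sketch below advances many independent state variables with an explicit Euler step, one thread per state. The derivative expression and all names are placeholders, not output of the OpenModelica code generator described in the thesis.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <utility>

// Hypothetical sketch: one thread integrates one state variable of a fully
// decoupled model with an explicit Euler step. Real generated code would
// evaluate the model's actual derivative expressions instead.
__global__ void eulerStep(const float* x, float* xNext, float dt, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float dxdt = -x[i];            // placeholder derivative f(x) = -x
        xNext[i] = x[i] + dt * dxdt;   // explicit Euler update
    }
}

int main()
{
    const int n = 1 << 16;
    const float dt = 1e-3f;
    float *x, *xNext;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&xNext, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = 1.0f;

    for (int step = 0; step < 1000; ++step) {
        eulerStep<<<(n + 255) / 256, 256>>>(x, xNext, dt, n);
        cudaDeviceSynchronize();
        std::swap(x, xNext);           // advance to the next time step
    }
    printf("x[0] after 1000 steps: %f\n", x[0]);
    cudaFree(x);
    cudaFree(xNext);
    return 0;
}
```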
22

An Implementation of the Discontinuous Galerkin Method on Graphics Processing Units

Fuhry, Martin 10 April 2013
Computing highly-accurate approximate solutions to partial differential equations (PDEs) requires both a robust numerical method and a powerful machine. We present a parallel implementation of the discontinuous Galerkin (DG) method on graphics processing units (GPUs). In addition to being flexible and highly accurate, DG methods accommodate parallel architectures well, as their discontinuous nature produces entirely element-local approximations. While GPUs were originally intended to compute and display computer graphics, they have recently become a popular general purpose computing device. These cheap and extremely powerful devices have a massively parallel structure. With the recent addition of double precision floating point number support, GPUs have matured as serious platforms for parallel scientific computing. In this thesis, we present an implementation of the DG method applied to systems of hyperbolic conservation laws in two dimensions on a GPU using NVIDIA's Compute Unified Device Architecture (CUDA). Numerous computed examples from linear advection to the Euler equations demonstrate the modularity and usefulness of our implementation. Benchmarking our method against a single-core, serial implementation of the DG method reveals a speedup of a factor of over fifty on a USD 500 NVIDIA GTX 580.
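The element-local structure mentioned above is what makes DG attractive on GPUs. The skeleton below illustrates that structure under the assumption that the per-element right-hand side (volume and surface flux integrals) has already been assembled; it is a sketch, not the thesis's actual kernel.

```cuda
#include <cuda_runtime.h>

// Skeleton illustrating why DG maps well onto the GPU: once the per-element
// right-hand side has been assembled, every degree of freedom can be advanced
// independently. The rhs array is a placeholder for that assembly.
__global__ void dgForwardEulerStep(float* coeffs, const float* rhs,
                                   float dt, int numElements, int dofsPerElem)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numElements * dofsPerElem)
        coeffs[idx] += dt * rhs[idx];   // element-local update, no coupling
}

// Launch outline: one thread per degree of freedom.
//   int total = numElements * dofsPerElem;
//   dgForwardEulerStep<<<(total + 255) / 256, 256>>>(coeffs, rhs, dt,
//                                                    numElements, dofsPerElem);
```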
23

An Advanced Volume Raycasting Technique using GPU Stream Processing

Mensmann, Jörg, Ropinski, Timo, Hinrichs, Klaus January 2010
GPU-based raycasting is the state-of-the-art rendering technique for interactive volume visualization. The ray traversal is usually implemented in a fragment shader, utilizing the hardware in a way that was not originally intended. New programming interfaces for stream processing, such as CUDA, support a more general programming model and the use of additional device features, which are not accessible through traditional shader programming. In this paper we propose a slab-based raycasting technique that is modeled specifically to use these features to accelerate volume rendering. This technique is based on experience gained from comparing fragment shader implementations of basic raycasting to implementations directly translated to CUDA kernels. The comparison covers direct volume rendering with a variety of optional features, e.g., gradient and lighting calculations. Our findings are supported by benchmarks of typical volume visualization scenarios. We conclude that new stream processing models can only gain a small performance advantage when directly porting the basic raycasting algorithm. However, they can be advantageous through novel acceleration methods which use the hardware features not available to shader implementations.
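For readers unfamiliar with CUDA-based raycasting, the minimal kernel below shows the basic pattern of one thread per screen pixel marching a ray through a volume. It omits the slab decomposition, gradient and lighting calculations, and texture hardware usage that the paper actually evaluates; all names are illustrative.

```cuda
#include <cuda_runtime.h>

// Minimal sketch: one thread per pixel marches an orthographic ray along +z
// through a dim^3 volume in global memory and keeps the maximum intensity.
__global__ void raycast(const float* volume, int dim,
                        float* image, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int vx = x * dim / width;          // map pixel to volume column
    int vy = y * dim / height;
    float maxVal = 0.0f;
    for (int vz = 0; vz < dim; ++vz) {
        float sample = volume[(vz * dim + vy) * dim + vx];
        maxVal = fmaxf(maxVal, sample);
    }
    image[y * width + x] = maxVal;     // maximum-intensity projection
}
```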
24

GPU-Accelerated Real-Time Surveillance De-Weathering

Pettersson, Niklas January 2013
A fully automatic de-weathering system to increase the visibility/stability in surveillance applications during bad weather has been developed. Rain, snow and haze during daylight are handled with real-time performance through acceleration from CUDA-implemented algorithms. Video from fixed cameras is processed on a PC with no need for special hardware except an NVIDIA GPU. The system does not use any background model and does not require any precalibration. An increase in contrast is obtained in all haze/rain/snow cases, while the system lags by at most one frame during rain or snow removal. De-hazing can be obtained for any distance to simplify tracking or other algorithms operating on a surveillance system.
25

Faster Dark Matter Calculations Using the GPU

Liem, Sebastian January 2011
We have investigated the use of the graphics processing unit to accelerate the software package DarkSUSY. DarkSUSY is, among other things, used for calculating the dark matter relic density -- a measurable quantity -- given the supersymmetric neutralino, χ, as a dark matter candidate. Supersymmetric theories have many free parameters and we want to calculate the relic density for large areas of the parameter space. The results can then be compared with observations to constrain the parameters. A faster DarkSUSY would allow for larger searches in the parameter space. We modified DarkSUSY using Nvidia's CUDA platform and wrote a program that, by using the GPU, calculates the χ + χ <-> W+ + W- contribution to the annihilation cross-section. Our initial attempt was only negligibly faster than our non-CUDA program due to under-utilization of the GPU, but after solving that, the program was 47 times faster than the reference program. We also report on the difficulties we faced, both solved and unsolved, so the reader can make an informed decision on the worth of rewriting the heavy calculations in DarkSUSY to use the GPU.
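The under-utilization issue mentioned above is typically addressed by batching many parameter points into a single kernel launch. The sketch below illustrates only that idea; sigmaWW() is a placeholder expression, not DarkSUSY's actual χ + χ <-> W+ + W- cross-section code.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Placeholder standing in for the real amplitude calculation.
__device__ float sigmaWW(float neutralinoMass, float coupling)
{
    return coupling * coupling / (neutralinoMass * neutralinoMass + 1.0f);
}

// Evaluate many parameter points in one launch so the GPU stays busy.
__global__ void crossSectionBatch(const float* mass, const float* coupling,
                                  float* sigma, int nPoints)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nPoints)
        sigma[i] = sigmaWW(mass[i], coupling[i]);
}

int main()
{
    const int n = 100000;                       // many parameter points
    float *m, *g, *s;
    cudaMallocManaged(&m, n * sizeof(float));
    cudaMallocManaged(&g, n * sizeof(float));
    cudaMallocManaged(&s, n * sizeof(float));
    for (int i = 0; i < n; ++i) { m[i] = 100.0f + i * 0.01f; g[i] = 0.1f; }

    crossSectionBatch<<<(n + 255) / 256, 256>>>(m, g, s, n);
    cudaDeviceSynchronize();
    printf("sigma[0] = %g\n", s[0]);
    cudaFree(m); cudaFree(g); cudaFree(s);
    return 0;
}
```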
26

A Phase Based Dense Stereo Algorithm Implemented in CUDA

Macomber, Brent David May 2011
Stereo imaging is routinely used in Simultaneous Localization and Mapping (SLAM) systems for the navigation and control of autonomous spacecraft proximity operations, advanced robotics, and robotic mapping and surveying applications. A key step (and generally the most computationally expensive step) in the generation of high fidelity geometric environment models from image data is the solution of the dense stereo correspondence problem. A novel method for solving the stereo correspondence problem to sub-pixel accuracy in the Fourier frequency domain by exploiting the Convolution Theorem is developed. The method is tailored to challenging aerospace applications by incorporation of correction factors for common error sources. Error-checking metrics verify correspondence matches to ensure high quality depth reconstructions are generated. The effect of geometric foreshortening caused by the baseline displacement of the cameras is modeled and corrected, drastically improving correspondence matching on highly off-normal surfaces. A metric for quantifying the strength of correspondence matches is developed and implemented to recognize and reject weak correspondences, and a separate cross-check verification provides a final defense against erroneous matches. The core components of this phase based dense stereo algorithm are implemented and optimized in the Compute Unified Device Architecture (CUDA) parallel computation environment onboard an NVIDIA Graphics Processing Unit (GPU). Accurate dense stereo correspondence matching is performed on stereo image pairs at a rate of nearly 10 Hz.
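A hedged sketch of the frequency-domain core that the Convolution Theorem enables is shown below: forming the normalized cross-power spectrum of two image patches with cuFFT, whose inverse transform peaks at the disparity. Subpixel refinement, foreshortening correction, and the match-strength metric from the thesis are not shown; all names are illustrative.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Multiply one spectrum by the conjugate of the other and normalize; after
// an inverse FFT the result peaks at the relative shift between the patches.
__global__ void crossPowerSpectrum(const cufftComplex* a,
                                   const cufftComplex* b,
                                   cufftComplex* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float re = a[i].x * b[i].x + a[i].y * b[i].y;    // Re(a * conj(b))
    float im = a[i].y * b[i].x - a[i].x * b[i].y;    // Im(a * conj(b))
    float mag = sqrtf(re * re + im * im) + 1e-12f;   // avoid divide-by-zero
    out[i].x = re / mag;
    out[i].y = im / mag;
}

// Usage outline (error checks omitted):
//   cufftHandle plan;
//   cufftPlan2d(&plan, rows, cols, CUFFT_C2C);
//   cufftExecC2C(plan, left,  leftF,  CUFFT_FORWARD);
//   cufftExecC2C(plan, right, rightF, CUFFT_FORWARD);
//   int n = rows * cols;
//   crossPowerSpectrum<<<(n + 255) / 256, 256>>>(leftF, rightF, spectrum, n);
//   cufftExecC2C(plan, spectrum, corr, CUFFT_INVERSE);  // peak = disparity
```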
27

GPGPU-LOD (General Purpose Graphics Processing Unit - Level Of Detail): A graphics-card-driven terrain LOD algorithm

Jansson, Karl January 2009
Today's graphics cards are built from powerful multiprocessors, which makes them excellent for handling parallelizable problems that would take a long time to execute on an ordinary processor, such as level-of-detail or ray tracing.

This report presents a parallelizable level-of-detail algorithm for terrain heightmaps and implements it for use on graphics cards using NVIDIA's CUDA API. The algorithm divides the full heightmap into sections that are further divided into smaller blocks, which are computed in parallel on the graphics card. The algorithm computes vertex positions, normals, and texture coordinates for each block and sends the data to the application, which creates vertex and index buffers and renders the sections. The implementation's performance and ability to reduce triangles are analyzed with two different kinds of culling methods: one that culls triangles at the section level and one that culls at the block level.

The results show that it is very advantageous to let the graphics card handle level-of-detail computations in this way, even though memory copying over the graphics bus is a problem, taking up roughly eighty-five percent of the total time for handling a section. The computations themselves take very little time, and there is plenty of room for further development to achieve as good a distribution of triangles over the terrain area as possible.
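As a sketch of the per-block computation described above, the kernel below computes one vertex position and normal per thread from a heightmap using central differences. LOD selection, texture coordinates, and the culling methods analyzed in the report are omitted; all names are placeholders.

```cuda
#include <cuda_runtime.h>

// One thread per vertex in a terrain block: sample the heightmap, build a
// world-space position, and estimate the normal by central differences.
__global__ void buildBlockVertices(const float* heightmap, int mapSize,
                                   int blockOriginX, int blockOriginZ,
                                   int blockSize, float cellSpacing,
                                   float3* positions, float3* normals)
{
    int lx = blockIdx.x * blockDim.x + threadIdx.x;
    int lz = blockIdx.y * blockDim.y + threadIdx.y;
    if (lx >= blockSize || lz >= blockSize) return;

    int x = min(blockOriginX + lx, mapSize - 1);
    int z = min(blockOriginZ + lz, mapSize - 1);
    float h = heightmap[z * mapSize + x];

    // Central differences (clamped at the map edges) give the normal.
    float hl = heightmap[z * mapSize + max(x - 1, 0)];
    float hr = heightmap[z * mapSize + min(x + 1, mapSize - 1)];
    float hd = heightmap[max(z - 1, 0) * mapSize + x];
    float hu = heightmap[min(z + 1, mapSize - 1) * mapSize + x];

    float3 n = make_float3(hl - hr, 2.0f * cellSpacing, hd - hu);
    float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z);

    int out = lz * blockSize + lx;
    positions[out] = make_float3(x * cellSpacing, h, z * cellSpacing);
    normals[out]   = make_float3(n.x / len, n.y / len, n.z / len);
}
```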
28

Performance study of encryption algorithms in CUDA

Μπιλιανού, Παναγιώτα 12 March 2015
In this thesis, the AES and Rijndael algorithms are studied and their implementation is presented in two different ways: one using the CPU entirely, and one using the CPU/GPU with CUDA. Initially, the design logic behind the AES and Rijndael algorithms is presented, as well as their advantages and disadvantages. Next, the different implementation modes of the algorithms are analyzed, and the chosen mode of operation, the electronic codebook, is presented along with the reason for this choice. Finally, the results and experimental measurements are presented, together with the conclusions drawn from analyzing the graphs.
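The electronic codebook mode is what makes the GPU mapping straightforward: every 16-byte block is encrypted independently, so one thread can handle one block. The sketch below shows only that structure; the device function is a stand-in, not a real AES/Rijndael implementation, and all names are illustrative.

```cuda
#include <cuda_runtime.h>
#include <stdint.h>

// Placeholder for the cipher itself; a real implementation would perform the
// AES/Rijndael rounds (typically using T-tables held in shared memory).
__device__ void encryptBlockPlaceholder(const uint8_t* in, uint8_t* out,
                                        const uint8_t* roundKeys)
{
    for (int i = 0; i < 16; ++i)
        out[i] = in[i] ^ roundKeys[i];   // stand-in for the real cipher
}

// ECB mode: blocks are independent, so one thread encrypts one 16-byte block.
__global__ void ecbEncrypt(const uint8_t* plaintext, uint8_t* ciphertext,
                           const uint8_t* roundKeys, int numBlocks)
{
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b < numBlocks)
        encryptBlockPlaceholder(plaintext + 16 * b,
                                ciphertext + 16 * b, roundKeys);
}
```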
29

Linking Scheme code to data-parallel CUDA-C code

December 2013
In Compute Unified Device Architecture (CUDA), programmers must manage memory operations, synchronization, and utility functions of Central Processing Unit programs that control and issue data-parallel general purpose programs running on a Graphics Processing Unit (GPU). NVIDIA Corporation developed the CUDA framework to enable and develop data-parallel programs for GPUs to accelerate scientific and engineering applications by providing a language extension of C called CUDA-C. A foreign-function interface composed of Scheme and CUDA-C constructs extends the Gambit Scheme compiler and enables linking of Scheme and data-parallel CUDA-C code to support high-performance parallel computation with reasonably low runtime overhead. We provide six test cases, implemented both in Scheme and CUDA-C, in order to evaluate the performance of our implementation in Gambit and to show 0–35% overhead in the usual case. Our work enables Scheme programmers to develop expressive programs that control and issue data-parallel programs running on GPUs, while also reducing hands-on memory management.
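A minimal sketch of what the CUDA-C side of such an interface could look like is given below, assuming arrays are passed as plain host pointers; the wrapper name and memory strategy are illustrative, not the thesis's actual Gambit bindings.

```cuda
#include <cuda_runtime.h>

// Simple data-parallel kernel used by the wrapper below.
__global__ void vecAddKernel(const double* a, const double* b,
                             double* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// A plain C-callable wrapper hides the kernel launch and device memory
// management so the Scheme side only sees host pointers and a length.
extern "C" void vec_add(const double* a, const double* b, double* c, int n)
{
    double *da, *db, *dc;
    size_t bytes = n * sizeof(double);
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);
    vecAddKernel<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dc);
}

// On the Scheme side, Gambit's c-lambda could bind this exported symbol;
// how the arrays are marshalled is exactly what the thesis's interface handles.
```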
