Over the past 15 years, modern PC graphics cards (GPUs) have changed from being pure graphics accelerators into parallel computing platforms. Several new parallel programming languages have emerged, including NVIDIA's parallel programming language for GPUs (CUDA). This report explores two related problems in parallel: How well-suited is CUDA for implementing algorithms that utilize non-trivial data structures? And, how does one develop a complex algorithm that uses a CUDA system efficiently? A guide for how to implement complex algorithms in CUDA is presented. Simulation of a dense 2D particle system is chosen as the problem domain for algorithm optimization. Two algorithmic optimization strategies are presented which reduce the computational workload when simulating the particle system. The strategies can either be used independently, or combined for slightly improved results. Finally, the resulting implementations are benchmarked against a simpler implementation on a normal PC processor (CPU) as well as a simpler GPU-algorithm. A simple GPU solution is shown to run at least 10 times faster than a simple CPU solution. An improved GPU solution can then yield another 10 times speed-up, while sacrificing some accuracy.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-118776 |
Date | January 2015 |
Creators | Kalms, Mikael |
Publisher | Linköpings universitet, Informationskodning, Linköpings universitet, Tekniska fakulteten |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |