Cosmological simulations are used by astronomers to investigate large scale structure formation and galaxy evolution. Structure nding, that is, the discovery of gravitationally-bound objects such as dark matter halos, is a crucial step in many such simulations. During recent years, advancing computational capacity has lead to halo-nders needing to manage increasingly larger simulations. As a result, many multi-core solutions have arisen in an attempt to process these simulations more eciently. However, a many-core approach to the problem using graphics processing units (GPUs) appears largely unexplored. Since these simulations are inherently n-body problems, they contain a high degree of parallelism, which makes them very well suited to a GPU architecture. Therefore, it makes sense to determine the potential for further research in halo-nding algorithms on a GPU. We present a number of modified algorithms, for accelerating the identication of halos and sub-structures, using entry-level graphics hardware. The algorithms are based on an adaptive hierarchical renement of the friends-of-friends (FoF) method using six phase-space dimensions: This allows for robust tracking of sub-structures. These methods are highly amenable to parallel implementation and run on GPUs. We implemented four separate systems; two on GPUs and two on CPUs. The first system for both CPU and GPU was implemented as a proof of concept exercise to familiarise us with the problem: These utilised minimum spanning trees (MSTs) and brute force methods. Our second implementation, for the CPU and GPU, capitalised on knowledge gained from the proof of concept applications, leading us to use kd-trees to efficiently solve the problem.
The CPU implementations were intended to serve as benchmarks for our GPU applications. In order to verify the efficacy of the implemented systems, we applied our halo finders to cosmological simulations of varying size and compared the results obtained to those given by a widely used FoF commercial halo-finder. To conduct a fair comparison, CPU benchmarks were implemented using well-known libraries optimised for these calculations.
The best performing implementation, with minimal optimisation, used kd-trees on the GPU. This achieved a 12x speed-up over our CPU implementation, which used similar methods. The same GPU implementation was compared with a current, widely-used commercial halo finder FoF system, and achieved a 2x speed-up for up to 5 million particles. Results suggest a scalable solution, where speed-up increases with the size of dataset used. We conclude that there is great potential for future research into an optimised kd-tree implementation on graphics hardware for the problem of structure finding in cosmological simulations.
Identifer | oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:uctcs/oai:techreports.cs.uct.ac.za:937 |
Date | 01 June 2014 |
Creators | Wood, Daniel |
Source Sets | South African National ETD Portal |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Format | pdf http://pubs.cs.uct.ac.za/archive/00000937/01/danielwoodUpdatedThesis.pdf |
Page generated in 0.0055 seconds