• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • 1
  • Tagged with
  • 3
  • 3
  • 3
  • 3
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Genetic programming and cellular automata for fast flood modelling on multi-core CPU and many-core GPU computers

Gibson, Michael John January 2015 (has links)
Many complex systems in nature are governed by simple local interactions, although a number are also described by global interactions. For example, within the field of hydraulics the Navier-Stokes equations describe free-surface water flow, through means of the global preservation of water volume, momentum and energy. However, solving such partial differential equations (PDEs) is computationally expensive when applied to large 2D flow problems. An alternative which reduces the computational complexity, is to use a local derivative to approximate the PDEs, such as finite difference methods, or Cellular Automata (CA). The high speed processing of such simulations is important to modern scientific investigation especially within urban flood modelling, as urban expansion continues to increase the number of impervious areas that need to be modelled. Large numbers of model runs or large spatial or temporal resolution simulations are required in order to investigate, for example, climate change, early warning systems, and sewer design optimisation. The recent introduction of the Graphics Processor Unit (GPU) as a general purpose computing device (General Purpose Graphical Processor Unit, GPGPU) allows this hardware to be used for the accelerated processing of such locally driven simulations. A novel CA transformation for use with GPUs is proposed here to make maximum use of the GPU hardware. CA models are defined by the local state transition rules, which are used in every cell in parallel, and provide an excellent platform for a comparative study of possible alternative state transition rules. Writing local state transition rules for CA systems is a difficult task for humans due to the number and complexity of possible interactions, and is known as the ‘inverse problem’ for CA. Therefore, the use of Genetic Programming (GP) algorithms for the automatic development of state transition rules from example data is also investigated in this thesis. GP is investigated as it is capable of searching the intractably large areas of possible state transition rules, and producing near optimal solutions. However, such population-based optimisation algorithms are limited by the cost of many repeated evaluations of the fitness function, which in this case requires the comparison of a CA simulation to given target data. Therefore, the use of GPGPU hardware for the accelerated learning of local rules is also developed. Speed-up factors of up to 50 times over serial Central Processing Unit (CPU) processing are achieved on simple CA, up to 5-10 times speedup over the fully parallel CPU for the learning of urban flood modelling rules. Furthermore, it is shown GP can generate rules which perform competitively when compared with human formulated rules. This is achieved with generalisation to unseen terrains using similar input conditions and different spatial/temporal resolutions in this important application domain.
2

Otimização de multidões em jogos digitais utilizando CUDA

Bardella, Tiago Ungaro 19 October 2015 (has links)
Made available in DSpace on 2016-03-15T19:38:03Z (GMT). No. of bitstreams: 1 TIAGO UNGARO BARDELLA.pdf: 2553991 bytes, checksum: f8e6ba33f7c930ee81f6b64116f495ff (MD5) Previous issue date: 2015-10-19 / The history of digital games shows, since the beginning, games which uses many types of enemy models to confront and many types of characters to control, like Real-Time Strategy games, for example. These huge amount of models into an important scene are called crowds. The crowds needs a high computer performance and specific algorithms in their interaction control to avoid immersion loss into a game by problems which may happen if the crowds are not treated accordingly. With the popularization of graphic board languages like NVIDIA CUDA, new algorithms were created to easily increase the performance of crowds in digital games and their overwhelming superiority compared to the methods used in linear programming were proved in many researches. The goal of this work is to use these GPU techniques as base to implement a new API using CUDA language that will present better performance and simplicity compared to the others algorithms on the area of crowds in digital games. After the project conclusion, the created API turned easier the crowd treatment to digital game developers using Unity3D integrated with API TBX, that now only need to include a DLL in the project instead creating na algorithm for crowd treatment from the beginning, which takes a huge amount of time from development. / O histórico dos jogos digitais apresenta, desde seu princípio, jogos que utilizam diversos modelos de inimigos para enfrentar ou diversos modelos de personagens para controlar, como os jogos Real-Time Strategy por exemplo. Essas grandes quantidades de modelos que compõem uma cena importante são chamadas de multidões. As multidões necessitam de um alto poder computacional e algoritmos específicos para seu tratamento para evitar a perda de imersão dentro de um jogo pelos problemas que podem acontecer caso as multidões não sejam tratadas adequadamente. Com o surgimento de linguagens de placas gráficas como a NVIDIA CUDA, novos algoritmos foram criados para melhor trabalhar com o desempenho de multidões em jogos digitais e sua superioridade em comparação com os métodos utilizados em programação sequencial foi comprovada em diversos estudos. O objetivo deste trabalho é se basear nestas técnicas de GPU para implementar uma nova API usando tecnologia CUDA que visa melhorar os algoritmos existentes para tratamento de multidões em jogos digitais em termos de desempenho e simplicidade de implementação. Com a conclusão do projeto, a API criada facilitou o tratamento de multidões para desenvolvedores de jogos digitais com a game engine Unity3D integrada com a API TBX de simulação de multidões, que agora apenas precisam incluir uma DLL em seu projeto ao invés de criar um algoritmo próprio de tratamento de multidões do início, o que demanda tempo de desenvolvimento.
3

Automatic Data Allocation, Buffer Management And Data Movement For Multi-GPU Machines

Ramashekar, Thejas 10 1900 (has links) (PDF)
Multi-GPU machines are being increasingly used in high performance computing. These machines are being used both as standalone work stations to run computations on medium to large data sizes (tens of gigabytes) and as a node in a CPU-Multi GPU cluster handling very large data sizes (hundreds of gigabytes to a few terabytes). Each GPU in such a machine has its own memory and does not share the address space either with the host CPU or other GPUs. Hence, applications utilizing multiple GPUs have to manually allocate and managed at a on each GPU. A significant body of scientific applications that utilize multi-GPU machines contain computations inside affine loop nests, i.e., loop nests that have affine bounds and affine array access functions. These include stencils, linear-algebra kernels, dynamic programming codes and data-mining applications. Data allocation, buffer management, and coherency handling are critical steps that need to be performed to run affine applications on multi-GPU machines. Existing works that propose to automate these steps have limitations and in efficiencies in terms of allocation sizes, exploiting reuse, transfer costs and scalability. An automatic multi-GPU memory manager that can overcome these limitations and enable applications to achieve salable performance is highly desired. One technique that has been used in certain memory management contexts in the literature is that of bounding boxes. The bounding box of an array, for a given tile, is the smallest hyper-rectangle that encapsulates all the array elements accessed by that tile. In this thesis, we exploit the potential of bounding boxes for memory management far beyond their current usage in the literature. In this thesis, we propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines. We call it the Bounding Box based Memory Manager (BBMM). BBMM is a compiler-assisted runtime memory manager. At compile time, it use static analysis techniques to identify a set of bounding boxes accessed by a computation tile. At run time, it uses the bounding box set operations such as union, intersection, difference, finding subset and superset relation to compute a set of disjoint bounding boxes from the set of bounding boxes identified at compile time. It also exploits the architectural capability provided by GPUs to perform fast transfers of rectangular (strided) regions of memory and hence performs all data transfers in terms of bounding boxes. BBMM uses these techniques to automatically allocate, and manage data required by applications (suitably tiled and parallelized for GPUs). This allows It to (1) allocate only as much data (or close to) as is required by computations running on each GPU, (2) efficiently track buffer allocations and hence, maximize data reuse across tiles and minimize the data transfer overhead, (3) and as a result, enable applications to maximize the utilization of the combined memory on multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling schemes, whether static or dynamic. Experiments run on a system with four GPUs with various scientific programs showed that BBMM is able to reduce data allocations on each GPU by up to 75% compared to current allocation schemes, yield at least 88% of the performance of hand-optimized Open CL codes and allows excellent weak scaling.

Page generated in 0.077 seconds