31 |
GPGPU-LOD <em>(General Purpose Graphics Processing Unit - Level Of Detail)</em>: A graphics-card-driven terrain LOD algorithm
Jansson, Karl. January 2009
<p>Today's graphics cards are built from powerful multiprocessors, which makes them well suited to parallelizable problems that would take a long time on an ordinary processor, such as level of detail or ray tracing.</p><p>This report presents a parallelizable level-of-detail algorithm for terrain height maps and implements it for graphics cards using Nvidia's CUDA API. The algorithm divides the full height map into sections, which are further divided into smaller blocks that are computed in parallel on the graphics card. The algorithm computes vertex positions, normals, and texture coordinates for each block and sends the data to the application, which creates vertex and index buffers and renders the sections. The implementation's performance and its ability to reduce the triangle count are analyzed with two culling methods: one that culls triangles at section level and one that culls at block level.</p><p>The results show that letting the graphics card handle level-of-detail computations in this way is highly advantageous, even though memory copying over the graphics card bus is a problem, since it accounts for roughly eighty-five percent of the total time needed to process a section. The computations themselves take very little time, and there is plenty of room for further development toward the best possible distribution of triangles over the terrain area.</p>
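The section/block decomposition described above can be sketched on the CPU as follows. This is a minimal illustration, not the thesis implementation: the function names, the distance-based stride rule, and the parameters `base_distance` and `max_step` are all assumptions made for the example, and the sketch ignores normals, texture coordinates, culling, and the stitching needed between blocks of different detail. On a GPU, each block would be handled by its own group of threads.

```python
import math

def block_origins(map_size, section_size, block_size):
    """Yield (section_origin, block_origin) pairs covering a square height map."""
    for sy in range(0, map_size, section_size):
        for sx in range(0, map_size, section_size):
            for by in range(sy, sy + section_size, block_size):
                for bx in range(sx, sx + section_size, block_size):
                    yield (sx, sy), (bx, by)

def lod_step(block_xy, block_size, camera_xy, base_distance=32.0, max_step=8):
    """Sampling stride for one block (hypothetical rule): the stride doubles
    each time the camera distance passes base_distance * current stride."""
    cx = block_xy[0] + block_size / 2
    cy = block_xy[1] + block_size / 2
    dist = math.hypot(cx - camera_xy[0], cy - camera_xy[1])
    step = 1
    while step < max_step and dist > base_distance * step:
        step *= 2
    return step

def block_vertices(height, block_xy, block_size, step):
    """Sample the height map inside one block at the given stride."""
    bx, by = block_xy
    return [(x, y, height[y][x])
            for y in range(by, by + block_size, step)
            for x in range(bx, bx + block_size, step)]
```

A coarser stride reduces the vertex count per block quadratically, which is where the triangle reduction comes from in this simplified picture.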
|
32 |
Improving energy efficiency of reliable massively-parallel architectures
Krimer, Evgeni. 12 July 2012
While transistor size continues to shrink with every technology generation,
increasing the number of transistors on a die, the reduction in energy
consumption is less significant. Furthermore, newer technologies introduce
fabrication challenges that create uncertainty in transistor and wire properties.
Therefore, to ensure correctness, design margins are introduced, resulting in
significantly sub-optimal energy efficiency. While increasing parallelism and the
use of gating methods contribute to energy consumption reduction, ultimately,
more radical changes to the architecture and better integration of architectural
and circuit techniques will be necessary. This dissertation explores one such
approach, combining a highly-efficient massively-parallel processor architecture
with a design methodology that reduces energy by trimming design margins.
Using a massively parallel, GPU-like (graphics processing unit) baseline
architecture, we discuss the different components of process variation and
design microarchitectural approaches that support efficient margin reduction.
We evaluate our design using a cycle-based GPU simulator, describe the
conditions where efficiency improvements can be obtained, and explore the benefits
of decoupling across a wide range of parameters. We architect a test-chip that
was fabricated and show these mechanisms to work.
We also discuss why previously developed related approaches fall short
when process variation is very large, such as in low-voltage operation or as
expected for future VLSI technology. We therefore develop and evaluate a
new approach specifically for high-variation scenarios.
To summarize, this work addresses the emerging challenges of
modern massively parallel architectures, including energy-efficient, reliable
operation and high process variation. We believe that the results of this work
are essential for breaking through the energy wall and continuing to improve the
efficiency of future generations of massively parallel architectures.
|
33 |
Linking Scheme code to data-parallel CUDA-C code
2013 December 1900
In Compute Unified Device Architecture (CUDA), programmers must manage the memory operations, synchronization, and utility functions of the Central Processing Unit programs that control and issue data-parallel general-purpose programs running on a Graphics Processing Unit (GPU). NVIDIA Corporation developed the CUDA framework to enable data-parallel programs for GPUs that accelerate scientific and engineering applications, providing a language extension of C called CUDA-C. A foreign-function interface composed of Scheme and CUDA-C constructs extends the Gambit Scheme compiler and enables linking of Scheme and data-parallel CUDA-C code to support high-performance parallel computation with reasonably low runtime overhead. We provide six test cases, implemented both in Scheme and CUDA-C, to evaluate the performance of our implementation in Gambit and to show 0–35% overhead in the usual case. Our work enables Scheme programmers to develop expressive programs that control and issue data-parallel programs running on GPUs, while also reducing hands-on memory management.
|
34 |
Water simulation for cell based sandbox games
Lundell, Christian. January 2014
This thesis presents a new algorithm for simulating fluid based on the Navier-Stokes equations. The algorithm is designed for cell-based sandbox games, where interactivity and performance are the main priorities. The algorithm enforces mass conservation conservatively instead of enforcing a divergence-free velocity field. A global-scale pressure model that simulates hydrostatic pressure is used, in which pressure propagates between neighboring cells. A prefix sum algorithm is used so that only work areas containing fluid are computed.
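One common way a prefix sum restricts work to occupied cells is stream compaction: flag the cells that contain fluid, take an exclusive prefix sum of the flags to obtain each flagged cell's slot in a compact list, then scatter. The sketch below shows that general technique sequentially; the abstract does not say exactly how the thesis applies its prefix sum, and the name `compact_active_cells` and the "amount greater than zero" test are assumptions for illustration.

```python
from itertools import accumulate

def compact_active_cells(fluid_amounts):
    """Return the indices of cells that contain fluid, via prefix-sum compaction."""
    # 1 marks a cell with fluid, 0 an empty cell.
    flags = [1 if amount > 0 else 0 for amount in fluid_amounts]
    # Inclusive prefix sum, then shift to exclusive: offsets[i] is the
    # number of active cells before cell i, i.e. its output slot.
    inclusive = list(accumulate(flags))
    offsets = [total - flag for total, flag in zip(inclusive, flags)]
    active = [0] * (inclusive[-1] if inclusive else 0)
    # Scatter step: on a GPU every cell could do this write independently.
    for i, flag in enumerate(flags):
        if flag:
            active[offsets[i]] = i
    return active
```

The simulation update can then iterate over the compact list instead of the whole grid.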
|
35 |
Parallel Run-Time Verification
Berkovich, Shay. January 2013
Run-time verification is a technique for reasoning about a program's correctness. Given a set of desirable properties and a trace of the inspected program as input, the monitor module verifies that the properties hold on this trace. Because this process takes place at run time, one of the major drawbacks of run-time verification is the execution overhead caused by monitoring. In this thesis, we aim to minimize this overhead by presenting a collection of parallel verification algorithms. The algorithms verify the correctness of properties in parallel, decreasing verification time by spreading computationally intensive calculations over multiple cores (first level of parallelism). We designed the algorithms to exploit data-level parallelism, so they are particularly suitable for Graphics Processing Units (GPUs), although they can be used on multi-core platforms as well. Running the inspected program and the monitor module on separate platforms (second level of parallelism) has several advantages: minimal interference between the monitor and the program, faster processing of non-trivial computations, and even a significant reduction in power consumption (when the monitor runs on a GPU).
This work also aims to provide a solution to automated run-time verification of C programs by implementing the aforementioned set of algorithms in the monitoring tool called GPU-based online and offline Monitoring Framework (GooMF). The ultimate goal of GooMF is to supply developers with an easy-to-use and flexible verification API that requires minimal knowledge of formal languages and techniques.
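To make the idea of data-parallel monitoring concrete, the sketch below checks a stateless invariant over a recorded trace by splitting the trace into chunks, checking the chunks concurrently, and reducing to the earliest violation. It is only an illustration under strong assumptions: it is not GooMF's API, the name `first_violation` and its parameters are hypothetical, and real properties are usually temporal and carry state across events, which this example does not handle.

```python
from concurrent.futures import ThreadPoolExecutor

def first_violation(trace, holds, workers=4, chunk=1024):
    """Index of the first event for which `holds` is false, or None."""
    def check(start):
        # Each chunk is checked independently of the others.
        for i in range(start, min(start + chunk, len(trace))):
            if not holds(trace[i]):
                return i
        return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(check, range(0, len(trace), chunk))
        hits = [r for r in results if r is not None]
    # Reduce step: the earliest violation across all chunks.
    return min(hits) if hits else None
```

For example, with events shaped like `{"x": 3}`, the call `first_violation(trace, lambda e: e["x"] < 3000)` returns the position of the first event whose `x` reaches 3000.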
|
36 |
Creating Music Visualizations in a Mandelbrot Set Explorer
Knapp, Christian. January 2012
The aim of this thesis is to implement a Mandelbrot set explorer that includes functionality for creating music visualizations. The Mandelbrot set is an important mathematical object and arguably the most famous fractal. One of its outstanding attributes is its beauty, so there are several implementations that visualize the set and allow the user to navigate around it. This thesis discusses methods for visualizing the set and for creating music visualizations that consist of zooms into the Mandelbrot set. For that purpose, methods for analysing music are implemented so that user-created zooms can react to the music being played. The thesis deals mainly with the problems that arise while developing this application, with a focus on performance and usability. It shows that zooms into the Mandelbrot set can indeed produce very aesthetically pleasing music visualizations. The biggest drawback is poor performance: the high computational effort makes it difficult to render the visualization in real time.
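The computational cost mentioned above comes from the per-pixel iteration at the heart of most Mandelbrot renderers. The following sketch shows the standard escape-time method (iterate z = z² + c and count the steps until |z| exceeds 2); the `render` helper, its parameters, and the way the view is described by a center and a width are assumptions for illustration, not details taken from the thesis.

```python
def escape_time(c, max_iter=100):
    """Number of iterations of z = z*z + c before |z| exceeds 2 (capped at max_iter)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter

def render(center, scale, width, height, max_iter=100):
    """Escape counts for a width x height grid. `scale` is the width of the
    view in the complex plane, so zooming in means shrinking `scale`."""
    step = scale / width
    rows = []
    for j in range(height):
        im = center.imag + (j - height / 2) * step
        rows.append([
            escape_time(complex(center.real + (i - width / 2) * step, im), max_iter)
            for i in range(width)
        ])
    return rows
```

Every pixel is independent of its neighbours, and deep zooms need larger `max_iter` values, which is why the work grows quickly for an animated zoom.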
|
37 |
Fast Spheroidal Weathering with Colluvium Deposition
Farley, McKay T. 30 November 2011
It can be difficult to quickly and easily create realistic sandstone terrain. Film makers often need to generate realistic terrain for establishing the setting of their film. Many methods have been created which address terrain generation. One such method is using heightmaps which encode height as a gray-value in a 2d image. Most terrain generation techniques don't admit concavities such as overhangs and arches. We present an algorithm that operates on a voxel grid for creating 3d terrain. Our algorithm uses curvature estimation to weather away the terrain. We speed up our method using a caching mechanism that stores the curvature estimate. We generate piles of colluvium, the broken away pieces of weathered rock, with a simple deposition algorithm to improve the realism of the terrain. We explore the possibility of generating our sandstone terrain on the GPU using OpenCL. With our algorithm, an artist is able to quickly and easily create 3d terrain with concavities and colluvium.
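As a rough illustration of weathering on a voxel grid, the sketch below uses the fraction of empty neighbours around a solid voxel as a stand-in for curvature and removes the most exposed voxels in each pass. This is an assumed, simplified estimator, not necessarily the one used in the thesis; the names `exposure` and `weather_step`, the cubic neighbourhood, and the threshold are hypothetical, and the caching and colluvium deposition steps are omitted.

```python
def exposure(solid, x, y, z, radius=1):
    """Fraction of neighbours in a cube around (x, y, z) that are air.
    `solid[z][y][x]` is True for rock; out-of-bounds cells count as air."""
    air = total = 0
    for dz in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                if dx == 0 and dy == 0 and dz == 0:
                    continue
                nx, ny, nz = x + dx, y + dy, z + dz
                total += 1
                inside = (0 <= nz < len(solid)
                          and 0 <= ny < len(solid[0])
                          and 0 <= nx < len(solid[0][0]))
                if not inside or not solid[nz][ny][nx]:
                    air += 1
    return air / total

def weather_step(solid, threshold):
    """Remove every solid voxel whose exposure exceeds `threshold`.
    All exposures are evaluated before any voxel is removed."""
    doomed = [(x, y, z)
              for z in range(len(solid))
              for y in range(len(solid[0]))
              for x in range(len(solid[0][0]))
              if solid[z][y][x] and exposure(solid, x, y, z) > threshold]
    for x, y, z in doomed:
        solid[z][y][x] = False
    return len(doomed)
```

On a solid cube, corners are more exposed than edges and edges more than faces, so repeated passes round the shape off; because the grid is volumetric, overhangs and arches can also be represented, which a height map cannot do.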
|
38 |
ACCELERATION OF SPIKING NEURAL NETWORK ON GENERAL PURPOSE GRAPHICS PROCESSORS
Han, Bing. 05 May 2010
No description available.
|
39 |
In Search of Self-Organization
Arendt, Dustin Lockhart. 02 May 2012
Many who study complex systems believe that the complexity we observe in the world around us is frequently the product of a large number of interactions between components following a simple rule. However, the task of discerning the rule governing the evolution of any given system is often quite difficult, requiring intuition, guesswork, and a great deal of expertise in that domain. To circumvent this issue, researchers have considered the inverse problem where one searches among many candidate rules to reveal those producing interesting behavior. This approach has its own challenges because the search space grows exponentially and interesting behavior is rare and difficult to rigorously define. Therefore, the contribution of this work includes tools and techniques for searching for dimer automaton rules that exhibit self-organization (the transformation of disorder into structure in the absence of centralized control). Dimer automata are simple, discrete, asynchronous rewriting systems that operate over the edges of an arbitrary graph. Specifically, these contributions include a number of novel, surprising, and useful applications of dimer automata, practical methods for measuring self-organization, advanced techniques for searching for dimer automaton rules, and two efficient GPU parallelizations of dimer automata to make searching and simulation more tractable. / Ph. D.
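The description of dimer automata as asynchronous rewriting systems over graph edges can be illustrated with a small simulator. This is a sketch under assumptions: vertex states are stored in a list, one randomly chosen edge is rewritten per step, and a rule is a dictionary from a pair of endpoint states to a new pair (pairs not listed are left unchanged). The names `step` and `run` and the example rule are hypothetical and not taken from the dissertation.

```python
import random

def step(states, edges, rule, rng):
    """Pick one edge at random and rewrite the states of its two endpoints."""
    u, v = rng.choice(edges)
    pair = (states[u], states[v])
    states[u], states[v] = rule.get(pair, pair)

def run(states, edges, rule, steps, seed=0):
    """Apply `steps` asynchronous updates and return the resulting states."""
    rng = random.Random(seed)
    states = list(states)  # leave the caller's list untouched
    for _ in range(steps):
        step(states, edges, rule, rng)
    return states

# Example rule: state 1 spreads across any edge that joins a 1 and a 0.
spread = {(1, 0): (1, 1), (0, 1): (1, 1)}
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
```

Calling `run([1, 0, 0, 0, 0], path, spread, 1000)` lets the single 1 spread along the path. Searching for interesting rules then amounts to enumerating such dictionaries and scoring the behaviour each one produces.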
|
40 |
Real Time Crowd Visualization using the GPU
Karthikeyan, Muruganand. 17 September 2008
Crowd simulation and visualization are an important aspect of many applications, such as movies, games, and virtual reality simulations. The advantage of crowd rendering in movies is that the entire rendering process can be done offline, so computational power is not much of a constraint. However, applications like games and virtual reality simulations demand real-time interactivity. The sheer processing power demanded by real-time interactivity has, thus far, limited crowd simulations to specialized equipment. In this thesis we address the problem of rendering and visualizing a large crowd of animated figures at interactive rates. Recent advances in hardware and the availability of cheap, commodity graphics cards capable of general-purpose computation have delivered large computational speedups and paved the way for this solution. We propose a Graphics Processing Unit (GPU) based implementation for animating virtual characters. However, simulating a large number of human-like characters is further complicated by the need to be visually convincing to the user. We suggest a motion-graph-based animation-splicing approach to achieve this sense of realism. / Master of Science
|