  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
31

GPGPU-LOD <em>(General Purpose Graphics Processing Unit - Level Of Detail)</em>: A graphics-card-driven terrain LOD algorithm

Jansson, Karl January 2009 (has links)
Modern graphics cards are built from powerful multiprocessors, which makes them excellent for handling parallelizable problems that would take a long time on an ordinary processor, such as level-of-detail or ray tracing.

This report presents a parallelizable level-of-detail algorithm for terrain height maps and implements it for graphics cards using Nvidia's CUDA API. The algorithm divides the full height map into sections, which are further divided into smaller blocks that are computed in parallel on the graphics card. The algorithm computes vertex positions, normals, and texture coordinates for each block and sends the data to the application, which creates vertex and index buffers and renders the sections. The implementation's performance and ability to reduce triangle counts are analyzed with two different culling methods: one that culls triangles at the section level and one that culls at the block level.

The results show that it is highly advantageous to let the graphics card handle level-of-detail computations in this way, even though memory copying over the graphics bus is a problem, accounting for roughly eighty-five percent of the total time needed to process a section. The computations themselves take very little time, leaving ample room for further development toward the best possible distribution of triangles over the terrain.
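The section/block decomposition the abstract describes can be sketched as follows. This is an illustrative serial Python model, not the thesis's CUDA code: the grid sizes, the camera parameter, and the purely distance-based choice of detail level are all invented for the example.

```python
import math

def block_lod_levels(heightmap_size, section_size, block_size,
                     camera, max_level):
    """Toy model of the decomposition described in the abstract:
    the heightmap is split into sections, each section into blocks,
    and every block is assigned a level of detail (here chosen purely
    by distance to the camera; the thesis computes much more per
    block, including vertices, normals, and texture coordinates)."""
    levels = {}
    for sy in range(0, heightmap_size, section_size):
        for sx in range(0, heightmap_size, section_size):
            for by in range(sy, sy + section_size, block_size):
                for bx in range(sx, sx + section_size, block_size):
                    # block centre in heightmap coordinates
                    cx = bx + block_size / 2.0
                    cy = by + block_size / 2.0
                    dist = math.hypot(cx - camera[0], cy - camera[1])
                    # nearer blocks get more detail (lower level number)
                    level = min(max_level, int(dist // section_size))
                    levels[(bx, by)] = level
    return levels

lods = block_lod_levels(heightmap_size=64, section_size=32,
                        block_size=16, camera=(0.0, 0.0), max_level=3)
```

On a GPU, each iteration of the innermost loop would map naturally to an independent thread block, which is what makes the problem parallelizable in the first place.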
32

Improving energy efficiency of reliable massively-parallel architectures

Krimer, Evgeni 12 July 2012 (has links)
While transistor size continues to shrink with every technology generation, increasing the number of transistors on a die, the reduction in energy consumption is less significant. Furthermore, newer technologies introduce fabrication challenges that result in uncertainties in transistor and wire properties. To ensure correctness, design margins are therefore introduced, resulting in significantly sub-optimal energy efficiency. While increasing parallelism and the use of gating methods contribute to reducing energy consumption, ultimately, more radical changes to the architecture and better integration of architectural and circuit techniques will be necessary. This dissertation explores one such approach, combining a highly efficient massively-parallel processor architecture with a design methodology that reduces energy by trimming design margins. Using a massively-parallel GPU-like (graphics processing unit) baseline architecture, we discuss the different components of process variation and design microarchitectural approaches supporting efficient margin reduction. We evaluate our design using a cycle-based GPU simulator, describe the conditions under which efficiency improvements can be obtained, and explore the benefits of decoupling across a wide range of parameters. We architect a test chip that was fabricated and show that these mechanisms work. We also discuss why previously developed related approaches fall short when process variation is very large, such as in low-voltage operation or as expected for future VLSI technologies. We therefore develop and evaluate a new approach specifically for high-variation scenarios. To summarize, in this work we address the emerging challenges of modern massively parallel architectures, including energy-efficient, reliable operation under high process variation.
We believe the results of this work are essential for breaking through the energy wall and continuing to improve the efficiency of future generations of massively parallel architectures.
33

Linking Scheme code to data-parallel CUDA-C code

2013 December (has links)
In the Compute Unified Device Architecture (CUDA), programmers must manage the memory operations, synchronization, and utility functions of Central Processing Unit (CPU) programs that control and issue data-parallel general-purpose programs running on a Graphics Processing Unit (GPU). NVIDIA Corporation developed the CUDA framework to enable the development of data-parallel GPU programs that accelerate scientific and engineering applications, by providing a language extension of C called CUDA-C. A foreign-function interface composed of Scheme and CUDA-C constructs extends the Gambit Scheme compiler and enables linking of Scheme and data-parallel CUDA-C code to support high-performance parallel computation with reasonably low runtime overhead. We provide six test cases, implemented in both Scheme and CUDA-C, to evaluate the performance of our implementation in Gambit and to show 0–35% overhead in the usual case. Our work enables Scheme programmers to develop expressive programs that control and issue data-parallel programs running on GPUs, while also reducing hands-on memory management.
34

Water simulation for cell based sandbox games

Lundell, Christian January 2014 (has links)
This thesis work presents a new algorithm for simulating fluid based on the Navier-Stokes equations. The algorithm is designed for cell-based sandbox games where interactivity and performance are the main priorities. The algorithm enforces mass conservation conservatively instead of enforcing a divergence-free velocity field. A global-scale pressure model that simulates hydrostatic pressure is used, in which pressure propagates between neighboring cells. A prefix sum algorithm is used to compute only the work areas that contain fluid.
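The prefix-sum trick mentioned at the end of the abstract is a standard stream-compaction pattern; a minimal sketch follows. This is a serial Python illustration of the idea, not the thesis's implementation (on a GPU the scan itself would also run in parallel):

```python
def compact_fluid_cells(mask):
    """Exclusive prefix sum over a fluid-occupancy mask: each fluid
    cell learns its slot in a dense work list, so later simulation
    passes touch only cells that actually contain fluid."""
    prefix = []
    running = 0
    for occupied in mask:
        prefix.append(running)          # slot for this cell, if fluid
        running += 1 if occupied else 0
    work_list = [i for i, occupied in enumerate(mask) if occupied]
    return prefix, work_list

# cells 1, 2, and 4 contain fluid; the rest are empty
prefix, work = compact_fluid_cells([0, 1, 1, 0, 1])
```

The `prefix` array gives each fluid cell its output index (`prefix[i]` for cell `i` when `mask[i]` is set), so a parallel pass can scatter without conflicts.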
35

Parallel Run-Time Verification

Berkovich, Shay January 2013 (has links)
Run-time verification is a technique for reasoning about program correctness. Given a set of desirable properties and a trace from the inspected program as input, the monitor module verifies that the properties hold on this trace. As this process takes place at run time, one of the major drawbacks of run-time verification is the execution overhead caused by the monitoring activity. In this thesis, we aim to minimize this overhead by presenting a collection of parallel verification algorithms. The algorithms verify the correctness of properties in parallel, decreasing verification time by dispersing computationally intensive calculations over multiple cores (a first level of parallelism). We designed the algorithms to exploit data-level parallelism, making them specifically suitable for Graphics Processing Units (GPUs), although they can be utilized on multi-core platforms as well. Running the inspected program and the monitor module on separate platforms (a second level of parallelism) yields several advantages: minimal interference between the monitor and the program, faster processing for non-trivial computations, and even a significant reduction in power consumption (when the monitor runs on a GPU). This work also aims to provide a solution for automated run-time verification of C programs by implementing the aforementioned algorithms in a monitoring tool called the GPU-based online and offline Monitoring Framework (GooMF). The ultimate goal of GooMF is to supply developers with an easy-to-use and flexible verification API that requires minimal knowledge of formal languages and techniques.
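One classic way to get data-level parallelism out of trace monitoring is to split the trace into chunks and, for each chunk, tabulate where every possible monitor state would end up; the chunk summaries then compose left to right. The sketch below illustrates that generic technique in Python — it is not claimed to be GooMF's actual algorithm, and the example property automaton ("no `use` before `init`") is invented.

```python
def chunk_transfer(delta, states, chunk):
    """For one trace chunk, tabulate where each possible monitor
    start state would end up. Chunks are independent, so these
    tables can be built in parallel (one per core or GPU block)."""
    mapping = {s: s for s in states}
    for sym in chunk:
        mapping = {s: delta[(mapping[s], sym)] for s in states}
    return mapping

def parallel_monitor(delta, states, start, trace, n_chunks):
    """Split the trace, summarise each chunk, then compose the
    summaries sequentially (a cheap O(n_chunks) reduction)."""
    size = max(1, len(trace) // n_chunks)
    chunks = [trace[i:i + size] for i in range(0, len(trace), size)]
    mappings = [chunk_transfer(delta, states, c) for c in chunks]
    state = start
    for m in mappings:
        state = m[state]
    return state

# toy safety property: "err" is reached if `use` occurs before `init`
STATES = {"uninit", "ready", "err"}
DELTA = {
    ("uninit", "init"): "ready", ("uninit", "use"): "err",
    ("ready", "init"): "ready",  ("ready", "use"): "ready",
    ("err", "init"): "err",      ("err", "use"): "err",
}
```

The per-chunk tables cost |states| times more work than a serial scan, which is why the approach pays off only when the state space is small and the trace is long.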
36

Creating Music Visualizations in a Mandelbrot Set Explorer

Knapp, Christian January 2012 (has links)
The aim of this thesis is to implement a Mandelbrot Set Explorer that includes functionality for creating music visualizations. The Mandelbrot set is an important mathematical object and arguably the most famous fractal. One of its outstanding attributes is its beauty, and there are therefore several implementations that visualize the set and allow the user to navigate around it. This thesis discusses methods for visualizing the set and for creating music visualizations consisting of zooms into the Mandelbrot set. For that purpose, methods for analyzing music are implemented, so that user-created zooms can react to the music being played. The thesis mainly deals with problems that occur while developing this application, especially problems concerning performance and usability. The thesis shows that it is in fact possible to create aesthetically pleasing music visualizations using zooms into the Mandelbrot set. The biggest drawback is the lack of performance due to the high computational effort, and hence the difficulty of rendering the visualization in real time.
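The computational core behind any such explorer is the escape-time iteration, which is also where the performance cost the abstract mentions comes from: every pixel requires up to `max_iter` complex multiplications. A minimal sketch (standard algorithm, not code from the thesis):

```python
def mandelbrot_escape(c, max_iter=100):
    """Escape-time iteration z -> z*z + c for one point c in the
    complex plane: returns the iteration at which |z| first exceeds
    2, or max_iter if it never does, in which case c is treated as
    inside the set at this iteration budget."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter
```

Each pixel is independent, so a whole frame is embarrassingly parallel; deep zooms are slow mainly because boundary points exhaust the full iteration budget.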
37

Fast Spheroidal Weathering with Colluvium Deposition

Farley, McKay T. 30 November 2011 (has links) (PDF)
It can be difficult to quickly and easily create realistic sandstone terrain. Filmmakers often need to generate realistic terrain to establish the setting of a film. Many methods address terrain generation; one is the use of heightmaps, which encode height as a gray value in a 2D image. Most terrain generation techniques, however, don't admit concavities such as overhangs and arches. We present an algorithm that operates on a voxel grid to create 3D terrain. Our algorithm uses curvature estimation to weather away the terrain, and we speed it up with a caching mechanism that stores the curvature estimates. We generate piles of colluvium, the broken-away pieces of weathered rock, with a simple deposition algorithm to improve the realism of the terrain. We also explore generating our sandstone terrain on the GPU using OpenCL. With our algorithm, an artist can quickly and easily create 3D terrain with concavities and colluvium.
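The curvature-driven erosion loop can be illustrated with a deliberately crude stand-in: on a 2D grid, estimate a cell's convexity by counting its empty neighbours and remove cells that are too exposed. This is only the skeleton of the idea — the thesis works on a 3D voxel grid with a proper (and cached) curvature estimate, plus colluvium deposition, none of which is modelled here.

```python
def weather_step(solid, threshold=3):
    """One toy erosion pass over a 2D boolean grid: a cell's
    'curvature' is approximated by its count of empty 4-neighbours
    (convex cells see more air), and sufficiently exposed cells are
    removed. Corners therefore erode first, which is the qualitative
    behaviour spheroidal weathering aims for."""
    h, w = len(solid), len(solid[0])

    def empty(y, x):
        return not (0 <= y < h and 0 <= x < w and solid[y][x])

    eroded = [row[:] for row in solid]
    for y in range(h):
        for x in range(w):
            if solid[y][x]:
                exposure = sum(empty(y + dy, x + dx)
                               for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)))
                if exposure >= threshold:
                    eroded[y][x] = False
    return eroded
```

In a real implementation the removed material would be re-deposited lower down as colluvium rather than simply vanishing.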
38

ACCELERATION OF SPIKING NEURAL NETWORK ON GENERAL PURPOSE GRAPHICS PROCESSORS

Han, Bing 05 May 2010 (has links)
No description available.
39

Real Time Crowd Visualization using the GPU

Karthikeyan, Muruganand 17 September 2008 (has links)
Crowd simulation and visualization are an important aspect of many applications, such as movies, games, and virtual reality simulations. The advantage of crowd rendering in movies is that the entire rendering process can be done off-line, so computational power is not much of a constraint. However, applications like games and virtual reality simulations demand real-time interactivity, and the sheer processing power this demands has, thus far, limited crowd simulations to specialized equipment. In this thesis we address the problem of rendering and visualizing a large crowd of animated figures at interactive rates. Recent trends in hardware capabilities and the availability of cheap, commodity graphics cards capable of general-purpose computation have provided immense computational speed-ups and paved the way for this solution. We propose a Graphics Processing Unit (GPU) based implementation for animating virtual characters. However, simulating a large number of human-like characters is further complicated by the need to be visually convincing to the user. We suggest a motion-graph-based animation-splicing approach to achieve this sense of realism. / Master of Science
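The motion-graph splicing idea can be reduced to a very small sketch: each clip lists the clips whose start pose blends smoothly with its end pose, and a character's animation is a walk through that graph. The clip names and graph below are invented for illustration and do not come from the thesis.

```python
import random

def splice_motion(graph, start_clip, length, seed=0):
    """Toy motion-graph walk: repeatedly pick a compatible successor
    clip, producing a spliced animation sequence. A seeded RNG keeps
    the walk reproducible; real systems would pick successors to
    match navigation goals rather than at random."""
    rng = random.Random(seed)
    sequence = [start_clip]
    for _ in range(length - 1):
        sequence.append(rng.choice(graph[sequence[-1]]))
    return sequence

# hypothetical clip-compatibility graph
MOTION_GRAPH = {
    "walk": ["walk", "turn_left", "stop"],
    "turn_left": ["walk"],
    "stop": ["walk"],
}
```

Per-character walks like this are independent, which is what lets the animation work for a whole crowd be farmed out to the GPU.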
40

Exploiting Multigrain Parallelism in Pairwise Sequence Search on Emergent CMP Architectures

Aji, Ashwin Mandayam 25 August 2008 (has links)
Emerging hybrid multi-core and many-core compute platforms deliver unprecedented performance within a single chip and are making rapid strides toward the commodity processor market; they are widely expected to replace the multi-core processors in existing High-Performance Computing (HPC) infrastructures, such as large-scale clusters, grids, and supercomputers. Meanwhile, in the realm of bioinformatics, the size of genomic databases is doubling every 12 months, so novel approaches to parallelizing sequence search algorithms have become increasingly important. This thesis takes a significant step toward bridging the gap between software and hardware by presenting an efficient and scalable model to accelerate one of the popular sequence alignment algorithms by exploiting the multigrain parallelism exposed by emerging multiprocessor architectures. Specifically, we parallelize a dynamic programming algorithm called Smith-Waterman both within and across multiple Cell Broadband Engines and within an nVIDIA GeForce General Purpose Graphics Processing Unit (GPGPU). Cell Broadband Engine: We parallelize the Smith-Waterman algorithm within a Cell node by performing a blocked data decomposition of the dynamic programming matrix, followed by pipelined execution of the blocks across the synergistic processing elements (SPEs) of the Cell. We also introduce novel optimization methods that fully utilize the vector processing power of the SPE. As a result, we achieve near-linear scalability, or near-constant efficiency, for up to 16 SPEs on dual-Cell QS20 blades, and our design scales to more cores, if available. We further extend this design to accelerate the Smith-Waterman algorithm across nodes on both the IBM QS20 and the PlayStation 3 Cell cluster platforms, achieving a maximum speedup of 44 compared to the execution time on a single Cell node.
We then introduce an analytical model to accurately estimate the execution times of parallel sequence alignments and wavefront algorithms in general on the Cell cluster platforms. Lastly, we contribute and evaluate TOSS -- a Throughput-Oriented Sequence Scheduler, which leverages the performance prediction model and dynamically partitions the available processing elements to simultaneously align multiple sequences. This scheme succeeds in aligning more sequences per unit time with an improvement of 33.5% over the naive first-come, first-serve (FCFS) scheduler. nVIDIA GPGPU: We parallelize the Smith-Waterman algorithm on the GPGPU by optimizing the code in stages, which include optimal data layout strategies, coalesced memory accesses and blocked data decomposition techniques. Results show that our methods provide a maximum speedup of 3.6 on the nVIDIA GPGPU when compared to the performance of the naive implementation of Smith-Waterman. / Master of Science
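The wavefront structure that both the Cell and GPGPU versions exploit comes from the dependency pattern of the Smith-Waterman recurrence: every cell on one anti-diagonal depends only on the previous two, so a whole anti-diagonal can be filled in parallel. A serial Python sketch of that fill order (the scoring constants are illustrative, not those used in the thesis):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Anti-diagonal (wavefront) fill of the Smith-Waterman local
    alignment matrix. The inner loop over i walks one anti-diagonal
    (all cells with i + j == d); on a parallel machine those cells
    would be computed simultaneously. Returns the best local score."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):              # anti-diagonal index i + j
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # match/mismatch
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best
```

The blocked decompositions described above apply the same idea at a coarser grain: whole tiles of the matrix, rather than single cells, march along the wavefront.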
