461

Task Parallelism for Ray Tracing on a GPU Cluster

Unlu, Caglar 01 February 2008 (has links) (PDF)
Ray tracing is a computationally complex global illumination algorithm used to produce realistic images. In addition to parallel implementations on commodity PC clusters, Graphics Processing Units (GPUs) have recently been used to accelerate ray tracing. In this thesis, ray tracing is accelerated on a GPU cluster: the viewing plane is divided into unit tiles, and slave processes work in a task-parallel manner on these tiles, which are dynamically assigned to them. To decrease the number of ray-triangle intersection tests, Bounding Volume Hierarchies (BVH) are used. It is shown that almost linear speedup can be achieved; however, API and network overheads are observed to be obstacles to scalability.
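At the heart of the BVH traversal this abstract mentions is a cheap ray/bounding-box rejection test that culls a node before any ray-triangle tests inside it are attempted. The following is a minimal sketch of the standard slab method in CUDA C++; the structure and names are illustrative, not taken from the thesis.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Ray with precomputed reciprocal direction (invDir = 1 / direction),
// the usual trick that turns the slab test into multiplies only.
struct Ray { float3 orig, invDir; };

__host__ __device__ inline bool hitAABB(const Ray& r,
                                        float3 boxMin, float3 boxMax,
                                        float tMax)
{
    // Entry/exit distances along each axis ("slabs").
    float tx1 = (boxMin.x - r.orig.x) * r.invDir.x;
    float tx2 = (boxMax.x - r.orig.x) * r.invDir.x;
    float tEnter = fminf(tx1, tx2), tExit = fmaxf(tx1, tx2);

    float ty1 = (boxMin.y - r.orig.y) * r.invDir.y;
    float ty2 = (boxMax.y - r.orig.y) * r.invDir.y;
    tEnter = fmaxf(tEnter, fminf(ty1, ty2));
    tExit  = fminf(tExit,  fmaxf(ty1, ty2));

    float tz1 = (boxMin.z - r.orig.z) * r.invDir.z;
    float tz2 = (boxMax.z - r.orig.z) * r.invDir.z;
    tEnter = fmaxf(tEnter, fminf(tz1, tz2));
    tExit  = fminf(tExit,  fmaxf(tz1, tz2));

    // The ray pierces the box iff all three slab intervals overlap.
    return tExit >= fmaxf(tEnter, 0.0f) && tEnter < tMax;
}
```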
462

Massive Crowd Simulation with Parallel Processing

Yilmaz, Erdal 01 February 2010 (has links) (PDF)
This thesis analyzes how parallel processing on the Graphics Processing Unit (GPU) can be used for massive crowd simulation, not only for rendering but also for the computational power that realistic simulation requires. The extreme population in massive crowd simulation introduces an extra computational load that is difficult to meet using Central Processing Unit (CPU) resources alone. The thesis shows specific methods and approaches that maximize the throughput of GPU parallel computing while using the GPU as the main processor for massive crowd simulation. The methodology introduced makes it possible to simulate and visualize hundreds of thousands of virtual characters in real time. To achieve two-orders-of-magnitude speedups through GPU parallel processing, various stream compaction and effective memory access approaches were employed. To simulate crowd behavior, fuzzy logic functionality was implemented on the GPU from scratch; this implementation is capable of computing more than half a billion fuzzy inferences per second.
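The abstract does not give the rule base of the from-scratch fuzzy logic implementation, so the sketch below only illustrates the general shape of such a GPU kernel: one thread per agent evaluates triangular membership functions, combines two hypothetical rules, and defuzzifies to a speed factor. All rule shapes, ranges, and names are assumptions.

```cuda
#include <cuda_runtime.h>

// Triangular membership function over [a, c], peaking at b (a < b < c).
__device__ float triMF(float x, float a, float b, float c)
{
    if (x <= a || x >= c) return 0.0f;
    return (x < b) ? (x - a) / (b - a) : (c - x) / (c - b);
}

// One thread per agent: a toy Mamdani-style rule base steering agents
// from (distance-to-goal, local crowd density) to a speed factor.
__global__ void fuzzySpeed(const float* dist, const float* density,
                           float* speed, int nAgents)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nAgents) return;

    // Rule 1: IF distance FAR AND density LOW  THEN speed HIGH (1.0).
    float w1 = fminf(triMF(dist[i],     5.f, 20.f, 50.f),
                     triMF(density[i], -1.f,  0.f,  0.5f));
    // Rule 2: IF distance NEAR OR density HIGH THEN speed LOW (0.2).
    float w2 = fmaxf(triMF(dist[i],    -1.f,  0.f,  5.f),
                     triMF(density[i],  0.3f, 1.f,  2.f));

    // Weighted-average defuzzification of the two singleton consequents.
    speed[i] = (w1 * 1.0f + w2 * 0.2f) / fmaxf(w1 + w2, 1e-6f);
}
```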
463

A Parallel Algorithm for Flight Route Planning on GPU Using CUDA

Sanci, Seckin 01 May 2010 (has links) (PDF)
Aerial surveillance missions require a geographical region known as the area of interest to be inspected. The route that the aerial reconnaissance vehicle will follow is known as the flight route, and flight route planning has to be done before the actual mission is executed. A flight route may consist of hundreds of pre-defined geographical positions called waypoints. Optimal flight route planning finds a tour passing through all of the waypoints while covering the minimum possible distance. Due to the combinatorial nature of the problem, it is impractical to devise a solution using brute-force approaches. This study presents a strategy to find a cost-effective and near-optimal solution to the flight route planning problem. The proposed approach is implemented on the GPU using CUDA.
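The abstract does not spell out the heuristic used, so as background only, the host-side C++ below shows a classic nearest-neighbour tour construction: the simplest non-brute-force baseline for the waypoint-ordering (TSP-like) problem, before any GPU acceleration is applied.

```cuda
#include <vector>
#include <cmath>
#include <limits>

struct Waypoint { double x, y; };

// Greedy nearest-neighbour tour: from the current waypoint, always fly
// to the closest unvisited one. Near-optimal in practice, O(n^2) overall.
std::vector<int> nearestNeighbourTour(const std::vector<Waypoint>& wp)
{
    const int n = static_cast<int>(wp.size());
    std::vector<bool> used(n, false);
    std::vector<int>  tour;
    tour.reserve(n);

    int cur = 0;                       // start at the first waypoint
    used[cur] = true;
    tour.push_back(cur);

    for (int step = 1; step < n; ++step) {
        int    best  = -1;
        double bestD = std::numeric_limits<double>::max();
        for (int j = 0; j < n; ++j) {
            if (used[j]) continue;
            double dx = wp[j].x - wp[cur].x, dy = wp[j].y - wp[cur].y;
            double d  = std::sqrt(dx * dx + dy * dy);
            if (d < bestD) { bestD = d; best = j; }
        }
        used[best] = true;
        tour.push_back(best);
        cur = best;
    }
    return tour;                       // visiting order over all waypoints
}
```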
464

Implementing method of moments on a GPGPU using Nvidia CUDA

Virk, Bikram 12 April 2010 (has links)
This thesis concentrates on the algorithmic aspects of the Method of Moments (MoM) and Locally Corrected Nyström (LCN) numerical methods in electromagnetics. The data dependency in each step of the algorithm is analyzed to implement a parallel version that can harness the processing power of a General Purpose Graphics Processing Unit (GPGPU). The GPGPU programming model provided by NVIDIA's Compute Unified Device Architecture (CUDA) is described, introducing the software tools that enable C code to be implemented on the GPGPU. Various optimizations, such as the partial update at every iteration, inter-block synchronization, and the use of shared memory, enable an overall speedup of approximately 10x. The study also brings out the strengths and weaknesses of implementing different methods, such as Crout's LU decomposition and triangular matrix inversion, on a GPGPU architecture. The results suggest future directions of study in different algorithms and their effectiveness in a parallel processing environment. The performance data collected show how different features of the GPGPU architecture can be enhanced to yield higher speedup.
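Crout's LU decomposition, one of the methods named above, factors A = LU with the pivots on L's diagonal and a unit diagonal on U. A sequential host-side reference (no pivoting, so L's diagonal must stay nonzero; row-major n x n matrices) looks roughly like this; it is the baseline such a thesis would parallelize, not the thesis's actual code.

```cuda
#include <vector>

// Crout LU decomposition: column j of L, then row j of U, reusing the
// already-computed entries. A = L * U with U's diagonal fixed at 1.
void croutLU(const std::vector<double>& A,
             std::vector<double>& L, std::vector<double>& U, int n)
{
    L.assign(n * n, 0.0);
    U.assign(n * n, 0.0);
    for (int j = 0; j < n; ++j) {
        U[j * n + j] = 1.0;                       // unit diagonal of U
        for (int i = j; i < n; ++i) {             // column j of L
            double s = 0.0;
            for (int k = 0; k < j; ++k) s += L[i * n + k] * U[k * n + j];
            L[i * n + j] = A[i * n + j] - s;
        }
        for (int i = j + 1; i < n; ++i) {         // row j of U
            double s = 0.0;
            for (int k = 0; k < j; ++k) s += L[j * n + k] * U[k * n + i];
            U[j * n + i] = (A[j * n + i] - s) / L[j * n + j];
        }
    }
}
```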
465

On the design of architecture-aware algorithms for emerging applications

Kang, Seunghwa 30 January 2011 (has links)
This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed have widely varying computational characteristics: for example, we consider both dense numerical computations and sparse graph algorithms. The dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators, and we use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in these problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also identify several limitations of current system software and architectures, along with directions for improving them; the discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation contributes to those efforts by providing benchmarks and suggestions to improve system software and architectures.
466

Multi-Resolution Volume Rendering of Large Medical Data Sets on the GPU

Towfeek, Ajden January 2008 (has links)
Volume rendering techniques can be powerful tools when visualizing medical data sets; their ability to capture 3-D internal structures makes them attractive. Scanning equipment produces medical images with rapidly increasing resolution, resulting in heavily increased data set sizes. Despite the great amount of processing power CPUs deliver, the required precision in image quality can be hard to obtain in real-time rendering. Therefore, it is highly desirable to optimize the rendering process.

Modern GPUs possess much more computational power and are available for general-purpose programming through high-level shading languages. Efficient representations of the data are crucial due to the limited memory provided by the GPU. This thesis describes the theoretical background and the implementation of an approach presented by Patric Ljung, Claes Lundström and Anders Ynnerman at Linköping University. The main objective is to implement a fully working multi-resolution framework with two separate pipelines for pre-processing and real-time rendering, which uses the GPU to visualize large medical data sets.
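The rendering stage of such a framework is a GPU ray caster: each ray marches front-to-back through the volume, applies a transfer function, and terminates early once nearly opaque. The sketch below is a heavily simplified single-ray version; nearest-neighbour sampling of a flat density array stands in for the framework's multi-resolution texture lookups, and the grey-ramp transfer function and names are illustrative.

```cuda
#include <cuda_runtime.h>

// March one ray through a scalar volume, compositing front to back with
// early ray termination once accumulated opacity approaches 1.
__device__ float4 castRay(const float* vol, int3 dim,
                          float3 pos, float3 step, int nSteps)
{
    float4 acc = make_float4(0.f, 0.f, 0.f, 0.f);   // premultiplied RGBA
    for (int i = 0; i < nSteps && acc.w < 0.99f; ++i) {
        int x = (int)pos.x, y = (int)pos.y, z = (int)pos.z;
        if (x >= 0 && y >= 0 && z >= 0 &&
            x < dim.x && y < dim.y && z < dim.z) {
            float density = vol[(z * dim.y + y) * dim.x + x];
            float a = fminf(density * 0.05f, 1.f);  // toy transfer function
            float c = (1.f - acc.w) * a;            // front-to-back "over"
            acc.x += c * density;                   // grey ramp colour
            acc.y += c * density;
            acc.z += c * density;
            acc.w += c;
        }
        pos.x += step.x; pos.y += step.y; pos.z += step.z;
    }
    return acc;
}
```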
467

Rendering for Microlithography on GPU Hardware

Iwaniec, Michel January 2008 (has links)
Over the last decades, integrated circuits have changed our everyday lives in a number of ways. Many common devices taken for granted today would not have been possible without this industrial revolution.

Central to the manufacturing of integrated circuits is the photomask used to expose the wafers; such photomasks are also used in the manufacturing of flat screen displays. Microlithography, the technique for manufacturing these photomasks, requires complex electronics equipment that excels in both speed and fidelity. Building such equipment requires competence in virtually all engineering disciplines, of which the conversion of geometry into pixels is but one. Nevertheless, this single step in the photomask drawing process has a major impact on the throughput and quality of a photomask writer.

Current high-end semiconductor writers from Micronic use a cluster of Field-Programmable Gate Array (FPGA) circuits. FPGAs have for many years been able to replace Application-Specific Integrated Circuits due to their flexibility and low initial development cost, and for parallel computation an FPGA can achieve throughput not possible with microprocessors alone. Nevertheless, high-performance FPGAs are expensive devices, and upgrading from one generation to the next often requires a major redesign.

During the last decade, the computer games industry has taken the lead in parallel computation with graphics cards for 3D gaming. While essentially designed to render 3D polygons and lacking the flexibility of an FPGA, graphics cards have nevertheless started to rival FPGAs as the main workhorse of many parallel computing applications.

This thesis covers an investigation into utilizing graphics cards for the task of rendering geometry into photomask patterns. It describes the different strategies that were tried, the throughput and fidelity achieved with them, and the problems encountered. It also describes the development of a suitable evaluation framework, which was critical to the process.
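The core of the geometry-to-pixels step is computing, per pixel, how much of the pixel cell the pattern geometry covers. The thesis's actual rendering strategies are not detailed in this abstract, so the CUDA kernel below is only an illustration of the idea: analytic area sampling of a single axis-aligned rectangle into a greyscale buffer. Real photomask patterns are arbitrary polygons, and all names are assumptions.

```cuda
#include <cuda_runtime.h>

// One thread per pixel: compute the exact fraction of the pixel cell
// [px, px+1] x [py, py+1] covered by the rectangle (x0, y0)-(x1, y1)
// and accumulate it as a grey value (exposure dose).
__global__ void rasterizeRect(float* image, int width, int height,
                              float x0, float y0, float x1, float y1)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= width || py >= height) return;

    // Overlap extents of the pixel cell with the rectangle, per axis.
    float ox = fminf(x1, px + 1.f) - fmaxf(x0, (float)px);
    float oy = fminf(y1, py + 1.f) - fmaxf(y0, (float)py);
    float coverage = fmaxf(ox, 0.f) * fmaxf(oy, 0.f);

    image[py * width + px] += coverage;   // grey value = covered area
}
```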
468

Efficient and Private Processing of Analytical Queries in Scientific Datasets

Kumar, Anand 01 January 2013 (has links)
Large amounts of data are generated by applications used in basic-science research and development. The size of this data introduces great challenges in storage, analysis, and privacy preservation. This dissertation proposes novel techniques to efficiently analyze the data and to reduce storage space requirements through a data compression technique, while preserving privacy and providing data security. We present an efficient technique to compute an analytical query called the spatial distance histogram (SDH) using spatiotemporal properties of the data: special spatiotemporal properties present in the data are exploited to process SDH efficiently on the fly, and general-purpose graphics processing units (GPGPU, or just GPU) are employed to further boost the performance of the algorithm. The size of the data generated in scientific applications poses problems of disk space requirements, input/output (I/O) delays, and data transfer bandwidth requirements; these problems are addressed by applying the proposed compression technique. We also address the issue of preserving privacy and security in scientific data by proposing a security model. The security model monitors user queries input to the database that stores and manages scientific data, and outputs of user queries are also inspected to detect privacy breaches. Privacy policies are enforced by the monitor to allow only those queries and results that satisfy data-owner-specified policies.
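For context, the naive spatial distance histogram is an O(n^2) all-pairs computation; that is the baseline the spatiotemporal techniques above improve on. A brute-force CUDA sketch (bucket width, layout, and names are illustrative, not from the dissertation):

```cuda
#include <cuda_runtime.h>

// One thread per point i: bin the distances from point i to every point
// after it. The global atomicAdd is heavily contended; shared-memory
// histogram privatization per block is the usual next optimization.
__global__ void sdhBrute(const float3* pts, int n,
                         unsigned int* hist, int nBuckets, float bucketWidth)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    for (int j = i + 1; j < n; ++j) {
        float dx = pts[i].x - pts[j].x;
        float dy = pts[i].y - pts[j].y;
        float dz = pts[i].z - pts[j].z;
        float d  = sqrtf(dx * dx + dy * dy + dz * dz);
        int   b  = min((int)(d / bucketWidth), nBuckets - 1);
        atomicAdd(&hist[b], 1u);
    }
}
```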
469

An enhanced GPU architecture for not-so-regular parallelism with special implications for database search

Narasiman, Veynu Tupil 27 June 2014 (has links)
Graphics Processing Units (GPUs) have become a popular platform for executing general purpose (i.e., non-graphics) applications. To run efficiently on a GPU, applications must be parallelized into many threads, each of which performs the same task but operates on different data (i.e., data parallelism). Previous work has shown that some applications experience significant speedup when executed on a GPU instead of a CPU. The applications that benefit most tend to have certain characteristics such as high computational intensity, regular control flow and memory access patterns, and little to no communication among threads. However, not all parallel applications have these characteristics. Applications with a more balanced compute-to-memory ratio, divergent control flow, irregular memory accesses, and/or frequent communication (i.e., not-so-regular applications) will not take full advantage of the GPU's resources, resulting in performance far short of what could be delivered. The goal of this dissertation is to enhance the GPU architecture to better handle not-so-regular parallelism. This is accomplished in two parts. First, I analyze a diverse set of data parallel applications that suffer from divergent control flow and/or significant stall time due to memory. I propose two microarchitectural enhancements to the GPU called the Large Warp Microarchitecture and Two-Level Warp Scheduling to address these problems respectively. When combined, these mechanisms increase performance by 19% on average. Second, I examine one of the most important and fundamental applications in computing: database search. Database search is an excellent example of an application that is rich in parallelism, but rife with not-so-regular characteristics. I propose enhancements to the GPU architecture including new instructions that improve intra-warp thread communication and decision making, and also a row-buffer locality hint bit to better handle the irregular memory access patterns of index-based tree search. These proposals improve performance by 21% for full table scans, and 39% for index-based search. The result of this dissertation is an enhanced GPU architecture that better handles not-so-regular parallelism. This increases the scope of applications that run efficiently on the GPU, making it a more viable platform not only for current parallel workloads such as databases, but also for future and emerging parallel applications.
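The instructions proposed in the dissertation are architectural extensions and cannot be shown in software, but CUDA's existing warp intrinsics give a feel for the register-level intra-warp communication involved. Below is a standard warp-level sum reduction using __shfl_down_sync, included only as an analogue to the kind of communication being improved.

```cuda
#include <cuda_runtime.h>

// Sum 32 values across the lanes of a full warp without shared memory;
// after the loop, lane 0 holds the warp's total.
__device__ float warpSum(float v)
{
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;
}

__global__ void sumKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] : 0.0f;
    v = warpSum(v);
    if ((threadIdx.x & 31) == 0)      // one atomic per warp, not per thread
        atomicAdd(out, v);
}
```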
470

Simulations of complex atmospheric flows using GPUs - the model ASAMgpu -

Horn, Stefan 26 November 2015 (has links) (PDF)
This thesis describes the development of the high-resolution atmospheric model ASAMgpu. It is a so-called large-eddy model in which the coarser structures in the atmospheric boundary layer, with typical scales from tens of meters to kilometers, are resolved explicitly. Higher-frequency components and their dissipation must be treated either explicitly with a turbulence model or, as in the model described here, implicitly. To this end, the advection operator was discretized with a dissipative third-order upwind scheme. The model includes a two-moment scheme to describe microphysical processes. Another important aspect is the thermodynamic variable used, which combines several advantages of conventional approaches: it is a conserved quantity under adiabatic processes, its sources and sinks under phase changes are easy to derive, and the required quantities temperature and pressure can be computed explicitly. The entire model was implemented in C++ and uses OpenGL and the OpenGL Shading Language (GLSL) to perform the necessary computations on graphics cards. With this approach, simulations that previously required supercomputers can be carried out very cheaply and energy-efficiently. In addition to the model description, the results of several successful test simulations are presented, including three marine cloudy boundary-layer cases with shallow cumulus convection.
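The dissipative third-order upwind discretization of advection is what lets the model absorb subgrid-scale energy without an explicit turbulence closure. As a minimal illustration (1D, periodic bounds, constant positive velocity, explicit Euler in time; not the model's full 3D operator), one step with the standard third-order upwind-biased stencil:

```cuda
#include <cuda_runtime.h>

// One explicit Euler step of 1D advection q_t + u q_x = 0 (u > 0) using
// the upwind-biased stencil
//   dq/dx ~ (2 q[i+1] + 3 q[i] - 6 q[i-1] + q[i-2]) / (6 dx),
// whose leading truncation error is dissipative and O(dx^3).
__global__ void upwind3Step(const float* q, float* qNew,
                            int n, float u, float dtOverDx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int im1 = (i - 1 + n) % n;        // periodic neighbours
    int im2 = (i - 2 + n) % n;
    int ip1 = (i + 1) % n;

    float dqdx = (2.f * q[ip1] + 3.f * q[i]
                  - 6.f * q[im1] + q[im2]) / 6.f;
    qNew[i] = q[i] - u * dtOverDx * dqdx;
}
```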
