31

Real-time Arbitrary View Rendering From Stereo Video And Time-of-flight Camera

Ates, Tugrul Kagan 01 January 2011 (has links) (PDF)
Generating in-between images from multiple views of a scene is a crucial task for both the computer vision and computer graphics fields. Photorealistic rendering, 3DTV and robot navigation are some of the many applications which benefit from arbitrary view synthesis, if it is achieved in real-time. Most modern commodity computer architectures include programmable processing chips, called Graphics Processing Units (GPUs), which are specialized in rendering computer generated images. These devices excel in achieving high computation power by processing arrays of data in parallel, which makes them ideal for real-time computer vision applications. This thesis focuses on an arbitrary view rendering algorithm that uses two high resolution color cameras along with a single low resolution time-of-flight depth camera, and on matching the algorithm to the programming paradigms of GPUs to achieve real-time processing rates. The proposed method is divided into two stages: depth estimation through fusion of stereo vision and time-of-flight measurements forms the data acquisition stage, and intermediate view rendering from 3D representations of scenes forms the second. The ideas presented are examined in a common experimental framework and the practical results attained are put forward. Based on the experimental results, it can be concluded that it is possible to realize the content production and display stages of a free-viewpoint system in real-time using only low cost commodity computing devices.
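As a concrete illustration of the geometry underlying such view synthesis — reprojecting a pixel with known depth from a reference camera into a virtual view — here is a minimal pinhole-camera sketch. The calibration and pose values are made up, and this is a generic textbook construction, not the thesis's GPU pipeline:

```cpp
#include <cstdio>

// Minimal pinhole-camera forward warp: reproject one pixel with known depth
// from a reference view into a virtual view. Illustrative only; the thesis's
// actual fusion pipeline and calibration parameters are not reproduced here.
struct Intrinsics { float fx, fy, cx, cy; };

// Rigid transform (rotation R, row-major 3x3, and translation t) from the
// reference camera frame to the virtual camera frame -- assumed known.
struct Pose { float R[9]; float t[3]; };

void reproject(const Intrinsics& K, const Pose& P,
               float u, float v, float depth,
               float& u2, float& v2)
{
    // Back-project pixel (u, v) with depth into 3D (reference camera frame).
    float X = (u - K.cx) * depth / K.fx;
    float Y = (v - K.cy) * depth / K.fy;
    float Z = depth;

    // Transform into the virtual camera frame.
    float Xv = P.R[0]*X + P.R[1]*Y + P.R[2]*Z + P.t[0];
    float Yv = P.R[3]*X + P.R[4]*Y + P.R[5]*Z + P.t[1];
    float Zv = P.R[6]*X + P.R[7]*Y + P.R[8]*Z + P.t[2];

    // Project back to pixel coordinates in the virtual view.
    u2 = K.fx * Xv / Zv + K.cx;
    v2 = K.fy * Yv / Zv + K.cy;
}

int main() {
    Intrinsics K{525.f, 525.f, 320.f, 240.f};      // hypothetical calibration
    Pose P{{1,0,0, 0,1,0, 0,0,1}, {0.05f, 0, 0}};  // 5 cm horizontal baseline
    float u2, v2;
    reproject(K, P, 300.f, 200.f, 1.2f, u2, v2);
    printf("warped pixel: (%.1f, %.1f)\n", u2, v2);
    return 0;
}
```

Every pixel warps independently, which is why this step maps so naturally onto a GPU.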
32

Pricing of American Options by Adaptive Tree Methods on GPUs

Lundgren, Jacob January 2015 (has links)
An assembled algorithm for pricing American options with absolute, discrete dividends using adaptive lattice methods is described. Considerations for hardware-conscious programming on both CPU and GPU platforms are discussed, to provide a foundation for the investigation of several approaches for deploying the program onto GPU architectures. The performance results of the approaches are compared to that of a CPU reference implementation, and to each other. In particular, an approach of designating subtrees to be calculated in parallel, by allowing redundant calculation of overlapping elements, is described. Among the examined methods, this attains the best performance results in a "realistic" region of calculation parameters. A fifteen- to thirty-fold improvement in performance over the CPU reference implementation is observed as the problem size grows sufficiently large.
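For context, the backward-induction structure that such lattice methods (and their parallel subtree decompositions) build on can be illustrated with a minimal serial Cox-Ross-Rubinstein binomial tree for an American put. This sketch omits the thesis's adaptive refinement and discrete dividends:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>
#include <cstdio>

// Cox-Ross-Rubinstein binomial tree for an American put (no dividends here;
// the thesis's adaptive, discrete-dividend lattice is more involved).
// Illustrative serial reference, not the author's GPU implementation.
double americanPut(double S0, double K, double r, double sigma,
                   double T, int steps)
{
    double dt = T / steps;
    double u  = std::exp(sigma * std::sqrt(dt));   // up factor
    double d  = 1.0 / u;                           // down factor
    double p  = (std::exp(r * dt) - d) / (u - d);  // risk-neutral probability
    double disc = std::exp(-r * dt);

    // Option values at maturity.
    std::vector<double> value(steps + 1);
    for (int i = 0; i <= steps; ++i) {
        double S = S0 * std::pow(u, steps - i) * std::pow(d, i);
        value[i] = std::max(K - S, 0.0);
    }
    // Backward induction with an early-exercise check at every node.
    for (int n = steps - 1; n >= 0; --n) {
        for (int i = 0; i <= n; ++i) {
            double cont = disc * (p * value[i] + (1.0 - p) * value[i + 1]);
            double S = S0 * std::pow(u, n - i) * std::pow(d, i);
            value[i] = std::max(cont, K - S);  // exercise if better
        }
    }
    return value[0];
}

int main() {
    printf("price: %.4f\n", americanPut(100, 100, 0.05, 0.2, 1.0, 1000));
    return 0;
}
```

Each level of the tree depends only on the level after it, which is what makes subtree-level parallelism (with recomputed overlaps at subtree borders) possible.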
33

A parallel model for the heterogeneous computation of radio astronomy signal correlation

Harris, Christopher John January 2009 (has links)
The computational requirements of scientific research are constantly growing. In the field of radio astronomy, observations have evolved from using single telescopes to interferometer arrays of many telescopes, and arrays of massive scale are currently under development. These interferometers use signal and image processing to produce data that is useful to radio astronomy, and the amount of processing required scales quadratically with the scale of the array. Traditional computational approaches will be unable to meet this demand in the near future. This thesis explores the use of heterogeneous parallel processing to meet the computational demands of radio astronomy. In heterogeneous computing, multiple hardware architectures are used for processing. In this work, the Graphics Processing Unit (GPU) is used as a co-processor along with the Central Processing Unit (CPU) for the computation of signal processing algorithms. Specifically, the suitability of the GPU to accelerate the correlator algorithms used in radio astronomy is investigated. This work first implemented an FX correlator on the GPU, with a performance increase of one to two orders of magnitude over a serial CPU approach. The FX correlator algorithm combines pairs of telescope signals in the Fourier domain. Given N telescope signals from the interferometer array, N² conjugate multiplications must be calculated in the algorithm. For extremely large arrays (N >> 30), this is a huge computational requirement. Testing showed that the GPU correlator produces results equivalent to those of a software correlator implemented on the CPU. However, the algorithm itself was adapted in order to take advantage of the processing power of the GPU. The research examined how correlator parameters, in particular the number of telescope signals and the Fast Fourier Transform (FFT) length, affected the results.
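The N² pairwise conjugate multiplications at the heart of an FX correlator can be sketched as follows. The per-antenna FFTs (the "F" step) are assumed already computed, the antenna and channel counts are made up, and this serial reference stands in for — but is not — the thesis's GPU implementation:

```cpp
#include <complex>
#include <vector>
#include <cstdio>

// Cross-multiply (the "X" in FX): given per-antenna spectra for one
// integration, accumulate the conjugate product for every antenna pair and
// frequency channel. The preceding per-antenna FFT is assumed done.
using cpx = std::complex<float>;

void xStep(const std::vector<std::vector<cpx>>& spectra,  // [antenna][channel]
           std::vector<std::vector<cpx>>& vis)            // [baseline][channel]
{
    int N = (int)spectra.size();
    int C = (int)spectra[0].size();
    int b = 0;
    for (int i = 0; i < N; ++i)
        for (int j = i; j < N; ++j, ++b)    // N(N+1)/2 baselines incl. autos
            for (int c = 0; c < C; ++c)
                vis[b][c] += spectra[i][c] * std::conj(spectra[j][c]);
}

int main() {
    int N = 4, C = 8;                       // 4 antennas, 8 channels (made up)
    std::vector<std::vector<cpx>> spectra(N, std::vector<cpx>(C, {1.f, 0.5f}));
    std::vector<std::vector<cpx>> vis(N * (N + 1) / 2, std::vector<cpx>(C));
    xStep(spectra, vis);
    printf("baseline 0, channel 0: %.2f%+.2fi\n",
           vis[0][0].real(), vis[0][0].imag());
    return 0;
}
```

Because every baseline-channel product is independent, the inner work is embarrassingly parallel — the property the thesis exploits on the GPU.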
34

Predicting Critical Warps in Near-Threshold GPGPU Applications Using a Dynamic Choke Point Analysis

Sanyal, Sourav 01 August 2019 (has links)
General purpose graphics processing units (GP-GPUs), owing to their enormous thread-level parallelism, can significantly reduce power consumption in the near-threshold computing (NTC) operating region while offering close to super-threshold performance. However, process variation (PV) can drastically reduce GPU performance at NTC. In this work, choke points—a unique device-level characteristic of PV at NTC—that can exacerbate the warp criticality problem in GPUs are explored. It is shown that modern warp schedulers cannot tackle choke point induced critical warps in an NTC GPU. Additionally, a circuit-architectural solution, the Choke Point Aware Warp Speculator, is proposed to dynamically predict critical warps in GPUs and accelerate them in their respective execution units. The best scheme achieves an average improvement of ∼39% in performance and ∼31% in energy-efficiency over a state-of-the-art warp scheduler, across 15 GPGPU applications, while incurring marginal hardware overheads.
35

Využití Vertex a Pixel shaderu v OpenGL pro 3D zobrazení 3D obrazových dat v medicíně / Vertex and Pixel Shaders OpenGL Visualisation of Medical 3D Image Data

Vaďura, Jiří January 2009 (has links)
This thesis deals with accelerated 3D rendering of medical data, e.g. from computed tomography, using a graphics processor and the OpenGL library. Raw data slices are sent to graphics memory and rendered by a ray-casting algorithm. The goal of this project is high quality visual output and full user interaction at the same time. Multiple rendering modes are available to the user: MIP, X-ray simulation and realistic shading.
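The MIP mode mentioned above reduces, per ray, to keeping the largest sample encountered while marching through the volume. A nearest-neighbour CPU sketch with a made-up volume (a real ray caster would sample with trilinear interpolation on the GPU):

```cpp
#include <algorithm>
#include <vector>
#include <cstdio>

// Maximum Intensity Projection (MIP) along one ray: step through the volume
// at a fixed increment and keep the largest sample seen. Illustrative only.
float mipRay(const std::vector<float>& vol, int nx, int ny, int nz,
             float ox, float oy, float oz,      // ray origin (voxel units)
             float dx, float dy, float dz,      // normalized ray direction
             float stepLen, int maxSteps)
{
    float best = 0.0f;
    for (int s = 0; s < maxSteps; ++s) {
        int x = (int)(ox + s * stepLen * dx);   // nearest-neighbour sampling
        int y = (int)(oy + s * stepLen * dy);
        int z = (int)(oz + s * stepLen * dz);
        if (x < 0 || y < 0 || z < 0 || x >= nx || y >= ny || z >= nz) break;
        best = std::max(best, vol[(z * ny + y) * nx + x]);
    }
    return best;
}

int main() {
    int nx = 16, ny = 16, nz = 16;
    std::vector<float> vol(nx * ny * nz, 0.1f);
    vol[(8 * ny + 8) * nx + 8] = 0.9f;              // one bright voxel
    float v = mipRay(vol, nx, ny, nz, 8, 8, 0, 0, 0, 1, 0.5f, 64);
    printf("MIP value: %.2f\n", v);                 // expect 0.90
    return 0;
}
```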
36

System for Collision Detection Between Deformable Models Built on Axis Aligned Bounding Boxes and GPU Based Culling

Tuft, David Owen 12 January 2007 (has links) (PDF)
Collision detection between deforming models is a difficult problem for collision detection systems to handle. The problem is even more difficult when deformations are unconstrained, when objects are in close proximity to one another, and when the entity count is high. We propose a method to perform collision detection between multiple deforming objects with unconstrained deformations that gives good results at close proximity. Currently no systems exist that achieve good performance on both unconstrained triangle-level deformations and deformations that preserve edge connectivity. We propose a new system built as a combination of Graphics Processing Unit (GPU) based culling and Axis Aligned Bounding Box (AABB) based culling. Techniques for performing hierarchy-less GPU-based culling are given. We then discuss how and when to switch between GPU-based culling and AABB-based techniques.
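The AABB primitive underlying such culling is a separating-axis test: two boxes can only collide if their extents overlap on all three axes. A minimal illustrative sketch, not the thesis's implementation:

```cpp
#include <cstdio>

// Axis-aligned bounding box overlap test: two boxes intersect iff their
// extents overlap on every axis. This is the cheap rejection test an
// AABB-based culling stage evaluates many times per frame.
struct AABB { float min[3], max[3]; };

bool overlaps(const AABB& a, const AABB& b)
{
    for (int axis = 0; axis < 3; ++axis)
        if (a.max[axis] < b.min[axis] || b.max[axis] < a.min[axis])
            return false;   // separated on this axis -> no collision possible
    return true;            // overlapping on all axes -> test primitives next
}

int main() {
    AABB a{{0, 0, 0}, {1, 1, 1}};
    AABB b{{0.5f, 0.5f, 0.5f}, {2, 2, 2}};
    printf("overlap: %s\n", overlaps(a, b) ? "yes" : "no");
    return 0;
}
```

Deforming geometry is what makes this hard in practice: the boxes must be refit every frame, which is part of what the GPU stage offloads.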
37

Exploring High Performance SQL Databases with Graphics Processing Units

Hordemann, Glen J. 26 November 2013 (has links)
No description available.
38

GPU Parallelization of Astronomical Image Subtraction / GPU-parallelisering av astronomisk bildsubtraction

Arneving, Gustav, Wilhelmsson, Hugo January 2024 (has links)
Astronomical image subtraction is a method for generating a difference image from two images which cover the same area but were taken at different times, in order to see changes over time. Because the images were taken at different times, one of them has to be convolved to match the atmospheric conditions of the other. HOTPANTS is an open-source software package used for astronomical image subtraction. The problem is that HOTPANTS is written in serial C and therefore does not scale well with growing image sizes. There have been previous efforts to parallelize HOTPANTS, including P-HOTPANTS and GBAISP; however, these projects are outdated and unavailable, respectively. The latest effort, BACH, is a reimplementation of HOTPANTS in C++, in which the convolution and subtraction parts have been parallelized on a GPU using OpenCL. This thesis project is a continuation of BACH, called X-BACH, which aims to parallelize the remaining parts of the HOTPANTS algorithm using OpenCL. The results show that some parts of the HOTPANTS algorithm, excluding convolution and subtraction, are highly suitable for the GPU while other parts are not. It is believed that some parts which are not suitable for the GPU are highly suitable for CPU parallelization. Overall, running on an external GPU, X-BACH achieves a relative speed of 1 to 2 compared to BACH, and a relative speed of 0.8 to 2.5 compared to HOTPANTS. When running on an integrated GPU, X-BACH achieves a relative speed of 0.5 to 1.2 compared to BACH, and a relative speed of 0.3 to 2 compared to HOTPANTS. Some parts of the algorithm achieve a speedup of up to 10 times when parallelized on a GPU. In terms of accuracy, X-BACH generally obtains a maximum relative error of an order of magnitude ranging from 10⁻⁷ to 10⁻¹. However, on certain test images, the algorithm has been observed to be unstable.
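The core operation being parallelized here — convolve the reference image with a matching kernel, then subtract it from the science image — can be sketched serially as follows. The single fixed kernel is a simplification; HOTPANTS actually fits a spatially varying kernel:

```cpp
#include <algorithm>
#include <vector>
#include <cstdio>

// Difference imaging core: convolve the reference image with a matching
// kernel, then subtract the result from the science image.
void differenceImage(const std::vector<float>& science,
                     const std::vector<float>& reference,
                     const std::vector<float>& kernel,   // k x k, k odd
                     int w, int h, int k,
                     std::vector<float>& diff)
{
    int r = k / 2;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float conv = 0.0f;
            for (int ky = -r; ky <= r; ++ky) {
                for (int kx = -r; kx <= r; ++kx) {
                    int sx = std::min(std::max(x + kx, 0), w - 1);  // clamp edges
                    int sy = std::min(std::max(y + ky, 0), h - 1);
                    conv += reference[sy * w + sx] * kernel[(ky + r) * k + (kx + r)];
                }
            }
            diff[y * w + x] = science[y * w + x] - conv;
        }
    }
}

int main() {
    int w = 4, h = 4, k = 3;
    std::vector<float> science(w * h, 2.0f), reference(w * h, 1.0f);
    std::vector<float> kernel(k * k, 1.0f / 9.0f);      // flux-preserving box kernel
    std::vector<float> diff(w * h);
    differenceImage(science, reference, kernel, w, h, k, diff);
    printf("diff[0] = %.2f\n", diff[0]);                // expect 1.00
    return 0;
}
```

Each output pixel depends only on a small neighbourhood of the reference image, which is why convolution and subtraction were the natural first candidates for the GPU.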
39

Parallel paradigms in optimal structural design

Van Huyssteen, Salomon Stephanus 12 1900 (has links)
Thesis (MScEng)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Modern-day processors are not getting any faster. Due to the power consumption limit of frequency scaling, parallel processing is increasingly being used to decrease computation time. In this thesis, several parallel paradigms are used to improve the performance of commonly serial SAO programs. Four novelties are discussed. First, double precision solvers are replaced with single precision solvers. This is attempted in order to take advantage of the anticipated factor-2 speed increase that single precision computations have over double precision computations. However, single precision routines present unpredictable performance characteristics and struggle to converge to the required accuracies, which is unfavourable for optimization solvers. Second, QP and dual statements are pitted against one another in a parallel environment. This is done because it is not always easy to see a priori which is best; therefore both are started in parallel and the competing threads are cancelled as soon as one returns a valid point. Parallel QP vs. dual statements prove to be very attractive, converging within the minimum number of outer iterations; the most appropriate solver is selected as the problem properties change during the iteration steps. Thread cancellation poses problems caused by threads having to wait to arrive at appropriate checkpoints, thus suffering from unnecessarily long wait times because of struggling competing routines. Third, multiple global searches are started in parallel on a shared memory system, yielding a speed increase of nearly 4x for all problems. Dynamically scheduled threads alleviate the need for fixed thread counts, as in message-passing implementations. Lastly, the replacement of existing matrix-vector multiplication routines with optimized BLAS routines, especially BLAS routines targeted at GPGPU technologies (graphics processing units), proves to be superior when solving large matrix-vector products in an iterative environment. These problems scale well within the hardware capabilities and speedups of up to 36x are recorded.
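The second idea — start the QP and dual solvers in parallel, accept whichever returns a valid point first, and cancel the other — can be sketched with standard C++ threading. The solvers below are hypothetical stand-ins that merely sleep; the thesis's actual QP and dual subproblem solvers are not reproduced:

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>
#include <cstdio>

// Race two solvers: the first to finish sets a shared flag, which the other
// polls cooperatively so it can abandon its work. Stand-in "solvers" only.
std::atomic<bool> cancelled{false};

double fakeSolver(const char* name, int workMs, double answer)
{
    for (int i = 0; i < workMs && !cancelled.load(); ++i)
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    if (cancelled.load()) { printf("%s cancelled\n", name); return 0.0; }
    cancelled.store(true);           // first finisher cancels the other
    printf("%s finished first\n", name);
    return answer;
}

int main() {
    auto qp   = std::async(std::launch::async, fakeSolver, "QP",   50, 1.0);
    auto dual = std::async(std::launch::async, fakeSolver, "dual", 80, 2.0);
    double result = qp.get() + dual.get();   // loser contributes 0.0
    printf("accepted solution: %.1f\n", result);
    return 0;
}
```

The polling in the loop is exactly the checkpoint mechanism the abstract complains about: a struggling routine only notices cancellation when it next reaches a checkpoint, so the winner may still wait on it.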
40

Analysis of GPU-based convolution for acoustic wave propagation modeling with finite differences: Fortran to CUDA-C step-by-step

Sadahiro, Makoto 04 September 2014 (has links)
By projecting observed microseismic data backward in time to when fracturing occurred, it is possible to locate the fracture events in space, assuming a correct velocity model. In order to achieve this task in near real-time, a robust computational system to handle backward propagation, or Reverse Time Migration (RTM), is required. We can then test many different velocity models for each run of the RTM. We investigate the use of a Graphics Processing Unit (GPU) based system using Compute Unified Device Architecture for C (CUDA-C) as the programming language. Our preliminary results show a large improvement in run-time over programming methods based on conventional Central Processing Unit (CPU) computing with Fortran. Considerable room for improvement still remains.
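The data parallelism that makes such a CUDA-C port attractive is easiest to see in the finite-difference stencil itself: at each time step, every interior grid point is updated independently of the others. A serial 1D acoustic-wave sketch with made-up grid parameters (the thesis's CUDA kernels and RTM driver are not reproduced):

```cpp
#include <utility>
#include <vector>
#include <cstdio>

// Second-order finite-difference update for the 1D acoustic wave equation:
//   p_next[i] = 2*p[i] - p_prev[i] + (c*dt/dx)^2 * (p[i+1] - 2*p[i] + p[i-1])
// The inner loop over i is the part a GPU kernel parallelizes; RTM runs the
// same propagation backward in time with recorded data as the source.
int main() {
    const int    n  = 200;       // grid points (made up)
    const double c  = 1500.0;    // velocity (m/s)
    const double dx = 5.0;       // grid spacing (m)
    const double dt = 0.001;     // time step (s); c*dt/dx = 0.3 < 1 is stable
    const double r2 = (c * dt / dx) * (c * dt / dx);

    std::vector<double> prev(n, 0.0), curr(n, 0.0), next(n, 0.0);
    curr[n / 2] = 1.0;           // impulsive source in the middle

    for (int step = 0; step < 100; ++step) {
        for (int i = 1; i < n - 1; ++i)          // the parallelizable loop
            next[i] = 2.0 * curr[i] - prev[i]
                    + r2 * (curr[i + 1] - 2.0 * curr[i] + curr[i - 1]);
        std::swap(prev, curr);                   // rotate time levels
        std::swap(curr, next);
    }
    printf("p at source after 100 steps: %.6f\n", curr[n / 2]);
    return 0;
}
```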
