• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 29
  • 8
  • 4
  • 4
  • 3
  • 2
  • 1
  • 1
  • 1
  • Tagged with
  • 73
  • 73
  • 73
  • 44
  • 20
  • 19
  • 15
  • 12
  • 12
  • 12
  • 11
  • 11
  • 8
  • 7
  • 7
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Real-time Arbitrary View Rendering From Stereo Video And Time-of-flight Camera

Ates, Tugrul Kagan 01 January 2011 (has links) (PDF)
Generating in-between images from multiple views of a scene is a crucial task for both computer vision and computer graphics fields. Photorealistic rendering, 3DTV and robot navigation are some of many applications which benefit from arbitrary view synthesis, if it is achieved in real-time. Most modern commodity computer architectures include programmable processing chips, called Graphics Processing Units (GPU), which are specialized in rendering computer generated images. These devices excel in achieving high computation power by processing arrays of data in parallel, which make them ideal for real-time computer vision applications. This thesis focuses on an arbitrary view rendering algorithm by using two high resolution color cameras along with a single low resolution time-of-flight depth camera and matching the programming paradigms of the GPUs to achieve real-time processing rates. Proposed method is divided into two stages. Depth estimation through fusion of stereo vision and time-of-flight measurements forms the data acquisition stage and second stage is intermediate view rendering from 3D representations of scenes. Ideas presented are examined in a common experimental framework and practical results attained are put forward. Based on the experimental results, it could be concluded that it is possible to realize content production and display stages of a free-viewpoint system in real-time by using only low cost commodity computing devices.
32

Pricing of American Options by Adaptive Tree Methods on GPUs

Lundgren, Jacob January 2015 (has links)
An assembled algorithm for pricing American options with absolute, discrete dividends using adaptive lattice methods is described. Considerations for hardware-conscious programming on both CPU and GPU platforms are discussed, to provide a foundation for the investigation of several approaches for deploying the program onto GPU architectures. The performance results of the approaches are compared to that of a central processing unit reference implementation, and to each other. In particular, an approach of designating subtrees to be calculated in parallel by allowing multiple calculation of overlapping elements is described. Among the examined methods, this attains the best performance results in a "realistic" region of calculation parameters. A fifteen- to thirty-fold improvement in performance over the CPU reference implementation is observed as the problem size grows sufficiently large.
33

A parallel model for the heterogeneous computation of radio astronomy signal correlation

Harris, Christopher John January 2009 (has links)
The computational requirements of scientific research are constantly growing. In the field of radio astronomy, observations have evolved from using single telescopes, to interferometer arrays of many telescopes, and there are currently arrays of massive scale under development. These interferometers use signal and image processing to produce data that is useful to radio astronomy, and the amount of processing required scales quadratically with the scale of the array. Traditional computational approaches are unable to meet this demand in the near future. This thesis explores the use of heterogeneous parallel processing to meet the computational demands of radio astronomy. In heterogeneous computing, multiple hardware architectures are used for processing. In this work, the Graphics Processing Unit (GPU) is used as a co-processor along with the Central Processing Unit (CPU) for the computation of signal processing algorithms. Specifically, the suitability of the GPU to accelerate the correlator algorithms used in radio astronomy is investigated. This work first implemented a FX correlator on the GPU, with a performance increase of one to two orders of magnitude over a serial CPU approach. The FX correlator algorithm combines pairs of telescope signals in the Fourier domain. Given N telescope signals from the interferometer array, N2 conjugate multiplications must be calculated in the algorithm. For extremely large arrays (N >> 30), this is a huge computational requirement. Testing will show that the GPU correlator produces results equivalent to that of a software correlator implemented on the CPU. However, the algorithm itself is adapted in order to take advantage of the processing power of the GPU. Research examined how correlator parameters, in particular the number of telescope signals and the Fast Fourier Transform (FFT) length, affected the results.
34

Predicting Critical Warps in Near-Threshold GPGPU Applications Using a Dynamic Choke Point Analysis

Sanyal, Sourav 01 August 2019 (has links)
General purpose graphics processing units (GP-GPU), owing to their enormous thread-level parallelism, can significantly improve the power consumption at the near-threshold (NTC) operating region, while offering close to a super-threshold performance. However, process variation (PV) can drastically reduce the GPU performance at NTC. In this work, choke points—a unique device-level characteristic of PV at NTC—that can exacerbate the warp criticality problem in GPUs have been explored. It is shown that the modern warp schedulers cannot tackle the choke point induced critical warps in an NTC GPU. Additionally, Choke Point Aware Warp Speculator, a circuit-architectural solution is proposed to dynamically predict the critical warps in GPUs, and accelerate them in their respective execution units. The best scheme achieves an average improvement of ∼39% in performance, and ∼31% in energy-efficiency, over one state-of-the-art warp scheduler, across 15 GPGPU applications, while incurring marginal hardware overheads.
35

Využití Vertex a Pixel shaderu v OpenGL pro 3D zobrazení 3D obrazových dat v medicíně / Vertex and Pixel Shaders OpenGL Visualisation of Medical 3D Image Data

Vaďura, Jiří January 2009 (has links)
This thesis deals with accelerated 3D rendering of medical data, e.g. computed tomography, using a graphics processor and OpenGL library. Raw data slices are send to graphic memory and rendered by a ray-casting algorithm. The goal of this project is high quality visual output and full user interaction at the same time. Multiple rendering modes are avaiable to the user: MIP, X-Ray simulation and realistic shading.
36

System for Collision Detection Between Deformable Models Built on Axis Aligned Bounding Boxes and GPU Based Culling

Tuft, David Owen 12 January 2007 (has links) (PDF)
Collision detection between deforming models is a difficult problem for collision detection systems to handle. This problem is even more difficult when deformations are unconstrained, objects are in close proximity to one another, and when the entity count is high. We propose a method to perform collision detection between multiple deforming objects with unconstrained deformations that will give good results in close proximities. Currently no systems exist that achieve good performance on both unconstrained triangle level deformations and deformations that preserve edge connectivity. We propose a new system built as a combination of Graphics Processing Unit (GPU) based culling and Axis Aligned Bounding Box (AABB) based culling. Techniques for performing hierarchy-less GPU-based culling are given. We then discuss how and when to switch between GPU-based culling and AABB based techniques.
37

Exploring High Performance SQL Databases with Graphics Processing Units

Hordemann, Glen J. 26 November 2013 (has links)
No description available.
38

Parallel paradigms in optimal structural design

Van Huyssteen, Salomon Stephanus 12 1900 (has links)
Thesis (MScEng)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Modern-day processors are not getting any faster. Due to the power consumption limit of frequency scaling, parallel processing is increasingly being used to decrease computation time. In this thesis, several parallel paradigms are used to improve the performance of commonly serial SAO programs. Four novelties are discussed: First, replacing double precision solvers with single precision solvers. This is attempted in order to take advantage of the anticipated factor 2 speed increase that single precision computations have over that of double precision computations. However, single precision routines present unpredictable performance characteristics and struggle to converge to required accuracies, which is unfavourable for optimization solvers. Second, QP and dual are statements pitted against one another in a parallel environment. This is done because it is not always easy to see which is best a priori. Therefore both are started in parallel and the competing threads are cancelled as soon as one returns a valid point. Parallel QP vs. dual statements prove to be very attractive, converging within the minimum number of outer iterations. The most appropriate solver is selected as the problem properties change during the iteration steps. Thread cancellation poses problems caused by threads having to wait to arrive at appropriate checkpoints, thus su ering from unnecessarily long wait times because of struggling competing routines. Third, multiple global searches are started in parallel on a shared memory system. Problems see a speed increase of nearly 4x for all problems. Dynamically scheduled threads alleviate the need for set thread amounts, as in message passing implementations. Lastly, the replacement of existing matrix-vector multiplication routines with optimized BLAS routines, especially BLAS routines targeted at GPGPU technologies (graphics processing units), proves to be superior when solving large matrix-vector products in an iterative environment. These problems scale well within the hardware capabilities and speedups of up to 36x are recorded. / AFRIKAANSE OPSOMMING: Hedendaagse verwerkers word nie vinniger nie as gevolg van kragverbruikingslimiet soos die verwerkerfrekwensie op-skaal. Parallelle prosesseering word dus meer dikwels gebruik om berekeningstyd te laat daal. Verskeie parallelle paradigmas word gebruik om die prestasie van algemeen sekwensiële optimeringsprogramme te verbeter. Vier ontwikkelinge word bespreek: Eerste, is die vervanging van dubbel presisie roetines met enkel presisie roetines. Dit poog om voordeel te trek uit die faktor 2 spoed verbetering wat enkele presisie berekeninge het oor dubbel presisie berekeninge. Enkele presisie roetines is onvoorspelbaar en sukkel in meeste gevalle om die korrekte akkuraatheid te vind. Tweedens word QP teen duale algoritmes in ’n parallel omgewing gebruik. Omdat dit nie altyd voor die tyd maklik is om te sien watter een die beste gaan presteer nie, word almal in parallel begin en die mededingers word dan gekanselleer sodra een terugkeer met ’n geldige KKT punt. Parallele QP teen duale algoritmes blyk om baie aantreklik te wees. Konvergensie gebeur in alle gevalle binne die minimum aantal iterasies. Die mees geskikte algoritme word op elke iterasie gebruik soos die probleem eienskappe verander gedurende die iterasie stappe. “Thread” kanseleering hou probleme in en word veroorsaak deur “threads” wat moet wag om die kontrolepunte te bereik, dus ly die beste roetines onnodig as gevolg van meededinger roetines was sukkel. Derdens, verskeie globale optimerings word in parallel op ’n “shared memory” stelsel begin. Probleme bekom ’n spoed verhoging van byna vier maal vir alle probleme. Dinamiese geskeduleerde “threads” verlig die behoefte aan voorafbepaalde “threads” soos gebruik word in die “message passing” implementerings. Laastens is die vervanging van die bestaande matriks-vektor vermenigvuldiging roetines met geoptimeerde BLAS roetines, veral BLAS roetines wat gerig is op GPGPU tegnologië. Die GPU roetines bewys om superieur te wees wanneer die oplossing van groot matrix-vektor produkte in ’n iteratiewe omgewing gebruik word. Hierdie probleme skaal ook goed binne die hardeware se vermoëns, vir die grootste probleme wat getoets word, word ’n versnelling van 36 maal bereik.
39

Analysis of GPU-based convolution for acoustic wave propagation modeling with finite differences: Fortran to CUDA-C step-by-step

Sadahiro, Makoto 04 September 2014 (has links)
By projecting observed microseismic data backward in time to when fracturing occurred, it is possible to locate the fracture events in space, assuming a correct velocity model. In order to achieve this task in near real-time, a robust computational system to handle backward propagation, or Reverse Time Migration (RTM), is required. We can then test many different velocity models for each run of the RTM. We investigate the use of a Graphics Processing Unit (GPU) based system using Compute Unified Device Architecture for C (CUDA-C) as the programming language. Our preliminary results show a large improvement in run-time over conventional programming methods based on conventional Central Processing Unit (CPU) computing with Fortran. Considerable room for improvement still remains. / text
40

Navegação de robôs móveis utilizando visão estéreo / Mobile robot navigation using stereo vision

Mendes, Caio César Teodoro 26 April 2012 (has links)
Navegação autônoma é um tópico abrangente cuja atenção por parte da comunidade de robôs móveis vemaumentando ao longo dos anos. O problema consiste em guiar um robô de forma inteligente por um determinado percurso sem ajuda humana. Esta dissertação apresenta um sistema de navegação para ambientes abertos baseado em visão estéreo. Uma câmera estéreo é utilizada na captação de imagens do ambiente e, utilizando o mapa de disparidades gerado por um método estéreo semi-global, dois métodos de detecção de obstáculos são utilizando para segmentar as imagens em regiões navegáveis e não navegáveis. Posteriormente esta classificação é utilizada em conjunto com um método de desvio de obstáculos, resultando em um sistema completo de navegação autônoma. Os resultados obtidos por está dissertação incluem a avaliação de dois métodos estéreo, esta sendo favorável ao método estéreo empregado (semi-global). Foram feitos testes visando avaliar a qualidade e custo computacional de dois métodos para detecção de obstáculos, um baseado em plano e outro baseado em cone. Tais testes deixaram claras as limitações de ambos os métodos e levaram a uma implementação paralela do método baseado em cone. Utilizando uma unidade de processamento gráfico, a versão paralelizada do método baseado em cone atingiu um ganho no tempo computacional de aproximadamente dez vezes. Por fim, os resultados demonstrarão o sistema completo em funcionamento, onde a plataforma robótica utilizada, um veículo elétrico, foi capaz de desviar de pessoas e cones alcançando seu objetivo seguramente / Autonomous navigation is a broad topic that has received increasing attention from the community of mobile robots over the years. The problem is to guide a robot in a smart way for a certain route without human help. This dissertation presents a navigation system for open environments based on stereo vision. A stereo camera is used to capture images of the environment and based on the disparity map generated by a semi-global stereo method, two obstacle detection methods are used to segment the images into navigable and non-navigable regions. Subsequently, this classification is employed in conjunction with a obstacle avoidance method, resulting in a complete autonomous navigation system. The results include an evaluation two stereo methods, this being favorable to the employed stereo method (semi-global). Tests were performed to evaluate the quality and computational cost of two methods for obstacle detection, a plane based one and a cone based. Such tests have left clear the limitations of both methods and led to a parallel implementation of the cone based method. Using a graphics processing unit, a parallel version of the cone based method reached a gain in computational time of approximately ten times. Finally, the results demonstrate the complete system in operation, where the robotic platform used, an electric vehicle, was able to dodge people and cones reaching its goal safely

Page generated in 0.0734 seconds