41 |
Predicting Critical Warps in Near-Threshold GPGPU Applications Using a Dynamic Choke Point Analysis
Sanyal, Sourav (01 August 2019)
General purpose graphics processing units (GP-GPU), owing to their enormous thread-level parallelism, can significantly reduce power consumption in the near-threshold computing (NTC) operating region while offering close to super-threshold performance. However, process variation (PV) can drastically reduce GPU performance at NTC. This work explores choke points, a unique device-level characteristic of PV at NTC that can exacerbate the warp criticality problem in GPUs. It is shown that modern warp schedulers cannot tackle the critical warps induced by choke points in an NTC GPU. Additionally, the Choke Point Aware Warp Speculator, a circuit-architectural solution, is proposed to dynamically predict the critical warps in GPUs and accelerate them in their respective execution units. The best scheme achieves an average improvement of ∼39% in performance and ∼31% in energy efficiency over one state-of-the-art warp scheduler across 15 GPGPU applications, while incurring marginal hardware overheads.
|
42 |
Využití Vertex a Pixel shaderu v OpenGL pro 3D zobrazení 3D obrazových dat v medicíně / Vertex and Pixel Shaders OpenGL Visualisation of Medical 3D Image Data
Vaďura, Jiří (January 2009)
This thesis deals with accelerated 3D rendering of medical data, e.g. computed tomography, using a graphics processor and the OpenGL library. Raw data slices are sent to graphics memory and rendered by a ray-casting algorithm. The goal of this project is to achieve high-quality visual output and full user interaction at the same time. Multiple rendering modes are available to the user: MIP, X-ray simulation and realistic shading.
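To make the MIP (maximum intensity projection) mode concrete, here is a minimal CPU sketch in Python. It assumes rays that run straight along the z axis of a small volume stored as nested lists. The thesis itself uses GPU ray casting with shaders, and the function name `mip_along_z` is hypothetical, chosen only for illustration.

```python
def mip_along_z(volume):
    """Maximum intensity projection of volume[z][y][x] along the z axis.

    Each output pixel keeps the brightest voxel met by an axis-aligned ray.
    """
    depth = len(volume)
    height = len(volume[0])
    width = len(volume[0][0])
    return [
        [max(volume[z][y][x] for z in range(depth)) for x in range(width)]
        for y in range(height)
    ]


# Two 2x2 slices standing in for a stack of CT slices (made-up values).
volume = [
    [[0, 5], [2, 1]],
    [[7, 3], [0, 4]],
]
image = mip_along_z(volume)
```

An X-ray style mode could presumably replace `max` with a sum or average along the same ray, but that is an assumption about the rendering modes rather than a detail stated in the abstract.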
|
43 |
System for Collision Detection Between Deformable Models Built on Axis Aligned Bounding Boxes and GPU Based Culling
Tuft, David Owen (12 January 2007)
Collision detection between deforming models is a difficult problem for collision detection systems to handle. This problem is even more difficult when deformations are unconstrained, objects are in close proximity to one another, and when the entity count is high. We propose a method to perform collision detection between multiple deforming objects with unconstrained deformations that will give good results in close proximities. Currently no systems exist that achieve good performance on both unconstrained triangle level deformations and deformations that preserve edge connectivity. We propose a new system built as a combination of Graphics Processing Unit (GPU) based culling and Axis Aligned Bounding Box (AABB) based culling. Techniques for performing hierarchy-less GPU-based culling are given. We then discuss how and when to switch between GPU-based culling and AABB based techniques.
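The AABB side of such a system rests on a simple interval-overlap test. The following Python sketch is only an illustration of that standard test; the helper names `aabb_of` and `aabbs_overlap` are hypothetical and are not taken from the thesis.

```python
def aabb_of(points):
    """Axis-aligned bounding box of 3D points, as a (min_corner, max_corner) pair."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def aabbs_overlap(a, b):
    """Two AABBs overlap only if their extents overlap on all three axes."""
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[i] <= bmax[i] and bmin[i] <= amax[i] for i in range(3))


box_a = aabb_of([(0, 0, 0), (1, 1, 1)])
box_b = aabb_of([(0.5, 0.5, 0.5), (2, 2, 2)])
box_c = aabb_of([(3, 3, 3), (4, 4, 4)])
near_hit = aabbs_overlap(box_a, box_b)   # boxes share a region
far_miss = aabbs_overlap(box_a, box_c)   # boxes are separated on every axis
```

When a model deforms, its box has to be recomputed from the moved vertices (here, by calling `aabb_of` again), which is one reason the cost of AABB-based culling depends on the kind of deformation.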
|
44 |
Exploring High Performance SQL Databases with Graphics Processing Units
Hordemann, Glen J. (26 November 2013)
No description available.
|
45 |
High Performance and Scalable Matching and Assembly of Biological Sequences
Abu Doleh, Anas (21 December 2016)
No description available.
|
46 |
GPU Parallelization of Astronomical Image Subtraction / GPU-parallelisering av astronomisk bildsubtraction
Arneving, Gustav; Wilhelmsson, Hugo (January 2024)
Astronomical image subtraction is a method for generating a difference image from two images that cover the same area but were taken at different times, in order to see changes over time. Because the images are taken at different times, one of them has to be convolved to match the atmospheric conditions of the other. HOTPANTS is open source software used for astronomical image subtraction. The problem is that HOTPANTS is written in serial C and therefore does not scale well with growing image sizes. Previous efforts to parallelize HOTPANTS include P-HOTPANTS and GBAISP; however, these projects are outdated or unavailable, respectively. The latest effort, BACH, is a reimplementation of HOTPANTS in C++ in which the convolution and subtraction parts have been parallelized on a GPU using OpenCL. This thesis project is a continuation of BACH, called X-BACH, which aims to parallelize the remaining parts of the HOTPANTS algorithm using OpenCL. The results show that some parts of the HOTPANTS algorithm, excluding convolution and subtraction, are highly suitable for the GPU while other parts are not. It is believed that some of the parts that are not suitable for the GPU are highly suitable for CPU parallelization. Overall, running on an external GPU, X-BACH achieves a relative speed of 1 to 2 compared to BACH, and a relative speed of 0.8 to 2.5 compared to HOTPANTS. When running on an integrated GPU, X-BACH achieves a relative speed of 0.5 to 1.2 compared to BACH, and a relative speed of 0.3 to 2 compared to HOTPANTS. Some parts of the algorithm achieve a speedup of up to 10 times when parallelized on a GPU. In terms of accuracy, X-BACH generally obtains a maximum relative error with an order of magnitude between 10^-7 and 10^-1. However, on certain test images the algorithm has been observed to be unstable.
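The convolve-then-subtract step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the matching kernel is taken as given and constant over the image, whereas HOTPANTS has to fit it from the data. The function names are hypothetical.

```python
def convolve2d(image, kernel):
    """Same-size 2D convolution with zero padding; kernel dimensions must be odd."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy, xx = y + oy - j, x + ox - i
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[j][i]
            out[y][x] = acc
    return out


def difference_image(science, template, kernel):
    """Convolve the template to match the science image, then subtract pixel by pixel."""
    matched = convolve2d(template, kernel)
    return [[s - m for s, m in zip(srow, mrow)] for srow, mrow in zip(science, matched)]


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # assumed kernel: no blurring needed
template = [[1, 2], [3, 4]]
science = [[1, 2], [3, 9]]                      # one pixel brightened between exposures
diff = difference_image(science, template, identity)
```

Every output pixel of `convolve2d` is computed independently of the others, which is the property that makes the convolution and subtraction parts natural candidates for a GPU.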
|
47 |
Parallel paradigms in optimal structural design
Van Huyssteen, Salomon Stephanus (12 1900)
Thesis (MScEng)--Stellenbosch University, 2011.
ENGLISH ABSTRACT: Modern-day processors are not getting any faster. Because power consumption limits frequency scaling, parallel processing is increasingly being used to decrease computation time. In this thesis, several parallel paradigms are used to improve the performance of commonly serial SAO programs. Four novelties are discussed:
First, double precision solvers are replaced with single precision solvers. This is attempted in order to take advantage of the anticipated factor 2 speed increase that single precision computations have over double precision computations. However, single precision routines present unpredictable performance characteristics and struggle to converge to the required accuracies, which is unfavourable for optimization solvers.
Second, QP and dual statements are pitted against one another in a parallel environment. This is done because it is not always easy to see a priori which will perform best. Therefore both are started in parallel and the competing threads are cancelled as soon as one returns a valid (KKT) point. Parallel QP vs. dual statements prove to be very attractive, converging within the minimum number of outer iterations. The most appropriate solver is selected as the problem properties change during the iteration steps. Thread cancellation poses problems because threads have to wait to arrive at appropriate checkpoints, so the best routines suffer unnecessarily long wait times because of struggling competing routines.
Third, multiple global searches are started in parallel on a shared memory system. All problems see a speed increase of nearly 4x. Dynamically scheduled threads remove the need for a fixed number of threads, as required in message-passing implementations.
Lastly, the replacement of existing matrix-vector multiplication routines with optimized BLAS routines, especially BLAS routines targeted at GPGPU technologies (graphics processing units), proves to be superior when solving large matrix-vector products in an iterative environment. These problems scale well within the hardware capabilities, and speedups of up to 36x are recorded for the largest problems tested.
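The race between the QP and dual statements, including the checkpoint problem the abstract mentions, can be sketched with Python threads. This is only an illustration under stated assumptions: `run_solver` is a hypothetical stand-in that simulates work with short sleeps, the step counts are made up, and cancellation is cooperative (a thread can stop only when it reaches a checkpoint), which is an assumption about the mechanism rather than the thesis code.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_solver(name, steps, cancel):
    """Hypothetical subproblem solver: `steps` units of work, one checkpoint per unit."""
    for _ in range(steps):
        if cancel.is_set():      # checkpoint: a competitor has already finished
            return None
        time.sleep(0.001)        # stands in for one unit of solver work
    return name                  # stands in for "returned a valid point"


def race(solvers):
    """Start all solvers in parallel and return the name of the first to finish."""
    cancel = threading.Event()
    with ThreadPoolExecutor(max_workers=len(solvers)) as pool:
        futures = [pool.submit(run_solver, name, steps, cancel) for name, steps in solvers]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        cancel.set()             # ask the losing threads to stop at their next checkpoint
        return next(f.result() for f in done if f.result() is not None)


winner = race([("qp", 5), ("dual", 500)])
```

Leaving the `with` block waits for the losing thread to reach its next checkpoint. If checkpoints are far apart, the winner's result is held up by the loser, which mirrors the wait-time problem described above.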
|
48 |
Analysis of GPU-based convolution for acoustic wave propagation modeling with finite differences: Fortran to CUDA-C step-by-step
Sadahiro, Makoto (04 September 2014)
By projecting observed microseismic data backward in time to when fracturing occurred, it is possible to locate the fracture events in space, assuming a correct velocity model. In order to achieve this task in near real-time, a robust computational system to handle backward propagation, or Reverse Time Migration (RTM), is required. We can then test many different velocity models for each run of the RTM. We investigate the use of a Graphics Processing Unit (GPU) based system using Compute Unified Device Architecture for C (CUDA-C) as the programming language. Our preliminary results show a large improvement in run-time over conventional Central Processing Unit (CPU) computing with Fortran. Considerable room for improvement still remains.
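To show what a finite-difference time step looks like as a small convolution, here is a one-dimensional Python sketch of the standard second-order scheme for the acoustic wave equation. It is an illustration only: the thesis works with CUDA-C on larger grids, and the function name, the constant velocity `c`, and the fixed zero boundaries are assumptions made for brevity.

```python
def step_wave_1d(prev, curr, c, dt, dx):
    """Advance the 1D acoustic wave equation u_tt = c^2 u_xx by one time step.

    Uses the second-order stencil [1, -2, 1] in space and a leapfrog update
    in time. The two end points are held at zero.
    """
    r2 = (c * dt / dx) ** 2
    nxt = [0.0] * len(curr)
    for i in range(1, len(curr) - 1):
        laplacian = curr[i - 1] - 2.0 * curr[i] + curr[i + 1]   # stencil applied at i
        nxt[i] = 2.0 * curr[i] - prev[i] + r2 * laplacian
    return nxt


prev = [0.0, 0.0, 0.0, 0.0, 0.0]
curr = [0.0, 0.0, 1.0, 0.0, 0.0]    # a unit pulse in the middle of the grid
nxt = step_wave_1d(prev, curr, c=1.0, dt=1.0, dx=1.0)
```

Because the leapfrog update is symmetric in time, calling the same function with the roles of the earlier and later wavefields swapped steps the field backward, which is the direction of propagation that RTM relies on. Each grid point is updated independently within a step, so the loop maps naturally onto one GPU thread per point.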
|
49 |
Computação paralela em GPU para resolução de sistemas de equações algébricas resultantes da aplicação do método de elementos finitos em eletromagnetismo / Parallel computing on GPU for solving systems of algebraic equations resulting from application of finite element method in electromagnetism
Camargos, Ana Flávia Peixoto de (04 August 2014)
This work presents the use of parallel processing techniques on Graphics Processing Units (GPU) for the solution of algebraic equations arising from the Finite Element modeling of electromagnetic phenomena, in both the static and time-harmonic regimes. The parallel programming techniques used were OpenMP, CUDA and GPUDirect, the latter for multi-GPU platforms. The iterative methods discussed include those of the Krylov subspace: Conjugate Gradients, Bi-conjugate Gradients, Conjugate Residual, Bi-conjugate Gradients Stabilized, Conjugate Gradients for Normal Equations (CGNE and CGNR) and Conjugate Gradients Squared. All implementations made use of the CUSP, CUSPARSE and CUBLAS libraries. For the static problems, the following preconditioners were adopted, all with parallelized implementations executed on the GPU: Incomplete LU and Cholesky decompositions, Algebraic Multigrid, Diagonal and Approximate Inverse. For the time-harmonic problems, only the first two preconditioners were used, but in their sequential versions running on the CPU, which yielded a hybrid CPU-GPU implementation. The computational tools developed were tested in the simulation of electrical grounding systems. In the case of the harmonic regime, in which the phenomenon is governed by the driven, lossy wave equation, the formulation adopted was the one in two potentials, the ungauged edge-nodal A-V formulation. In all cases, the GPU-based tools showed considerable speedups, indicating that this is a promising technology for the simulation of large-scale Electrical Engineering problems, with an excellent cost-benefit ratio.
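As an illustration of one solver and preconditioner pairing from the lists above, the following Python sketch implements Conjugate Gradients with a Diagonal (Jacobi) preconditioner for a small dense symmetric positive definite system. It is a plain CPU sketch with hypothetical names. The work itself relies on the CUSP, CUSPARSE and CUBLAS libraries on the GPU, none of which are used here.

```python
def pcg_jacobi(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A using conjugate
    gradients with a diagonal (Jacobi) preconditioner."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [r[i] / A[i][i] for i in range(n)]        # apply the preconditioner M^-1 = diag(A)^-1
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # converged: residual norm is small
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = pcg_jacobi(A, b)    # exact solution is (1/11, 7/11)
```

The matrix-vector product and the dot products dominate the cost of each iteration, and both are operations that sparse and dense GPU linear algebra libraries are designed to accelerate.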
|
50 |
Navegação de robôs móveis utilizando visão estéreo / Mobile robot navigation using stereo vision
Mendes, Caio César Teodoro (26 April 2012)
Autonomous navigation is a broad topic that has received increasing attention from the mobile robotics community over the years. The problem consists of guiding a robot intelligently along a given route without human help. This dissertation presents a navigation system for open environments based on stereo vision. A stereo camera is used to capture images of the environment and, based on the disparity map generated by a semi-global stereo method, two obstacle detection methods are used to segment the images into navigable and non-navigable regions. This classification is then employed in conjunction with an obstacle avoidance method, resulting in a complete autonomous navigation system. The results include an evaluation of two stereo methods, which favored the semi-global method employed. Tests were performed to evaluate the quality and computational cost of two obstacle detection methods, one plane-based and the other cone-based. These tests made clear the limitations of both methods and led to a parallel implementation of the cone-based method. Using a graphics processing unit, the parallel version of the cone-based method achieved a speedup of approximately ten times in computation time. Finally, the results demonstrate the complete system in operation, where the robotic platform used, an electric vehicle, was able to avoid people and cones and reach its goal safely.
|