Global ETD Search

41	A parallel model for the heterogeneous computation of radio astronomy signal correlation Harris, Christopher John January 2009 (has links) The computational requirements of scientific research are constantly growing. In the field of radio astronomy, observations have evolved from using single telescopes, to interferometer arrays of many telescopes, and there are currently arrays of massive scale under development. These interferometers use signal and image processing to produce data that is useful to radio astronomy, and the amount of processing required scales quadratically with the scale of the array. Traditional computational approaches are unable to meet this demand in the near future. This thesis explores the use of heterogeneous parallel processing to meet the computational demands of radio astronomy. In heterogeneous computing, multiple hardware architectures are used for processing. In this work, the Graphics Processing Unit (GPU) is used as a co-processor along with the Central Processing Unit (CPU) for the computation of signal processing algorithms. Specifically, the suitability of the GPU to accelerate the correlator algorithms used in radio astronomy is investigated. This work first implemented a FX correlator on the GPU, with a performance increase of one to two orders of magnitude over a serial CPU approach. The FX correlator algorithm combines pairs of telescope signals in the Fourier domain. Given N telescope signals from the interferometer array, N2 conjugate multiplications must be calculated in the algorithm. For extremely large arrays (N >> 30), this is a huge computational requirement. Testing will show that the GPU correlator produces results equivalent to that of a software correlator implemented on the CPU. However, the algorithm itself is adapted in order to take advantage of the processing power of the GPU. Research examined how correlator parameters, in particular the number of telescope signals and the Fast Fourier Transform (FFT) length, affected the results. correlator GPU Graphics processing unit Polyphase Filterbank Radio astronomy Heterogeneous Correlation Computing Radio astronomy Interferometry
42	Predicting Critical Warps in Near-Threshold GPGPU Applications Using a Dynamic Choke Point Analysis Sanyal, Sourav 01 August 2019 (has links) General purpose graphics processing units (GP-GPU), owing to their enormous thread-level parallelism, can significantly improve the power consumption at the near-threshold (NTC) operating region, while offering close to a super-threshold performance. However, process variation (PV) can drastically reduce the GPU performance at NTC. In this work, choke points—a unique device-level characteristic of PV at NTC—that can exacerbate the warp criticality problem in GPUs have been explored. It is shown that the modern warp schedulers cannot tackle the choke point induced critical warps in an NTC GPU. Additionally, Choke Point Aware Warp Speculator, a circuit-architectural solution is proposed to dynamically predict the critical warps in GPUs, and accelerate them in their respective execution units. The best scheme achieves an average improvement of ∼39% in performance, and ∼31% in energy-efficiency, over one state-of-the-art warp scheduler, across 15 GPGPU applications, while incurring marginal hardware overheads. Process Variation Near-Threshold Computing General Purpose Graphics Processing Unit Warp Criticality Problem Single Instruction Multiple Data Computer Engineering
43	Využití Vertex a Pixel shaderu v OpenGL pro 3D zobrazení 3D obrazových dat v medicíně / Vertex and Pixel Shaders OpenGL Visualisation of Medical 3D Image Data Vaďura, Jiří January 2009 (has links) This thesis deals with accelerated 3D rendering of medical data, e.g. computed tomography, using a graphics processor and OpenGL library. Raw data slices are send to graphic memory and rendered by a ray-casting algorithm. The goal of this project is high quality visual output and full user interaction at the same time. Multiple rendering modes are avaiable to the user: MIP, X-Ray simulation and realistic shading.
44	System for Collision Detection Between Deformable Models Built on Axis Aligned Bounding Boxes and GPU Based Culling Tuft, David Owen 12 January 2007 (has links) (PDF) Collision detection between deforming models is a difficult problem for collision detection systems to handle. This problem is even more difficult when deformations are unconstrained, objects are in close proximity to one another, and when the entity count is high. We propose a method to perform collision detection between multiple deforming objects with unconstrained deformations that will give good results in close proximities. Currently no systems exist that achieve good performance on both unconstrained triangle level deformations and deformations that preserve edge connectivity. We propose a new system built as a combination of Graphics Processing Unit (GPU) based culling and Axis Aligned Bounding Box (AABB) based culling. Techniques for performing hierarchy-less GPU-based culling are given. We then discuss how and when to switch between GPU-based culling and AABB based techniques. computer science collision detection gpu graphics processing unit axis aligned bounding box deformable models culling Computer Sciences
45	Exploring High Performance SQL Databases with Graphics Processing Units Hordemann, Glen J. 26 November 2013 (has links) No description available. Computer Science GPU Graphics Processing Unit Database SQL SQLite Visualization Data Mining CUDA GPGPU Parallel Computing High Performance Computing NVIDIA
46	High Performance and Scalable Matching and Assembly of Biological Sequences Abu Doleh, Anas 21 December 2016 (has links) No description available. Computer Engineering Bioinformatics bioinformatics sequence similarity indexing graphical processing unit Apache Spark de Bruijn graph de novo assembly metagenomics
47	GPU Parallelization of Astronomical Image Subtraction / GPU-parallelisering av astronomisk bildsubtraction Arneving, Gustav, Wilhelmsson, Hugo January 2024 (has links) Astronomical image subtraction is a method for generating a difference image from two images, which covers the same area but taken at different times, in order to see changes over time. Due to the images being taken at different times, one of the images has to be convolved, to match the atmospheric conditions ofthe other image. HOTPANTS is an open source software used for astronomical image subtraction. The problem is that HOTPANTS is written in serial C and therefore does not scale well with growing image sizes. There have been previous efforts to parallelize HOTPANTS, which include P-HOTPANTS and GBAISP. However, these projects are outdated or unavailable, respectively. The latest effort, BACH, is a reimplementation of HOTPANTS in C++, where the convolution and subtraction parts have been parallelized on a GPU using OpenCL. This thesis project is a continuation of BACH, called X-BACH, which aims to parallelize the remaining parts of the HOTPANTS algorithm using OpenCL. The results show that some parts of the HOTPANTS algorithm, excluding convolution and subtraction, are highly suitable for the GPU while other parts arenot suitable for the GPU. It is believed that some parts which are not suitable forthe GPU are highly suitable for CPU parallelization. Overall, running on an external GPU, X-BACH achieves a relative speed of 1 to 2 compared to BACH, and a relative of 0.8 to 2.5 compared to HOTPANTS. When running on an integrated GPU, X-BACH achieves a relative speed of 0.5 to 1.2 compared to BACH, and a relative speed of 0.3 to 2 compared to HOTPANTS. Some parts of the algorithm achieves a speedup of up to 10 times when parallelized on a GPU. In terms of accuracy, X-BACH generally obtains a maximum relative error in order of magnitude ranging from 10−7 to 10−1. However, on certain test images, the algorithm has been observed to be unstable. GPU Graphics Processing Unit Graphics Card Parallelization Image Processing GPU Grafikprocessor Grafikkort Parallelisering Bildbehandling Computer and Information Sciences Data- och informationsvetenskap
48	Parallel paradigms in optimal structural design Van Huyssteen, Salomon Stephanus 12 1900 (has links) Thesis (MScEng)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Modern-day processors are not getting any faster. Due to the power consumption limit of frequency scaling, parallel processing is increasingly being used to decrease computation time. In this thesis, several parallel paradigms are used to improve the performance of commonly serial SAO programs. Four novelties are discussed: First, replacing double precision solvers with single precision solvers. This is attempted in order to take advantage of the anticipated factor 2 speed increase that single precision computations have over that of double precision computations. However, single precision routines present unpredictable performance characteristics and struggle to converge to required accuracies, which is unfavourable for optimization solvers. Second, QP and dual are statements pitted against one another in a parallel environment. This is done because it is not always easy to see which is best a priori. Therefore both are started in parallel and the competing threads are cancelled as soon as one returns a valid point. Parallel QP vs. dual statements prove to be very attractive, converging within the minimum number of outer iterations. The most appropriate solver is selected as the problem properties change during the iteration steps. Thread cancellation poses problems caused by threads having to wait to arrive at appropriate checkpoints, thus su ering from unnecessarily long wait times because of struggling competing routines. Third, multiple global searches are started in parallel on a shared memory system. Problems see a speed increase of nearly 4x for all problems. Dynamically scheduled threads alleviate the need for set thread amounts, as in message passing implementations. Lastly, the replacement of existing matrix-vector multiplication routines with optimized BLAS routines, especially BLAS routines targeted at GPGPU technologies (graphics processing units), proves to be superior when solving large matrix-vector products in an iterative environment. These problems scale well within the hardware capabilities and speedups of up to 36x are recorded. / AFRIKAANSE OPSOMMING: Hedendaagse verwerkers word nie vinniger nie as gevolg van kragverbruikingslimiet soos die verwerkerfrekwensie op-skaal. Parallelle prosesseering word dus meer dikwels gebruik om berekeningstyd te laat daal. Verskeie parallelle paradigmas word gebruik om die prestasie van algemeen sekwensiële optimeringsprogramme te verbeter. Vier ontwikkelinge word bespreek: Eerste, is die vervanging van dubbel presisie roetines met enkel presisie roetines. Dit poog om voordeel te trek uit die faktor 2 spoed verbetering wat enkele presisie berekeninge het oor dubbel presisie berekeninge. Enkele presisie roetines is onvoorspelbaar en sukkel in meeste gevalle om die korrekte akkuraatheid te vind. Tweedens word QP teen duale algoritmes in ’n parallel omgewing gebruik. Omdat dit nie altyd voor die tyd maklik is om te sien watter een die beste gaan presteer nie, word almal in parallel begin en die mededingers word dan gekanselleer sodra een terugkeer met ’n geldige KKT punt. Parallele QP teen duale algoritmes blyk om baie aantreklik te wees. Konvergensie gebeur in alle gevalle binne die minimum aantal iterasies. Die mees geskikte algoritme word op elke iterasie gebruik soos die probleem eienskappe verander gedurende die iterasie stappe. “Thread” kanseleering hou probleme in en word veroorsaak deur “threads” wat moet wag om die kontrolepunte te bereik, dus ly die beste roetines onnodig as gevolg van meededinger roetines was sukkel. Derdens, verskeie globale optimerings word in parallel op ’n “shared memory” stelsel begin. Probleme bekom ’n spoed verhoging van byna vier maal vir alle probleme. Dinamiese geskeduleerde “threads” verlig die behoefte aan voorafbepaalde “threads” soos gebruik word in die “message passing” implementerings. Laastens is die vervanging van die bestaande matriks-vektor vermenigvuldiging roetines met geoptimeerde BLAS roetines, veral BLAS roetines wat gerig is op GPGPU tegnologië. Die GPU roetines bewys om superieur te wees wanneer die oplossing van groot matrix-vektor produkte in ’n iteratiewe omgewing gebruik word. Hierdie probleme skaal ook goed binne die hardeware se vermoëns, vir die grootste probleme wat getoets word, word ’n versnelling van 36 maal bereik. Graphics processing unit Numerical optimization Open MP CUDA Dissertations -- Mechatronic engineering Theses -- Mechatronic engineering Computational fluid dynamics Mathematical optimization Parallel computing
49	Analysis of GPU-based convolution for acoustic wave propagation modeling with finite differences: Fortran to CUDA-C step-by-step Sadahiro, Makoto 04 September 2014 (has links) By projecting observed microseismic data backward in time to when fracturing occurred, it is possible to locate the fracture events in space, assuming a correct velocity model. In order to achieve this task in near real-time, a robust computational system to handle backward propagation, or Reverse Time Migration (RTM), is required. We can then test many different velocity models for each run of the RTM. We investigate the use of a Graphics Processing Unit (GPU) based system using Compute Unified Device Architecture for C (CUDA-C) as the programming language. Our preliminary results show a large improvement in run-time over conventional programming methods based on conventional Central Processing Unit (CPU) computing with Fortran. Considerable room for improvement still remains. / text Graphics Processing Unit Seismic Imaging, Microseismic RTM Reverse Time Migration GPU CUDA Acoustic wave propagation Fortran
50	Computação paralela em GPU para resolução de sistemas de equações algébricas resultantes da aplicação do método de elementos finitos em eletromagnetismo. / Parallel computing on GPU for solving systems of algebraic equations resulting from application of finite element method in electromagnetism. Camargos, Ana Flávia Peixoto de 04 August 2014 (has links) Este trabalho apresenta a aplicação de técnicas de processamento paralelo na resolução de equações algébricas oriundas do Método de Elementos Finitos aplicado ao Eletromagnetismo, nos regimes estático e harmônico. As técnicas de programação paralelas utilizadas foram OpenMP, CUDA e GPUDirect, sendo esta última para as plataformas do tipo Multi-GPU. Os métodos iterativos abordados incluem aqueles do subespaço Krylov: Gradientes Conjugados, Gradientes Biconjugados, Conjugado Residual, Gradientes Biconjugados Estabilizados, Gradientes Conjugados para equações normais (CGNE e CGNR) e Gradientes Conjugados ao Quadrado. Todas as implementações fizeram uso das bibliotecas CUSP, CUSPARSE e CUBLAS. Para problemas estáticos, os seguintes pré-condicionadores foram adotados, todos eles com implementações paralelizadas e executadas na GPU: Decomposições Incompletas LU e de Cholesky, Multigrid Algébrico, Diagonal e Inversa Aproximada. Para os problemas harmônicos, apenas os dois primeiros pré-condicionadores foram utilizados, porém na sua versão sequencial, com execução na CPU, resultando em uma implementação híbrida CPU-GPU. As ferramentas computacionais desenvolvidas foram testadas na simulação de problemas de aterramento elétrico. No caso do regime harmônico, em que o fenômeno é regido pela Equação de Onda completa com perdas e não homogênea, a formulação adotada foi aquela em dois potenciais, A-V aresta-nodal. Em todas as situações, os aplicativos desenvolvidos para GPU apresentaram speedups apreciáveis, demonstrando a potencialidade dessa tecnologia para a simulação de problemas de larga escala na Engenharia Elétrica, com excelente relação custo-benefício. / This work presents the use of parallel processing techniques in Graphics Processing Units (GPU) for the solution of algebraic equations arising from the Finite Element modeling of electromagnetic phenomena, both in steadystate and time-harmonic regime. The techniques used were parallel programming OpenMP, CUDA and GPUDirect, the latter for those platforms of type Multi-GPU. The iterative methods discussed include those of the Krylov subspace: Conjugate Gradients, Bi-conjugate Gradients, Conjugate Residual, Bi-conjugate Gradients Stabilized, Conjugate Gradients for Normal Equations (CGNE and CGNR) and Conjugate Gradients Squared. All implementations have made use of CUSP, CUSPARSE and CUBLAS libraries. For the static problems, the following pre-conditioners were adopted, all with parallelized implementations and executed on the GPU: Incomplete decompositions, both LU and Cholesky, Algebraic Multigrid, Diagonal and Approximate Inverse. For the time-harmonic varying problems, only the first two pre-conditioners were used, but in their sequential version and running in the CPU, which yielded a hybrid CPU-GPU implementation. The developed computational tools were tested in the simulation of electrical grounding systems. In the case of the harmonic regime, in which the phenomenon is governed by the driven, lossy wave equation, the formulation adopted was that in two potential, the ungauged edge A-V formulation. In all cases, the developed GPU-based tools showed considerable speedups, showing that this is a promising technology for the simulation of large-scale Electrical Engineering problems, with excellent cost-benefit. Finte element method Graphic processing unit Iterative solver Método dos elementos finitos Métodos iterativos Paralelismo Parallelism Solving systems of algebraic equations Unidade de processamento gráfico

Search results