191 |
Parallel Mesh Adaptation and Graph Analysis Using Graphics Processing Units. Mcguiness, Timothy P. 01 January 2011
In the field of Computational Fluid Dynamics, several types of mesh adaptation strategies are used to enhance a mesh’s quality, thereby improving simulation speed and accuracy. Mesh smoothing (r-refinement) is a simple and effective technique in which nodes are repositioned to increase or decrease local mesh resolution. Mesh partitioning divides a mesh into sections for use on distributed-memory parallel machines. As a more abstract form of modeling, graph theory can be used to simulate many real-world problems, and has applications in computer science, sociology, engineering, and transportation, to name a few. One of the more important graph analysis tasks involves traversing the graph to evaluate and calculate nodal connectivity. Meshes and graphs share the same basic structure: both rely heavily on connectivity information representing the relationships between their constituent nodes and edges. This research examines the parallelization of these algorithms using commodity graphics hardware, a low-cost tool readily available to the computing community. It studies not only the benefits of the fine-grained parallelism of an individual graphics processor, but also the use of the Message Passing Interface (MPI) on large-scale GPU-based supercomputers.
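Mesh smoothing and graph analysis both run over the same connectivity data. As a minimal sketch, assuming Laplacian smoothing (the abstract does not name the smoothing rule) and a compressed sparse row (CSR) adjacency layout, one Jacobi-style smoothing pass looks like the following; the function and parameter names are hypothetical. Because every node reads only the previous iteration's positions, each node could be handled by its own GPU thread.

```python
def laplacian_smooth(xs, ys, row_ptr, col_idx, fixed, iterations=1):
    """Jacobi-style Laplacian smoothing on a mesh stored as CSR adjacency.

    Neighbors of node i are col_idx[row_ptr[i]:row_ptr[i + 1]].
    fixed: set of node ids (e.g. boundary nodes) that must not move.
    Every free node moves to the centroid of its neighbors.
    """
    xs, ys = list(xs), list(ys)
    for _ in range(iterations):
        # Jacobi update: read old positions, write new ones, so the
        # per-node work has no dependencies (one GPU thread per node).
        new_xs, new_ys = xs[:], ys[:]
        for i in range(len(xs)):
            if i in fixed:
                continue
            nbrs = col_idx[row_ptr[i]:row_ptr[i + 1]]
            if not nbrs:
                continue
            new_xs[i] = sum(xs[j] for j in nbrs) / len(nbrs)
            new_ys[i] = sum(ys[j] for j in nbrs) / len(nbrs)
        xs, ys = new_xs, new_ys
    return xs, ys
```

Writing into fresh arrays, rather than updating in place, is what keeps the per-node updates independent; an in-place (Gauss-Seidel) variant converges faster serially but creates read/write dependencies between neighboring threads.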
|
192 |
Optimizing Harris Corner Detection on GPGPUs Using CUDA. Loundagin, Justin. 01 March 2015
The objective of this thesis is to optimize the Harris corner detection algorithm implementation on NVIDIA GPGPUs using the CUDA software platform and to measure the performance benefit. The Harris corner detection algorithm, developed by C. Harris and M. Stephens, finds well-defined corner points within an image. Corner detection is computationally intensive, so real-time performance is difficult to achieve with a sequential software implementation. This thesis decomposes the Harris corner detection algorithm into a set of parallel stages, each of which is implemented and optimized on the CUDA platform. The performance results show that, by applying strategic CUDA optimizations to the Harris corner detection implementation, real-time performance is feasible. The optimized CUDA implementation showed significant speedup over several platforms: standard C, MATLAB, and OpenCV. It was then applied to a feature matching computer vision system, which likewise showed significant speedup over the other platforms.
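The parallel stages can be illustrated with a plain-Python sketch of the Harris response R = det(M) - k * trace(M)^2, where M is the windowed structure tensor of image gradients. The central-difference gradients, 3x3 box window, and k = 0.04 are assumptions chosen for brevity (Harris and Stephens used a Gaussian window); the thesis's exact stages and CUDA optimizations are not reproduced here. Each stage computes every pixel independently, so each would map naturally onto its own kernel.

```python
def harris_response(img, k=0.04):
    """img: 2D list of floats (rows). Returns a 2D list of Harris responses."""
    h, w = len(img), len(img[0])

    def px(y, x):
        # Clamp-to-edge sampling so border pixels have neighbors.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Stage 1: central-difference image gradients.
    ix = [[(px(y, x + 1) - px(y, x - 1)) / 2 for x in range(w)] for y in range(h)]
    iy = [[(px(y + 1, x) - px(y - 1, x)) / 2 for x in range(w)] for y in range(h)]

    # Stage 2: per-pixel gradient products.
    ixx = [[ix[y][x] * ix[y][x] for x in range(w)] for y in range(h)]
    iyy = [[iy[y][x] * iy[y][x] for x in range(w)] for y in range(h)]
    ixy = [[ix[y][x] * iy[y][x] for x in range(w)] for y in range(h)]

    def box3(a, y, x):
        # Stage 3: 3x3 window sum (pixels outside the image are skipped).
        return sum(a[yy][xx]
                   for yy in range(max(y - 1, 0), min(y + 2, h))
                   for xx in range(max(x - 1, 0), min(x + 2, w)))

    # Stage 4: R = det(M) - k * trace(M)^2 with M = [[Sxx, Sxy], [Sxy, Syy]].
    response = []
    for y in range(h):
        row = []
        for x in range(w):
            sxx, syy, sxy = box3(ixx, y, x), box3(iyy, y, x), box3(ixy, y, x)
            row.append(sxx * syy - sxy * sxy - k * (sxx + syy) ** 2)
        response.append(row)
    return response
```

Along a straight edge only one gradient direction is present, so det(M) is near zero and R is negative; at a corner both directions contribute and R becomes large and positive.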
|
193 |
CUDA Accelerated 3D Non-rigid Diffeomorphic Registration. Qu, An. January 2017
Advances in magnetic resonance imaging (MRI) techniques enable visual guidance to identify the anatomical target of interest during image-guided intervention (IGI). Non-rigid image registration, which aligns the target tissue with the preoperative MRI image volumes, is one of the crucial techniques. With the growing demand for real-time interaction in IGI, the time used for intraoperative registration is increasingly important. This work implements the 3D diffeomorphic demons algorithm on an Nvidia GeForce GTX 1070 GPU in C++ using the CUDA 8.0.61 programming environment, reducing the average registration time to 5 s. We have also extensively evaluated the GPU-accelerated 3D diffeomorphic registration against both a CPU implementation and Matlab code, and the results show that the GPU implementation is considerably more efficient.
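The per-voxel work in a demons-type registration is the force (update) computation. The sketch below shows the classic demons update in 1D for brevity, with the convention (an assumption; sign conventions differ between papers) that u[i] should satisfy moving(i + u[i]) ≈ fixed(i): u = (f - m) * grad(f) / (|grad(f)|^2 + (f - m)^2). It is not the author's code; the diffeomorphic variant additionally smooths the update and composes it through an exponential map (typically computed by scaling and squaring), which is omitted here. In 3D the same formula applies per voxel with a three-component gradient, one GPU thread per voxel.

```python
def demons_update(fixed, moving, eps=1e-12):
    """Classic demons displacement per sample (1D for brevity).

    Convention assumed here: u[i] is chosen so that moving(i + u[i]) ~ fixed(i).
    u = (f - m) * grad_f / (grad_f**2 + (f - m)**2)
    """
    n = len(fixed)
    u = []
    for i in range(n):
        # Central difference inside, one-sided difference at the borders.
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        grad = (fixed[hi] - fixed[lo]) / (hi - lo) if hi > lo else 0.0
        diff = fixed[i] - moving[i]
        denom = grad * grad + diff * diff
        # The (f - m)^2 term bounds the step where the gradient is small.
        u.append(diff * grad / denom if denom > eps else 0.0)
    return u
```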
|
194 |
Visual Inspection Of Railroad Tracks. Babenko, Pavel. 01 January 2009
In this dissertation, we have developed computer vision methods for measuring rail gauge and for reliably identifying and localizing structural defects in railroad tracks. The rail gauge is the distance between the innermost sides of the two parallel steel rails. We have developed two methods for evaluating rail gauge, designed for different hardware setups: the first works with two pairs of unaligned video cameras, while the second works with depth maps generated by paired laser range scanners. We have also developed a method, based on correlation and MACH (Maximum Average Correlation Height) filters, for detecting rail defects such as damaged or missing rail fasteners, tie clips, and bolts. Lastly, to make our algorithms run in real time, we have developed a GPU-based library for parallel computation of the above algorithms.
Rail gauge is the most important measurement for track maintenance, because deviations in gauge indicate where potential defects may exist. We have developed a vision-based method for rail gauge estimation from a pair of industrial laser range scanners. In this approach, we start by building a 3D panorama of the rail from a stack of input scans. After the panorama is built, we apply FIR circular filtering and Gaussian smoothing to the panorama buffer to suppress noise. Next, we attempt to segment the rail heads in the panorama buffer, first applying a method that detects railroad crossings or forks. If none are present, we find the rail edge using a robust line fit; if they are present, we instead predict the rail edge positions using a Kalman filter. In the next step, common to both cases, we refine the rail edge positions using additional clustering in the vicinity of the edge. We approximate the rail head surface by a third-degree polynomial and then fit two planar surfaces to find the exact position of the rail edge.
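The crossing/fork branch replaces the line fit with a Kalman prediction. A minimal sketch, assuming a constant-velocity model for one edge coordinate per scan (the abstract does not give the state model or noise parameters, so q, r, and the prior variance below are hypothetical), is:

```python
class EdgeKalman:
    """Constant-velocity Kalman filter for one rail-edge coordinate.

    State [p, v]: edge position and its change per scan.
    q: process-noise variance (added to both diagonal terms, a simplification).
    r: measurement-noise variance of a detected edge position.
    """

    def __init__(self, p0, q=1e-3, r=1.0, p_var=1000.0):
        self.p, self.v = p0, 0.0
        # Symmetric covariance [[a, b], [b, d]] stored as three scalars.
        self.a, self.b, self.d = p_var, 0.0, p_var
        self.q, self.r = q, r

    def predict(self, dt=1.0):
        """x = F x and P = F P F^T + Q, with F = [[1, dt], [0, 1]]."""
        self.p += dt * self.v
        # Order matters: a uses the old b and d, b uses the old d.
        self.a += 2 * dt * self.b + dt * dt * self.d + self.q
        self.b += dt * self.d
        self.d += self.q
        return self.p

    def update(self, z):
        """Fuse a measured edge position z (measurement matrix H = [1, 0])."""
        s = self.a + self.r
        k0, k1 = self.a / s, self.b / s  # Kalman gain
        innovation = z - self.p
        self.p += k0 * innovation
        self.v += k1 * innovation
        a, b, d = self.a, self.b, self.d
        # P = (I - K H) P
        self.a, self.b, self.d = (1 - k0) * a, (1 - k0) * b, d - k1 * b
        return self.p
```

In such a loop, `predict()` supplies the expected edge position when a crossing or fork makes direct line fitting unreliable, and `update(z)` fuses a detected edge whenever one is available.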
Lastly, using the rail edge information, we calculate the rail gauge and smooth it with a 1D Gaussian filter. We have also developed a vision-based method to estimate the rail gauge from a pair of unaligned, calibrated, high-shutter-speed cameras. In this approach, the first step is to accurately detect the rail in each of the two non-overlapping synchronous images from the two cameras installed on the data collection cart: we build an edge map, fit lines to it using the Hough transform, and detect persistent edge lines using a history buffer. After the railroad track parts are detected, we segment out the rails to find the rail edges and calculate the rail gauge. We have demonstrated how to apply computer vision methods (correlation filters and MACH filters in particular) to find, in real time, different types of railroad elements with fixed or similar appearance, such as railroad clips, bolts, and rail plates. Template-based approaches to object detection (correlation filters) directly compare grayscale image data to a predefined model or template. The drawback of correlation filters has always been that they are neither scale- nor rotation-invariant, so many different filters are needed if either scale or rotation changes. Applying many filters cannot be done in real time. We have overcome this difficulty by using the parallel computation technology widely available in the GPUs of most advanced graphics cards. We have developed a library, MinGPU, which facilitates the use of GPUs for computer vision, and a MinGPU-based library of several computer vision methods that includes, among others, a GPU implementation of correlation filters. We have achieved a true positive rate of 0.98 for fastener detection using a GPU implementation of MACH filters.
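Template matching with a correlation filter amounts to scoring every candidate window against a template. The sketch below uses plain zero-mean normalized cross-correlation, a simpler technique than the MACH filters named above (which are designed from training images, usually in the frequency domain); it only illustrates why the workload parallelizes: every window position is scored independently, so a GPU can assign one thread per position. The function name is hypothetical.

```python
import math


def ncc_map(image, template):
    """Zero-mean normalized cross-correlation over all valid positions.

    Returns a 2D list of scores in [-1, 1]; 1 means the window is a
    positively scaled, offset copy of the template.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for y in range(ih - th + 1):
        row_scores = []
        for x in range(iw - tw + 1):
            # Each (y, x) is independent: one GPU thread per window.
            w_vals = [image[y + j][x + i] for j in range(th) for i in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            denom = t_norm * w_norm
            num = sum(a * b for a, b in zip(t_dev, w_dev))
            row_scores.append(num / denom if denom > 0 else 0.0)
        scores.append(row_scores)
    return scores
```

Because a single fixed template is neither scale- nor rotation-invariant, a detector built this way would evaluate a bank of rotated and rescaled templates, multiplying the per-window work the GPU absorbs.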
Besides correlation filters, MinGPU includes implementations of Lucas-Kanade optical flow, image homographies, edge detectors and discrete filters, image pyramids, morphology operations, and some graphics primitives. We have shown that the MinGPU implementation of homographies runs approximately 600 times faster than a C implementation and 8000 times faster than a Matlab implementation. MinGPU is built upon a reusable core and is thus easily expandable. With the help of MinGPU, we have made our algorithms work in real time.
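A homography maps points through a 3x3 matrix in homogeneous coordinates, [x', y', w]^T = H [x, y, 1]^T, followed by division by w. The sketch below is not MinGPU's API; it is a generic inverse-mapping warp with nearest-neighbor sampling, an assumption chosen for brevity, and the function names are hypothetical. Each output pixel is computed independently, which is the property a per-pixel GPU implementation exploits.

```python
def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 homography H given as row-major nested lists."""
    xp = H[0][0] * x + H[0][1] * y + H[0][2]
    yp = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return xp / w, yp / w


def warp_image(src, H_inv, out_h, out_w, fill=0.0):
    """Inverse-mapping warp: each output pixel looks up its source pixel
    through H_inv (nearest neighbor); pixels mapping outside get `fill`."""
    h, w = len(src), len(src[0])
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            sx, sy = apply_homography(H_inv, x, y)
            ix, iy = round(sx), round(sy)
            row.append(src[iy][ix] if 0 <= iy < h and 0 <= ix < w else fill)
        out.append(row)
    return out
```

Mapping output pixels back to the source (rather than pushing source pixels forward) guarantees every output pixel is written exactly once, with no holes or write conflicts.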
|
195 |
CUDA Enhanced Filtering In a Pipelined Video Processing Framework. Dworaczyk Wiltshire, Austin Aaron. 01 June 2013
The processing of digital video has long been a significant computational task for modern x86 processors. With every video frame composed of one to three planes, each consisting of a two-dimensional array of pixel data, and a video clip comprising thousands of such frames, the sheer volume of data is significant. With the introduction of new high-definition video formats such as 4K or stereoscopic 3D, the volume of uncompressed frame data is growing ever larger.
Modern CPUs offer performance enhancements for processing digital video through SIMD instructions such as SSE2 or AVX. However, even with these instruction sets, CPUs are limited by their inherently sequential design and can operate on only a handful of bytes in parallel. Even processors with many cores offer only a modest degree of parallelism.
GPUs provide an alternative, massively parallel architecture. They differ from CPUs in providing thousands of throughput-oriented cores, instead of at most tens of generalized, “good enough at everything” x86 cores. These throughput-oriented cores are far more adept at handling large arrays of pixel data, because many video filtering operations can be performed independently on each pixel. This computational independence allows pixel processing to scale across hundreds or even thousands of device cores.
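The independence argument can be made concrete by emulating, serially, how a CUDA-style 2D launch assigns one thread to each pixel. The helper names and the 3x3 box blur are hypothetical stand-ins (the thesis's Vapoursynth filters are not shown); the point is the index arithmetic x = blockIdx.x * blockDim.x + threadIdx.x, the bounds check for partially filled edge blocks, and the fact that each thread writes only its own output pixel.

```python
def launch_2d(kernel, grid, block, *args):
    """Serially emulate a 2D grid launch: every (block, thread) pair runs
    the same kernel body with its own indices."""
    for by in range(grid[1]):
        for bx in range(grid[0]):
            for ty in range(block[1]):
                for tx in range(block[0]):
                    kernel((bx, by), (tx, ty), block, *args)


def box3_kernel(block_idx, thread_idx, block_dim, src, dst, width, height):
    # Global pixel coordinates, as blockIdx * blockDim + threadIdx in CUDA.
    x = block_idx[0] * block_dim[0] + thread_idx[0]
    y = block_idx[1] * block_dim[1] + thread_idx[1]
    if x >= width or y >= height:
        return  # threads in partially filled edge blocks do nothing
    total, count = 0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            xx, yy = x + dx, y + dy
            if 0 <= xx < width and 0 <= yy < height:
                total += src[yy][xx]
                count += 1
    dst[y][x] = total / count  # each thread writes only its own pixel


def box_blur(src, block=(16, 16)):
    h, w = len(src), len(src[0])
    dst = [[0.0] * w for _ in range(h)]
    # Round the grid up so every pixel is covered by some thread.
    grid = ((w + block[0] - 1) // block[0], (h + block[1] - 1) // block[1])
    launch_2d(box3_kernel, grid, block, src, dst, w, h)
    return dst
```

Since no thread reads another thread's output, the loop order in `launch_2d` is irrelevant to the result, which is exactly what allows the hardware to run the threads concurrently.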
This thesis explores the use of GPUs for video processing, and evaluates the advantages and caveats of porting the modern video filtering framework Vapoursynth to run entirely on the GPU. Compute-heavy GPU-enabled video processing achieves up to a 108% speedup over an SSE2-optimized, multithreaded CPU implementation.
|
196 |
GPU High-Performance Framework for PIC-Like Simulation Methods Using the Vulkan® Explicit API. Yager, Kolton Jacob. 01 March 2021
Within computational continuum mechanics there exists a large category of simulation methods which operate by tracking Lagrangian particles over an Eulerian background grid. These Lagrangian/Eulerian hybrid methods, descendants of the Particle-In-Cell method (PIC), have proven highly effective at simulating a broad range of materials and mechanics including fluids, solids, granular materials, and plasma. These methods remain an area of active research after several decades, and their applications can be found across scientific, engineering, and entertainment disciplines.
This thesis presents a GPU-driven PIC-like simulation framework created using the Vulkan® API. Vulkan is a cross-platform, open-standard explicit API for graphics and GPU compute programming. Compared to its predecessors, Vulkan offers lower overhead, support for host parallelism, and finer-grained control over both device resources and scheduling. This thesis harnesses those advantages to create a programmable GPU compute pipeline backed by a Vulkan adaptation of the SPgrid data structure and multi-buffered particle arrays. The CPU host works asynchronously with the GPU to maximize utilization of both host and device. The framework is shown to support Particle-in-Cell-like simulation methods, making it viable for GPU acceleration of many hybrid methods that track Lagrangian particles over an Eulerian grid. This novel framework is the first of its kind to be created using Vulkan® and to take advantage of GPU sparse memory features for grid sparsity.
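The core of PIC-like methods is the pair of transfers between particles and the background grid. A minimal 1D sketch with linear (tent) weights follows; real PIC/FLIP/MPM codes usually use higher-order weights in 2D or 3D, and this version ignores the sparse SPgrid storage and Vulkan pipeline described above. Names are hypothetical, and particles are assumed to lie in [0, n_cells * dx).

```python
def p2g(positions, masses, velocities, n_cells, dx):
    """Particle-to-grid: scatter mass and momentum to the two nearest grid
    nodes with linear weights, then recover node velocities."""
    grid_m = [0.0] * (n_cells + 1)
    grid_p = [0.0] * (n_cells + 1)
    for x, m, v in zip(positions, masses, velocities):
        i = int(x // dx)   # left node index
        f = x / dx - i     # fractional offset in [0, 1)
        for node, w in ((i, 1.0 - f), (i + 1, f)):
            grid_m[node] += w * m
            grid_p[node] += w * m * v
    grid_v = [p / m if m > 0 else 0.0 for p, m in zip(grid_p, grid_m)]
    return grid_m, grid_v


def g2p(positions, grid_v, dx):
    """Grid-to-particle: gather node velocities back with the same weights."""
    out = []
    for x in positions:
        i = int(x // dx)
        f = x / dx - i
        out.append((1.0 - f) * grid_v[i] + f * grid_v[i + 1])
    return out
```

On a GPU, the scatter in `p2g` is the hard part, since several particles may write the same node, so it typically needs atomic additions or a particle sort; the gather in `g2p` is conflict-free.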
|
197 |
Hardware Accelerated Particle Filter for Lane Detection and Tracking in OpenCL. Madduri, Nikhil. January 2014
A road lane detection and tracking algorithm is developed, tailored especially to run on high-performance heterogeneous hardware such as GPUs and FPGAs in autonomous road vehicles. The algorithm was initially developed in C/C++ and then ported to OpenCL, which supports computation on heterogeneous hardware. A novel road lane detection algorithm is proposed that randomly samples particles modeled as straight lines. Weights are assigned to these particles based on their location in the gradient image. To improve the computational efficiency of the lane detection algorithm, lane tracking is introduced in the form of a particle filter. Particle creation in the lane detection step, and the prediction and measurement updates in the lane tracking step, are computed in parallel on the GPU/FPGA using OpenCL, while the rest of the code runs on a host CPU. The software was tested on two GPUs, an NVIDIA GeForce GTX 660 Ti and an NVIDIA GeForce GTX 285, and on an FPGA, an Altera Stratix-V, which achieved frame rates of up to 104 Hz, 79 Hz, and 27 Hz respectively. The code was tested on video streams from five different datasets covering varying road lighting conditions, strong shadows, and light to moderate traffic, and was found to be robust in all of these situations for detecting a single lane.
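The per-particle parallelism can be seen in one predict/update/resample cycle. For brevity, the sketch below reduces each lane hypothesis to a single lateral offset and uses a Gaussian likelihood in place of the gradient-image weighting described above; both simplifications, the noise parameters, and the function names are assumptions. Prediction and weighting are independent per particle; systematic resampling needs a cumulative sum of the weights, which a GPU would compute with a parallel prefix sum.

```python
import math
import random


def systematic_resample(weights, u):
    """Systematic resampling: N evenly spaced pointers (u + k) / N, with
    u in [0, 1), walked along the normalized cumulative weights.
    Returns the indices of the selected particles."""
    n = len(weights)
    total = sum(weights)
    indices, cumulative, j = [], weights[0] / total, 0
    for k in range(n):
        pos = (u + k) / n
        while pos > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j] / total
        indices.append(j)
    return indices


def pf_step(particles, measurement, rng, motion_std=0.5, meas_std=1.0):
    """One predict/update/resample cycle for a 1D lane-offset state."""
    # Predict: random-walk motion model (independent per particle).
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Update: Gaussian likelihood of the measured offset (independent per particle).
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2) for p in predicted]
    if sum(weights) == 0:
        weights = [1.0] * len(predicted)  # degenerate case: keep all equally
    idx = systematic_resample(weights, rng.random())
    return [predicted[i] for i in idx]
```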
|
198 |
GPGPU microbenchmarking for irregular application optimization. Winans-Pruitt, Dalton R. 09 August 2022
Irregular applications, such as unstructured mesh operations, do not map easily onto the typical GPU programming paradigms endorsed by GPU manufacturers, which mostly focus on maximizing concurrency for latency hiding. In this work, we show how alternative techniques focused on latency amortization can be used to control overall latency while requiring less concurrency. We used a custom-built microbenchmarking framework to test several GPU kernels and show how the GPU behaves under relevant workloads. We demonstrate that coalescing is not required for good performance: an uncoalesced access pattern can achieve high bandwidth, even over 80% of the theoretical global memory bandwidth in certain circumstances. We also report further observations on GPU behaviors relevant to these workloads. We hope that this study opens the door to further investigation of techniques that exploit latency amortization when latency hiding does not achieve sufficient performance.
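Coalescing is usually reasoned about in terms of how many memory sectors a warp's loads touch. The toy model below, which assumes 32 threads per warp, 4-byte elements, and 32-byte sectors and ignores caching, counts them for strided access; it is not the author's microbenchmarking framework, and the helper names are hypothetical. The last helper applies Little's law (data in flight = bandwidth x latency), which is why a kernel must keep many requests outstanding, whether through concurrency or other means, to approach peak bandwidth.

```python
def warp_addresses(base_elem, stride_elems, elem_bytes=4, warp_size=32):
    """Byte addresses loaded by one warp when thread t reads element
    base_elem + t * stride_elems."""
    return [(base_elem + t * stride_elems) * elem_bytes for t in range(warp_size)]


def sectors_touched(addresses, sector_bytes=32):
    """Distinct memory sectors (the assumed transfer granularity) touched."""
    return len({addr // sector_bytes for addr in addresses})


def load_efficiency(addresses, elem_bytes=4, sector_bytes=32):
    """Fraction of transferred bytes that the warp actually uses."""
    fetched = sectors_touched(addresses, sector_bytes) * sector_bytes
    return len(addresses) * elem_bytes / fetched


def bytes_in_flight(bandwidth_gb_s, latency_ns):
    """Little's law: bytes that must be outstanding to sustain a bandwidth
    (1 GB/s x 1 ns = 1 byte)."""
    return bandwidth_gb_s * latency_ns
```

In this model a unit-stride warp touches 4 sectors at full efficiency, while a stride of 8 or more elements touches 32 sectors and wastes most of each transfer; the abstract's point is that such patterns can still reach high bandwidth when enough of these transfers are kept in flight.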
|
199 |
Optimization of American option pricing through GPU computing. Greinsmark, Hadar; Lindström, Erik. January 2017
Over the last decades, the market for financial derivatives has grown dramatically to values of global importance. With the digital automation of markets, programs able to value financial derivatives efficiently have become key to market competitiveness and have thus garnered considerable interest. This report explores the potential efficiency gains of employing modern GPU computing technology to price financial options using the binomial option pricing model. The model is implemented on both CPU and GPU hardware, and the results are compared in terms of computational efficiency. The results of this thesis indicate that GPU computing can considerably improve option pricing runtimes.
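The binomial model prices an American option by backward induction over a recombining tree, taking at each node the larger of the discounted continuation value and the immediate exercise value. A minimal sketch, assuming the Cox-Ross-Rubinstein parameterization (u = exp(sigma * sqrt(dt)), d = 1/u; the abstract does not say which tree variant was used), is shown below; the function name is hypothetical. Nodes within one time level are independent, which is the axis a GPU implementation can parallelize; the levels themselves must still be processed in order.

```python
import math


def crr_price(S, K, r, sigma, T, steps, call=True, american=True):
    """Cox-Ross-Rubinstein binomial price of a European or American option."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    disc = math.exp(-r * dt)
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability

    def payoff(s):
        return max(s - K, 0.0) if call else max(K - s, 0.0)

    # Option values at expiry; node j has j up-moves.
    values = [payoff(S * u**j * d**(steps - j)) for j in range(steps + 1)]
    # Backward induction. Within a level, node j reads values[j] and
    # values[j + 1] before either is overwritten, so the update can be in place.
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
            if american:
                cont = max(cont, payoff(S * u**j * d**(n - j)))  # early exercise
            values[j] = cont
    return values[0]
```

For a non-dividend-paying stock the American call equals the European call, while the American put is worth more than the European put; both facts make convenient sanity checks.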
|
200 |
Procedural generation of volumetric terrain on different processing units (Procedurell generering av volymetrisk terräng på olika beräkningsenheter). Mathiason, Jesper. January 2016
This work investigates whether the existing algorithms Marching cubes and Perlin noise can be used to procedurally generate terrain. The implementation of these algorithms generates terrain represented as a three-dimensional volume, in order to solve problems that can arise when the terrain is represented by a two-dimensional height field. Furthermore, the combination of these algorithms is parallelized and adapted to run on the GPU; experiments showed that the parallelization increased performance and thus shortened generation times.
There is other digital material (e.g., film, image, or audio files) or models/artifacts belonging to the thesis that need to be archived.
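Volumetric terrain can be represented as a density field sampled on a 3D grid, whose zero isosurface Marching cubes converts into triangles. The sketch below, assuming an "improved noise"-style Perlin implementation (quintic fade curve, 12 edge gradients) and a hypothetical density formula (flat ground at height 8 plus scaled noise), shows the two per-sample pieces; the 256-entry triangle tables of Marching cubes are omitted, and the bit convention for "inside" varies between implementations. Every sample and every cube is independent, which is what makes a GPU port straightforward.

```python
import math
import random

# The 12 cube-edge gradient directions used by "improved noise"-style Perlin noise.
GRADIENTS = [(1, 1, 0), (-1, 1, 0), (1, -1, 0), (-1, -1, 0),
             (1, 0, 1), (-1, 0, 1), (1, 0, -1), (-1, 0, -1),
             (0, 1, 1), (0, -1, 1), (0, 1, -1), (0, -1, -1)]


def make_permutation(seed):
    perm = list(range(256))
    random.Random(seed).shuffle(perm)
    return perm + perm  # doubled so nested lookups never need a modulo


def fade(t):
    return t * t * t * (t * (t * 6 - 15) + 10)  # 6t^5 - 15t^4 + 10t^3


def lerp(a, b, t):
    return a + t * (b - a)


def perlin3(perm, x, y, z):
    """3D gradient noise; exactly 0 at integer lattice points."""
    xi, yi, zi = math.floor(x), math.floor(y), math.floor(z)
    xf, yf, zf = x - xi, y - yi, z - zi
    X, Y, Z = xi & 255, yi & 255, zi & 255

    def corner(dx, dy, dz):
        # Hash the lattice corner to a gradient, dot it with the offset.
        g = GRADIENTS[perm[perm[perm[X + dx] + Y + dy] + Z + dz] % 12]
        return g[0] * (xf - dx) + g[1] * (yf - dy) + g[2] * (zf - dz)

    u, v, w = fade(xf), fade(yf), fade(zf)
    x00 = lerp(corner(0, 0, 0), corner(1, 0, 0), u)
    x10 = lerp(corner(0, 1, 0), corner(1, 1, 0), u)
    x01 = lerp(corner(0, 0, 1), corner(1, 0, 1), u)
    x11 = lerp(corner(0, 1, 1), corner(1, 1, 1), u)
    return lerp(lerp(x00, x10, v), lerp(x01, x11, v), w)


def density(perm, x, y, z, ground=8.0, scale=0.1):
    """Hypothetical terrain density: positive = solid, negative = air.
    Unlike a 2D height field, 3D noise can produce overhangs and caves."""
    return (ground - y) + 4.0 * perlin3(perm, x * scale, y * scale, z * scale)


def cube_index(corner_values, iso=0.0):
    """Marching-cubes case index (0..255): bit i is set when corner i is
    solid (value above iso here; conventions differ between implementations)."""
    index = 0
    for i, value in enumerate(corner_values):
        if value > iso:
            index |= 1 << i
    return index
```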
|