171 |
Global Illumination in Real-Time using Voxel Cone Tracing on Mobile Devices / Global illuminering i realtid på mobila enheter
Wahlén, Conrad January 2016
This thesis explores Voxel Cone Tracing as a possible Global Illumination solution on mobile devices. The rapid increase of performance on low-power graphics processors has made a big impact. More advanced computer graphics algorithms are now possible on a new range of devices. One category of such algorithms is Global Illumination, which calculates realistic lighting in rendered scenes. The combination of advanced graphics and portability is of special interest to implement in new technologies like Virtual Reality. The result of this thesis shows that while it is possible to implement a state-of-the-art Global Illumination algorithm, the performance of mobile Graphics Processing Units is still not enough to make it usable in real-time.
|
172 |
Enhancing productivity and performance portability of OpenCL applications on heterogeneous systems using runtime optimizations
Lutz, Thibaut January 2015
Initially driven by a strong need for increased computational performance in science and engineering, heterogeneous systems have become ubiquitous and they are getting increasingly complex. The single processor era has been replaced with multi-core processors, which have quickly been surrounded by satellite devices aiming to increase the throughput of the entire system. These auxiliary devices, such as Graphics Processing Units, Field Programmable Gate Arrays or other specialized processors, have very different architectures. This puts an enormous strain on programming models and software developers to take full advantage of the computing power at hand. Because of this diversity and the unachievable flexibility and portability necessary to optimize for each target individually, heterogeneous systems typically remain vastly under-utilized. In this thesis, we explore two distinct ways to tackle this problem. Providing automated, non-intrusive methods in the form of compiler tools and implementing efficient abstractions to automatically tune parameters for a restricted domain are two complementary approaches investigated to better utilize compute resources in heterogeneous systems. First, we explore a fully automated compiler-based approach, where a runtime system analyzes the computation flow of an OpenCL application and optimizes it across multiple compute kernels. This method can be deployed on any existing application transparently and replaces significant software engineering effort spent to tune an application for a particular system. We show that this technique achieves speedups of up to 3x over unoptimized code and an average of 1.4x over manually optimized code for highly dynamic applications. Second, a library-based approach is designed to provide a high-level abstraction for complex problems in a specific domain, stencil computation. Using domain-specific techniques, the underlying framework optimizes the code aggressively.
We show that even in a restricted domain, automatic tuning mechanisms and robust architectural abstraction are necessary to improve performance. Using the abstraction layer, we demonstrate strong scaling of various applications to multiple GPUs with a speedup of up to 1.9x on two GPUs and 3.6x on four.
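For readers unfamiliar with the term, a stencil computation updates every grid point from a fixed pattern of its neighbours. The following is a minimal sketch of a generic 5-point averaging stencil in plain Python; the function name `jacobi_step` and the choice of stencil are illustrative assumptions and are not taken from the thesis or its framework.

```python
def jacobi_step(grid):
    """Apply one 5-point averaging stencil sweep to a 2D grid (list of lists).

    Illustrative only: a generic stencil, not the thesis's library code.
    Boundary cells are left unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so every read sees the old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Each interior point becomes the mean of its four neighbours.
            out[i][j] = 0.25 * (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            )
    return out


grid = [
    [1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
result = jacobi_step(grid)
```

Because each output point depends only on nearby inputs, such a grid can in principle be split into tiles that are processed on separate devices, with only the tile borders exchanged between sweeps.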
|
173 |
GPU-Accelerated Contour Extraction on Large Images Using Snakes
Kienel, Enrico, Brunnett, Guido 16 February 2009
Active contours have proven to be a powerful semiautomatic image segmentation approach that seems to cope with many applications and different image modalities. However, they exhibit inherent drawbacks, including sensitivity to contour initialization due to the limited capture range of image edges, and problems with concave boundary regions. The Gradient Vector Flow replaces the traditional image force and provides an enlarged capture range as well as enhanced concavity extraction capabilities, but it involves an expensive computational effort and considerably increased memory requirements at the time of computation. In this paper, we present an enhancement of the active contour model to facilitate semiautomatic contour detection in huge images. We propose a tile-based image decomposition accompanied by an on-demand image force computation scheme in order to minimize both computational and memory requirements. We show an efficient implementation of this approach based on general-purpose GPU processing, providing continuous active contour deformation without considerable delay.
|
174 |
GPGPU-accelerated nonlinear state estimators : application to MPC-controlled bioreactor performance
Roos, Darren Craig January 2021
Practical control problems must deal with instrumentation noise and inaccurate models. These can be modelled as measurement and state noise, respectively. Nonlinear state estimators, for example a particle filter, can be used to mitigate these effects. However, they are usually computationally expensive, which makes them impractical for industrial use. This text investigates using General Purpose Graphics Processing Units (GPGPU) to improve the performance of particle and Gaussian sum filters by parallelizing their prediction, update and resampling steps. GPGPU-accelerated filters are found to outperform non-accelerated filters as the number of particles increases. GPGPU acceleration also allows particle filters with 2^19.5 particles to be used on systems with dynamic time constants on the order of 0.1 second and Gaussian sum filters with 2^18.5 particles to be used with time constants on the order of 1 second.
The filters are applied to a bioreactor system containing R. oryzae, where MPC is applied to the production-phase fumaric acid and glucose concentrations. The bioreactor is modelled using results from Iplik (2017) and Swart (2019). It is found that the GPGPU filters' improved run times allow more particles to be used, which provides increased filter accuracy and thus better performance. This improved performance comes at the cost of consuming more energy. Thus, it is believed that the GPGPU implementations should be used for applications with complex dynamics/noise that require large numbers of particles and/or high sampling rates. / Dissertation (MEng (Control Engineering))--University of Pretoria, 2021. / Chemical Engineering / MEng (Control Engineering) / Unrestricted
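The prediction, update and resampling steps named in the abstract can be sketched for a basic bootstrap particle filter as follows. This is a CPU-only illustration on a hypothetical scalar random-walk model with Gaussian noise; the function names, the model and all parameter values are assumptions made for the example and do not reproduce the dissertation's bioreactor model or its GPGPU code.

```python
import math
import random


def predict(particles, process_std, rng):
    # Prediction: push every particle through the (assumed) random-walk model.
    return [x + rng.gauss(0.0, process_std) for x in particles]


def update(particles, measurement, meas_std):
    # Update: weight each particle by the Gaussian likelihood of the measurement.
    weights = [
        math.exp(-0.5 * ((measurement - x) / meas_std) ** 2) for x in particles
    ]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in weights]


def resample(particles, weights, rng):
    # Resampling: draw a new population in proportion to the weights
    # (multinomial resampling, chosen here for brevity).
    return rng.choices(particles, weights=weights, k=len(particles))


def particle_filter(measurements, n_particles=2000, process_std=0.1,
                    meas_std=0.5, seed=0):
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        particles = predict(particles, process_std, rng)
        weights = update(particles, z, meas_std)
        # State estimate: weighted mean of the particle population.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        particles = resample(particles, weights, rng)
    return estimates


estimates = particle_filter([1.0] * 30)
```

In this sketch the prediction and update steps touch each particle independently, which is the property that makes per-particle parallel execution attractive as particle counts grow.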
|
175 |
GPUMap: A Transparently GPU-Accelerated Map Function
Pachev, Ivan 01 March 2017
As GPGPU computing becomes more popular, it will be used to tackle a wider range of problems. However, due to the current state of GPGPU programming, programmers are typically required to be familiar with the architecture of the GPU in order to effectively program it. Fortunately, there are software packages that attempt to simplify GPGPU programming in higher-level languages such as Java and Python. However, these software packages do not attempt to abstract the GPU-acceleration process completely. Instead, they require programmers to be somewhat familiar with the traditional GPGPU programming model which involves some understanding of GPU threads and kernels. In addition, prior to using these software packages, programmers are required to transform the data they would like to operate on into arrays of primitive data. Typically, such software packages restrict the use of object-oriented programming when implementing the code to operate on this data. This thesis presents GPUMap, which is a proof-of-concept GPU-accelerated map function for Python. GPUMap aims to hide all the details of the GPU from the programmer, and allows the programmer to accelerate programs written in normal Python code that operate on arbitrarily nested objects using a majority of Python syntax. Using GPUMap, certain types of Python programs are able to be accelerated up to 100 times over normal Python code.
There are also software packages that provide simplified GPU acceleration to distributed computing frameworks such as MapReduce and Spark. Unfortunately, these packages do not provide a completely abstracted GPU programming experience, which conflicts with the purpose of the distributed computing frameworks: to abstract the underlying distributed system. This thesis also presents GPU-accelerated RDD (GPURDD), a type of Spark Resilient Distributed Dataset (RDD) that incorporates GPUMap into its map, filter, and foreach methods in order to allow Spark applications to make use of the abstracted GPU acceleration provided by GPUMap.
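As an illustration of the kind of program the abstract has in mind, the sketch below applies an ordinary function to a list of nested Python objects with the built-in `map`, running on the CPU. The classes `Vec` and `Body` and the function `advance` are hypothetical names invented for this example. The abstract does not give GPUMap's call signature, so no GPUMap call is shown.

```python
from dataclasses import dataclass


@dataclass
class Vec:
    x: float
    y: float


@dataclass
class Body:
    # A nested object: each Body holds two Vec instances.
    position: Vec
    velocity: Vec


def advance(body, dt=0.5):
    # Move a body along its velocity for one time step of length dt.
    new_position = Vec(
        body.position.x + body.velocity.x * dt,
        body.position.y + body.velocity.y * dt,
    )
    return Body(new_position, body.velocity)


bodies = [
    Body(Vec(0.0, 0.0), Vec(1.0, 2.0)),
    Body(Vec(1.0, 1.0), Vec(-2.0, 0.0)),
]

# Built-in, CPU-only map: the same function is applied to every element.
moved = list(map(advance, bodies))
```

Each call to `advance` is independent of the others, which is what makes a map over such objects a candidate for data-parallel execution.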
|
176 |
Simulace šíření ultrazvuku v kostech / Simulation of Ultrasound Propagation in Bones
Kadlubiak, Kristián January 2017
It is estimated that a mind-boggling 14.1 million new cases of cancer occurred worldwide in 2012 alone. This number is alarming. Although a healthy lifestyle may reduce the risk of developing cancer, there is always some probability that cancer will develop even in an absolutely fit individual. There are two main conditions for successful treatment of cancer. Firstly, early diagnosis is absolutely crucial. Secondly, there is a need for suitable surgical methods for removing affected tissue. Ultrasound has great potential to be used for both purposes as a non-invasive method. Photoacoustic spectroscopy is an imaging method with excellent properties for tumor detection that makes use of ultrasound, while High-Intensity Focused Ultrasound (HIFU) is a non-invasive surgical method. These methods would be impossible without precise ultrasound propagation simulations. k-Wave is an open-source MATLAB toolbox implementing such simulations. So why are these methods not already deployed in treatment? Unfortunately, the simulation of ultrasound propagation is a very time-consuming task, which makes it ineffective for medical purposes. However, there are a few options for accelerating these simulations, and the use of GPUs is a very promising one. The main topic of this thesis is the acceleration of the simulation of sound wave propagation in bones and hard tissue. The implementation developed as a part of this thesis was benchmarked on various supercomputers, including Anselm in Ostrava and Piz Daint in Lugano. The implemented solution provides remarkable acceleration compared to the original MATLAB prototype: in the best case it accelerated the simulation around 160 times. This means that a simulation which would otherwise last 6.5 days can now be computed in about one hour. This acceleration was achieved using an NVIDIA Tesla P100 to run the simulation with a domain size of 416x416x416 grid points.
The thesis includes performance benchmarks on different GPUs to give a comprehensive picture of the acceleration capabilities of the developed implementation, and discusses memory usage and numerical accuracy. Thanks to the implemented solution, which harnesses the power of modern GPUs, doctors and researchers all around the world have a powerful tool at hand.
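The headline figures in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above (6.5 days, a speedup of around 160, and a 416x416x416 grid); the function name `accelerated_hours` is a hypothetical helper introduced for the example.

```python
def accelerated_hours(baseline_days, speedup):
    # Convert the baseline run time to hours, then divide by the speedup.
    return baseline_days * 24.0 / speedup


# 6.5 days is 156 hours; at a 160x speedup that is just under one hour.
hours = accelerated_hours(6.5, 160)

# Total number of grid points in the 416 x 416 x 416 simulation domain.
grid_points = 416 ** 3
```

The result is 0.975 hours, which is consistent with the abstract's "one hour" figure, and the domain contains roughly 72 million grid points.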
|
177 |
Zpracování obrazu s velkými datovými toky - využití CUDA/OpenCL / High data rate image processing using CUDA/OpenCL
Sedláček, Filip January 2018
The main objective of this research is to propose an optimization of the defect detection algorithm used in the production of nonwoven textiles. The algorithm was developed by CAMEA spol. s.r.o. As a consequence of upgrading the current camera system to a more powerful one, it will be necessary to optimize the current algorithm and choose hardware with an appropriate architecture on which the calculations will be performed. This work describes in detail useful programming techniques for the CUDA software architecture and the OpenCL framework. Using these tools, we propose a parallel equivalent of the current algorithm, describe various optimization methods, and present a GUI designed to test these methods.
|
178 |
Efektivní komunikace v multi-GPU systémech / Efficient Communication in Multi-GPU Systems
Špeťko, Matej January 2018
After the introduction of CUDA by Nvidia, GPUs became devices capable of accelerating any general-purpose computation. GPUs are designed as parallel processors which possess huge computational power. Modern supercomputers are often equipped with GPU accelerators. Sometimes the performance or the memory capacity of a single GPU is not enough for a scientific application, and the application needs to be scaled to multiple GPUs. During the computation, the GPUs need to exchange partial results. This communication represents an overhead on the computation. For this reason it is important to research methods of efficient communication between GPUs: less CPU involvement, lower latency, and shared system buffers. Both inter-node and intra-node communication are examined. The main focus is on GPUDirect technologies from Nvidia and CUDA-Aware MPI. Subsequently, the k-Wave toolbox for simulating the propagation of acoustic waves is introduced. This application is accelerated using CUDA-Aware MPI.
|
179 |
Akcelerace ultrazvukové neurostimulace pomocí vysokoúrovňových GPGPU knihoven / Acceleration of Ultrasound Neurostimulation Using High-Level GPGPU Libraries
Mička, Richard January 2021
This thesis explores the potential use of GPGPU libraries to accelerate the k-Wave toolkit's acoustic wave propagation simulation. Firstly, the thesis surveys and assesses available high-level GPGPU libraries. Afterwards, an insight into the current state of simulation acceleration in the k-Wave toolkit is provided. Based on that, an approach is proposed for extending the currently available processor code into a heterogeneous application that can run on a graphics card. The outcome of this thesis is an application that can utilize a graphics card. If a graphics card is unavailable, it falls back to thread- and SIMD-based acceleration on the processor. The product of this thesis is then evaluated based on its performance, maintenance difficulty and usability.
|
180 |
Paralelizace výpočtů pro zpracování obrazu / Paralelized image processing library
Fuksa, Tomáš January 2011
This work deals with parallel computing on modern processors: multi-core CPUs and GPUs. The goal is to learn about computing on these devices, which are suitable for parallelization, define their advantages and disadvantages, test their properties on examples, and select appropriate tools to implement a library for parallel image processing. This library is going to be used for vanishing point estimation in a path-finding mobile robot.
|