1

SAT Solver akcelerovaný pomocí GPU / GPU Accelerated SAT Solver

Izrael, Petr January 2013 (has links)
This thesis is concerned with the design and implementation of a complete SAT solver accelerated on the GPU. The architecture of modern graphics cards is described, as well as the CUDA platform and a list of common algorithms used for solving the Boolean satisfiability problem (the SAT problem). The presented solution is based on the 3-SAT DC algorithm, which belongs to the family of well-known DPLL-based algorithms. This work describes the problems encountered during design and implementation. The resulting application was then analyzed and optimized. The presented solver cannot compete with state-of-the-art solvers, but it proves that it can be up to 21x faster than an equivalent sequential version. Unfortunately, the current implementation can only handle formulas of a limited size. Suggestions for further improvements are given in the final sections.
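To make the clause-level parallelism concrete, the following is a minimal sketch, not the thesis's code, of the kind of CUDA kernel a DPLL-style GPU solver might use to test an assignment against a CNF formula, one thread per clause; the data layout and all names are assumptions made for illustration.

```cuda
// Minimal sketch: evaluate a CNF formula under a full assignment, one thread per clause.
// Literals use DIMACS signs: +v means variable v true, -v means variable v false.
// clause_lits is a flat array; clause_start[c]..clause_start[c+1] delimit clause c.
__global__ void eval_clauses(const int *clause_lits, const int *clause_start,
                             int num_clauses, const unsigned char *assignment,
                             int *unsatisfied) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= num_clauses) return;

    bool clause_sat = false;
    for (int j = clause_start[c]; j < clause_start[c + 1]; ++j) {
        int lit = clause_lits[j];
        int var = abs(lit) - 1;                        // 0-based variable index
        bool val = assignment[var] != 0;
        if ((lit > 0) == val) { clause_sat = true; break; }
    }
    if (!clause_sat) atomicOr(unsatisfied, 1);         // any falsified clause flags the formula
}
```

A full solver would combine a check like this (or a parallel unit-propagation variant) with a search over decision variables, keeping the formula resident in GPU memory.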
2

Streaming Ray Tracer na GPU / Streaming Ray Tracer on GPU

Dvořák, Jakub January 2008 (has links)
Current consumer GPUs can be used as high-performance stream processors and are a tempting platform for implementing ray tracing. In this paper I briefly present ray tracing principles and the methods used to accelerate it, the programmable pipeline of modern GPUs, and examples of its use. I describe stream processing in general and the available interfaces that enable the use of the GPU as a stream processor. Then I present my GPU ray tracer implementation, the algorithms used, and the experiments I have performed.
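As an illustration of the stream-processing view described above, one thread per ray with primitives tested independently, the sketch below shows a ray-sphere intersection kernel in CUDA. It is not the thesis's implementation; the struct layout and names are assumptions.

```cuda
// Minimal sketch: one thread per primary ray, testing a single sphere.
// A real streaming ray tracer batches many rays and primitives and writes
// hit records back to a stream buffer; this only shows the per-ray math.
struct Ray { float3 orig, dir; };          // dir assumed normalized

__global__ void intersect_sphere(const Ray *rays, int num_rays,
                                 float3 center, float radius, float *t_hit) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_rays) return;

    float3 oc = make_float3(rays[i].orig.x - center.x,
                            rays[i].orig.y - center.y,
                            rays[i].orig.z - center.z);
    float b = oc.x * rays[i].dir.x + oc.y * rays[i].dir.y + oc.z * rays[i].dir.z;
    float c = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z - radius * radius;
    float disc = b * b - c;                // quadratic with a = 1 for a unit direction
    t_hit[i] = (disc < 0.0f) ? -1.0f : (-b - sqrtf(disc));   // nearest hit, or -1 for a miss
}
```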
3

Ray-traced radiative transfer on massively threaded architectures

Thomson, Samuel Paul January 2018 (has links)
In this thesis, I apply techniques from the field of computer graphics to ray tracing in astrophysical simulations, and introduce the grace software library. This is combined with an extant radiative transfer solver to produce a new package, taranis. It allows for fully parallel particle updates via per-particle accumulation of rates, followed by a forward Euler integration step, and is manifestly photon-conserving. To my knowledge, taranis is the first ray-traced radiative transfer code to run on graphics processing units and target cosmological-scale smoothed particle hydrodynamics (SPH) datasets. A significant optimization effort is undertaken in developing grace. Contrary to typical results in computer graphics, it is found that the bounding volume hierarchies (BVHs) used to accelerate the ray tracing procedure need not be of high quality; as a result, extremely fast BVH construction times are possible (< 0.02 microseconds per particle in an SPH dataset). I show that this exceeds the performance researchers might expect from CPU codes by at least an order of magnitude, and compares favourably to a state-of-the-art ray tracing solution. Similar results are found for the ray tracing itself, where again techniques from computer graphics are examined for effectiveness with SPH datasets, and new optimizations are proposed. For high per-source ray counts (≳ 10⁴), grace can reduce ray tracing run times by up to two orders of magnitude compared to extant CPU solutions developed within the astrophysics community, and by a factor of a few compared to a state-of-the-art solution. taranis is shown to produce the expected results in a suite of de facto standard cosmological radiative transfer test cases. For some cases, it currently outperforms a serial, CPU-based alternative by a factor of a few. Unfortunately, for the most realistic test its performance is extremely poor, making the current taranis code unsuitable for cosmological radiative transfer. The primary reason for this failing is found to be a small minority of particles which always dominate the timestep criterion. Several plausible routes to mitigate this problem, while retaining parallelism, are put forward.
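The per-particle update scheme described above (accumulate rates during ray tracing, then take a forward Euler step) can be sketched as a CUDA kernel with one thread per particle. The rate model and all names below are placeholders for illustration, not taranis's actual equations.

```cuda
// Minimal sketch of the per-particle update pattern: rates accumulated during ray
// tracing are applied in a single forward Euler step, one thread per particle.
__global__ void forward_euler_update(float *x_HI,                  // neutral hydrogen fraction
                                     const float *ionization_rate, // accumulated per particle
                                     const float *recomb_rate,
                                     int num_particles, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_particles) return;

    // Toy rate equation: recombination repopulates HI, ionization depletes it.
    float dx = (recomb_rate[i] * (1.0f - x_HI[i]) - ionization_rate[i] * x_HI[i]) * dt;
    x_HI[i] = fminf(fmaxf(x_HI[i] + dx, 0.0f), 1.0f);   // clamp to a physical fraction
}
```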
4

Resolução de um problema térmico inverso utilizando processamento paralelo em arquiteturas de memória compartilhada / Resolution of an inverse thermal problem using parallel processing on shared memory architectures

Ansoni, Jonas Laerte 03 September 2010 (has links)
Parallel programming has frequently been adopted for the development of applications that demand high computational performance. With the advent of multi-core architectures and the existence of several levels of parallelism, it is important to define parallel programming strategies that take advantage of the processing power of these architectures. In this context, this work evaluates the performance of multi-core architectures, principally that offered by graphics processing units (GPUs) and multi-core CPUs, in the resolution of an inverse thermal problem. Parallel algorithms for the GPU and the CPU were developed using, respectively, the shared-memory programming tools NVIDIA CUDA (Compute Unified Device Architecture) and the POSIX Threads API. The preconditioned conjugate gradient method for solving sparse linear systems was implemented entirely within the global memory space of the GPU in CUDA. The developed algorithm was evaluated on two GPU models and proved more efficient, achieving a speedup of four times over the serial version of the algorithm. The POSIX Threads parallel application was evaluated on different multi-core CPUs with distinct microarchitectures. To obtain higher performance from the parallelized code, optimization flags were used and proved very effective for the developed application; with these flags, the parallelized code achieved processing times about twelve times faster than the serial version on the same processor without any optimization. Thus, both the approach using the GPU as a generic coprocessor to the CPU and the parallel application employing multi-core CPUs proved to be efficient tools for solving the inverse thermal problem.
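The core of a GPU-resident preconditioned conjugate gradient solver is a sparse matrix-vector product operating on arrays held in global memory. The following is a minimal sketch of a CSR-format SpMV kernel of the kind such a solver might use; it is illustrative only and not the thesis's implementation.

```cuda
// Minimal sketch: sparse matrix-vector product y = A*x in CSR format, one thread per row.
__global__ void spmv_csr(const int *row_ptr, const int *col_idx, const float *val,
                         const float *x, float *y, int num_rows) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= num_rows) return;

    float sum = 0.0f;
    for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
        sum += val[j] * x[col_idx[j]];
    y[row] = sum;
}
```

A PCG loop would alternate such SpMV calls with dot products and AXPY updates, all staying in device memory, so that only the scalar convergence check needs to touch the host.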
5

Simulace šíření ultrazvuku v kostech / Simulation of Ultrasound Propagation in Bones

Kadlubiak, Kristián January 2017 (has links)
It is estimated that a mind-boggling 14.1 million new cases of cancer occurred worldwide in 2012 alone. This number is alarming. Although a healthy lifestyle may reduce the risk of developing cancer, there is always some probability that cancer will develop even in a perfectly fit individual. There are two main conditions for the successful treatment of cancer. Firstly, early diagnosis is absolutely crucial. Secondly, there is a need for suitable surgical methods for removing the affected tissue. Ultrasound, as a non-invasive method, has great potential to be used for both purposes. Photoacoustic spectroscopy is an imaging method for tumor detection with excellent properties that makes use of ultrasound, while High-Intensity Focused Ultrasound (HIFU) is a non-invasive surgical method. These methods would be impossible without precise simulations of ultrasound propagation. k-Wave is an open-source MATLAB toolbox implementing such simulations. So why are these methods not already deployed in treatment? Unfortunately, the simulation of ultrasound propagation is a very time-consuming task, which makes it impractical for medical purposes. However, there are a few options for accelerating these simulations, and the use of GPUs is a very promising one. The main topic of this thesis is the acceleration of the simulation of sound-wave propagation in bones and hard tissue. The implementation developed as part of this thesis was benchmarked on various supercomputers, including Anselm in Ostrava and Piz Daint in Lugano. The implemented solution provides remarkable acceleration compared to the original MATLAB prototype: it was able to accelerate the simulation around 160 times in the best case. This means that a simulation which would otherwise last 6.5 days can now be computed in one hour. This acceleration was achieved using an NVIDIA Tesla P100 to run the simulation with a domain size of 416x416x416 grid points. The thesis includes performance benchmarks on different GPUs to give a comprehensive picture of the acceleration capabilities of the developed implementation, and discusses memory usage and numerical accuracy. Thanks to the implemented solution harnessing the power of modern GPUs, doctors and researchers all around the world have a powerful tool in their hands.
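k-Wave itself uses a k-space pseudospectral method built on 3-D FFTs, so the sketch below is only an illustration of the grid-point-level parallelism that makes such simulations a good fit for GPUs: a toy second-order finite-difference update of the wave equation, one thread per interior grid point, with placeholder names throughout.

```cuda
// Toy 3-D wave-equation update (second-order finite differences), one thread per grid point.
// Illustrative only: k-Wave uses a k-space pseudospectral method based on FFTs instead.
__global__ void wave_step(const float *p_prev, const float *p_curr, float *p_next,
                          int nx, int ny, int nz, float c2_dt2_h2) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < 1 || y < 1 || z < 1 || x >= nx - 1 || y >= ny - 1 || z >= nz - 1) return;

    int i = (z * ny + y) * nx + x;
    float lap = p_curr[i - 1] + p_curr[i + 1]
              + p_curr[i - nx] + p_curr[i + nx]
              + p_curr[i - nx * ny] + p_curr[i + nx * ny]
              - 6.0f * p_curr[i];
    // c2_dt2_h2 = (c * dt / h)^2 folds sound speed, time step and grid spacing together.
    p_next[i] = 2.0f * p_curr[i] - p_prev[i] + c2_dt2_h2 * lap;
}
```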
6

Akcelerace částicových rojů PSO pomocí GPU / Acceleration of Particle Swarm Optimization Using GPUs

Krézek, Vladimír January 2012 (has links)
This work deals with the PSO (Particle Swarm Optimization) technique, which is capable of solving complex problems. The technique can be used for solving complex combinatorial problems (the traveling salesman problem, the knapsack problem), for the design of integrated circuits and antennas, and in fields such as biomedicine, robotics, artificial intelligence and finance. Although the PSO algorithm is very efficient, the time required to find appropriate solutions to real problems often makes the task intractable. The goal of this work is to accelerate the execution of this algorithm by using graphics processors (GPUs), which offer higher computing potential while preserving a favorable price and size. The Boolean satisfiability problem (SAT) was chosen to verify and benchmark the implementation. As the SAT problem belongs to the class of NP-complete problems, any reduction of the solution time may broaden the class of tractable problems and bring us new interesting knowledge.
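The canonical PSO velocity and position update is embarrassingly parallel, which is what makes GPU acceleration attractive. The sketch below shows one possible CUDA formulation with one thread per particle-dimension pair; the random numbers are assumed to be pre-generated (e.g. with cuRAND), and all names are illustrative rather than the thesis's actual code.

```cuda
// Minimal sketch of the canonical PSO update, one thread per (particle, dimension) pair.
// pos, vel, pbest, r1, r2 have num_particles*dims entries; gbest has dims entries.
__global__ void pso_update(float *pos, float *vel,
                           const float *pbest, const float *gbest,
                           const float *r1, const float *r2,
                           int num_particles, int dims,
                           float w, float c1, float c2) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= num_particles * dims) return;

    int d = idx % dims;                                  // dimension within the particle
    vel[idx] = w * vel[idx]
             + c1 * r1[idx] * (pbest[idx] - pos[idx])    // pull towards the particle's own best
             + c2 * r2[idx] * (gbest[d]   - pos[idx]);   // pull towards the swarm's global best
    pos[idx] += vel[idx];
}
```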
7

Scalable critical-path analysis and optimization guidance for hybrid MPI-CUDA applications

Schmitt, Felix, Dietrich, Robert, Juckeland, Guido 29 October 2019 (has links)
The use of accelerators in heterogeneous systems is an established approach in designing petascale applications. Today, Compute Unified Device Architecture (CUDA) offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both the CPU and the GPU. From this increasing program complexity emerges the need for sophisticated performance tools. This work contributes by analyzing hybrid MPI-CUDA programs for properties based on wait states, such as the critical path, a metric proven to identify application bottlenecks effectively. We developed a tool to construct a dependency graph based on an execution trace and the inherent dependencies of the programming models CUDA and Message Passing Interface (MPI). Thereafter, it detects wait states and attributes blame to responsible activities. Together with the property of being on the critical path, we can identify activities that are most viable for optimization. To evaluate the global impact of optimizations to critical activities, we predict the program execution using a graph-based performance projection. The developed approach has been demonstrated with suitable examples to be both scalable and correct. Furthermore, we establish a new categorization of CUDA inefficiency patterns ensuing from the dependencies between CUDA activities.
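To illustrate the critical-path idea, the host-side sketch below computes the longest weighted path through a dependency DAG of activities in topological order; it shows only the concept, not the tool's actual trace data structures or blame attribution.

```cuda
// Host-side sketch: longest (critical) path through a dependency DAG of activities.
// Nodes are activities with durations; edges are dependencies recovered from the trace.
#include <vector>
#include <queue>

// Returns, for each node, the length of the longest dependency chain ending at it.
std::vector<double> critical_path_lengths(const std::vector<double> &duration,
                                          const std::vector<std::vector<int>> &succ) {
    int n = (int)duration.size();
    std::vector<int> indeg(n, 0);
    for (int u = 0; u < n; ++u)
        for (int v : succ[u]) ++indeg[v];

    std::vector<double> dist(n);
    std::queue<int> ready;
    for (int u = 0; u < n; ++u) {
        dist[u] = duration[u];
        if (indeg[u] == 0) ready.push(u);       // activities with no unmet dependencies
    }
    while (!ready.empty()) {                     // Kahn's algorithm in topological order
        int u = ready.front(); ready.pop();
        for (int v : succ[u]) {
            if (dist[u] + duration[v] > dist[v]) dist[v] = dist[u] + duration[v];
            if (--indeg[v] == 0) ready.push(v);
        }
    }
    return dist;   // the maximum entry is the critical-path length
}
```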
8

Efektivní komunikace v multi-GPU systémech / Efficient Communication in Multi-GPU Systems

Špeťko, Matej January 2018 (has links)
After the introduction of CUDA by Nvidia, GPUs became devices capable of accelerating any general-purpose computation. GPUs are designed as parallel processors that possess huge computational power. Modern supercomputers are often equipped with GPU accelerators. Sometimes the performance of a single GPU is not enough for a scientific application, and it needs to scale over multiple GPUs. During the computation, the GPUs need to exchange partial results. This communication represents overhead, and it is therefore important to research methods of efficient communication between GPUs, which means less CPU involvement, lower latency and shared system buffers. This thesis focuses on inter-node and intra-node GPU-to-GPU communication using GPUDirect technologies from Nvidia and CUDA-Aware MPI. Subsequently, the k-Wave toolbox for simulating the propagation of acoustic waves is introduced. This application is accelerated by using CUDA-Aware MPI. Peer-to-peer transfer support is also integrated into k-Wave using CUDA Inter-Process Communication.
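The two communication paths discussed, intra-node peer-to-peer transfers and inter-node CUDA-aware MPI, can be sketched as follows. The buffer names and halo-exchange framing are assumptions for illustration, and error handling is omitted.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Intra-node path: direct GPU-to-GPU copy over peer-to-peer (NVLink/PCIe), no host staging.
void copy_peer_to_peer(float *d_dst, int dst_gpu, const float *d_src, int src_gpu, size_t n) {
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, dst_gpu, src_gpu);
    if (can_access) {
        cudaSetDevice(dst_gpu);
        cudaDeviceEnablePeerAccess(src_gpu, 0);   // allow the current device to map the peer's memory
    }
    // cudaMemcpyPeer falls back to staging through host memory if P2P is unavailable.
    cudaMemcpyPeer(d_dst, dst_gpu, d_src, src_gpu, n * sizeof(float));
}

// Inter-node path: a CUDA-aware MPI build accepts device pointers directly,
// letting the library use GPUDirect RDMA where the hardware supports it.
void exchange_halo(float *d_send, float *d_recv, int n, int peer_rank) {
    MPI_Sendrecv(d_send, n, MPI_FLOAT, peer_rank, 0,
                 d_recv, n, MPI_FLOAT, peer_rank, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
```

In both paths the application never copies the data through host memory itself, which is exactly the reduced CPU involvement and lower latency the abstract refers to.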
