Global ETD Search

181	Akcelerace ultrazvukové neurostimulace pomocí vysokoúrovňových GPGPU knihoven / Acceleration of Ultrasound Neurostimulation Using High-Level GPGPU Libraries Mička, Richard January 2021 This thesis explores the potential use of GPGPU libraries to accelerate the acoustic wave propagation simulation in the k-Wave toolkit. First, it surveys and assesses the available high-level GPGPU libraries. It then reviews the current state of simulation acceleration in k-Wave. Based on that, it proposes an approach for extending the existing CPU code into a heterogeneous application that can run on a graphics card. The outcome is an application that can use a graphics card and, if none is available, falls back to thread- and SIMD-based acceleration on the CPU. The resulting application is evaluated in terms of performance, maintenance difficulty, and usability.
182	Paralelizace výpočtů pro zpracování obrazu / Paralelized image processing library Fuksa, Tomáš January 2011 This work deals with parallel computing on modern processors, namely multi-core CPUs and GPUs. The goal is to study computation on these devices, which are well suited to parallelization, define their advantages and disadvantages, test their properties on examples, and select appropriate tools for implementing a library for parallel image processing. The library is to be used for vanishing point estimation in a path-finding mobile robot.
183	Akcelerace neuronových sítí s využitím GPU / The GPU Based Acceleration of Neural Networks Šimíček, Ondřej January 2015 The thesis deals with accelerating backpropagation neural networks using graphics chips. OpenCL was used because it supports graphics chips from different manufacturers. The main goal was to accelerate the time-consuming learning and classification processes. The acceleration was achieved by training a large number of neural networks simultaneously. The speed gain was used to find the best settings and topology of a neural network for a given task using a genetic algorithm.
184	Akcelerace částicových rojů PSO pomocí GPU / Acceleration of Particle Swarm Optimization Using GPUs Krézek, Vladimír January 2012 This work deals with Particle Swarm Optimization (PSO), a technique capable of solving complex problems. It can be used for complex combinatorial problems (the traveling salesman problem, knapsack problems), for the design of integrated circuits and antennas, and in fields such as biomedicine, robotics, artificial intelligence, and finance. Although the PSO algorithm is very efficient, the time required to find suitable solutions to real problems often makes the task intractable. The goal of this work is to reduce the execution time of the algorithm by using graphics processors (GPUs), which offer higher computing potential at a favorable price and size. The Boolean satisfiability problem (SAT) was chosen to verify and benchmark the implementation. Because SAT is NP-complete, any reduction in solution time may broaden the class of tractable problems and yield new insights.
185	Akcelerace částicových rojů PSO pomocí GPU / Particle Swarm Optimization on GPUs Záň, Drahoslav January 2013 This thesis deals with Particle Swarm Optimization (PSO), a population-based stochastic optimization technique, and its acceleration. This simple but very effective technique is designed for solving difficult multidimensional problems in a wide range of applications. The aim of this work is to develop a parallel implementation of the algorithm, with an emphasis on finding a solution faster. A graphics card (GPU), which provides massive computing performance, was chosen for this purpose. To evaluate the benefits of the proposed implementation, both a CPU and a GPU implementation were created for a problem derived from the well-known NP-hard knapsack problem. The GPU application achieves an average speedup of 5 times and a maximum speedup of almost 10 times over the optimized CPU application on which it is based.
186	Detekce obličejů ve videu na GPU / Face Detection in Video on GPU Tesař, Martin January 2012 This work deals with face detection on a graphics card. The first part introduces face detection methods, focusing on the detector proposed by Viola and Jones. The work then studies how the detector's key parts can be mapped onto a graphics card. The next part describes the implementation details of the designed application. The work closes with results and a comparison with a CPU approach. The last chapter summarizes the whole work and proposes directions for future development.
187	Fyzikální simulace na GPU / Physics Simulation on GPU Janošík, Ondřej January 2016 This thesis addresses rigid body simulation and the possibilities of its parallelization on a GPU. It describes the basics needed to implement a basic physics engine for blocks, and the technologies that can be used for acceleration. I describe the approach that allowed me to gradually accelerate the physics simulation using OpenCL. Each significant change is described in its own section, with measurement results and a short summary.
188	Konstrukce kD stromu na GPU / Building kD Tree on GPU Bajza, Jakub January 2016 This term project addresses the construction of kD tree acceleration structures and the parallelization of this construction on a GPU. It begins by introducing the CUDA platform for parallel programming, describing both its general principles and the specific features used in this thesis. The reader is then introduced to acceleration structures for ray tracing. These structures are described, and the kD tree and its variants are presented in detail. Next, the chosen kD tree variant is analyzed and the problems and issues of its parallel implementation are addressed. The implementation description includes a short description of the CPU variant and detailed specifications of the CUDA kernels. The testing section presents the results as a CPU vs. GPU comparison and evaluates how well the metric set in the design was met. The work concludes with a summary of the achieved goals and results, followed by possible future improvements to the implementation.
189	Synthesizing Software from a ForSyDe Model Targeting GPGPUs Hjort Blindell, Gabriel January 2012 Today, a plethora of parallel execution platforms are available. One platform in particular is the GPGPU, a massively parallel architecture designed for exploiting data parallelism. However, GPGPUs are notoriously difficult to program because of the way data is accessed and processed, and many interconnected factors affect performance. This makes it exceptionally challenging to write correct and high-performing applications for GPGPUs. This thesis project aims to address this problem by investigating how models in ForSyDe, a software engineering methodology in which applications are modeled at a very high level of abstraction, can be synthesized into CUDA C code for execution on NVIDIA CUDA-enabled graphics cards. The report proposes a software synthesis process which discovers one type of potential data parallelism in a model and generates either pure C or CUDA C code. A prototype of the software synthesis component has also been implemented and tested on models derived from two applications: a Mandelbrot generator and an industrial-scale image processor. The synthesized CUDA code produced in the tests was shown to be both correct and efficient, provided there was enough computational complexity in the processes to amortize the overhead cost of using the GPGPU. ForSyDe abstract program models software synthesis gpgpu cuda C Engineering and Technology
190	Compiler-Based Tools to Aid in Data Transfer Optimization and On-Chip Debug of Heterogeneous Compute Systems Ashcraft, Matthew B. 07 July 2020 First, we present techniques to efficiently schedule data transfers through compiler analyses. Compared to transferring data immediately before and after the kernel executes, our scheduling results in orders of magnitude improvements in execution time, number of data transfers, and number of bytes transferred. Second, we demonstrate techniques to provide on-chip debugging for heterogeneous systems by recording execution in the software in addition to using debugging circuitry in the hardware, and we provide a temporal correlation between the hardware and software traces through synchronization. This allows us to follow debug data between the hardware and software trace buffers. Because synchronizing the trace buffers adds cost, we explore synchronization schemes that can reduce the impact of synchronization depending on the code structure. We demonstrate the quantitative impact of these techniques on execution time and on hardware and software resources; in most cases the increase in execution time is under 2x. Third, we demonstrate how source-code debugging techniques for on-chip debugging can be applied to OpenCL FPGA kernels in heterogeneous systems. We developed techniques and a tool flow that allow users to select variables to record, automatically insert recording instructions into the kernel source code, synthesize the changes directly into the hardware design using commercial HLS tools, retrieve the trace data through kernel arguments, and present it to the user. Overall, quantitative measurements showed that our techniques resulted in modest increases in execution time and hardware resources. compilers accelerators GPGPU data transfers HLS high-level Synthesis FPGA Engineering
