About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
11

Optimalizace platformy pro distribuované výpočty Hadoop / Optimization of the Hadoop Platform for Distributed Computation

Čecho, Jaroslav January 2012 (has links)
This thesis focuses on possibilities for improving the Apache Hadoop framework by offloading some computation to a graphics card using NVIDIA CUDA technology. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model called MapReduce. NVIDIA CUDA is a platform that allows a graphics card to be used for general-purpose computation. This thesis contains descriptions and experimental implementations of computations inside the Hadoop framework that can benefit from being executed on a graphics card.
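To make the MapReduce-to-CUDA idea concrete, here is a minimal, hypothetical sketch (not the thesis's implementation): a batch of records from a map task is copied to the GPU, transformed with one thread per record, and copied back. All names such as `mapSquare` are illustrative.

```cuda
// Illustrative sketch only: a batch of records from a map task is copied to the
// GPU, transformed by one thread per record, and copied back. Kernel and buffer
// names are hypothetical, not taken from the thesis.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void mapSquare(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per record
    if (i < n) out[i] = in[i] * in[i];              // the per-record "map" function
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *hIn = (float *)malloc(bytes), *hOut = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) hIn[i] = (float)i;

    float *dIn, *dOut;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);

    int block = 256;
    int grid = (n + block - 1) / block;
    mapSquare<<<grid, block>>>(dIn, dOut, n);
    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);

    printf("out[10] = %f\n", hOut[10]);
    cudaFree(dIn); cudaFree(dOut); free(hIn); free(hOut);
    return 0;
}
```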
12

Streaming Ray Tracer na GPU / Streaming Ray Tracer on GPU

Dvořák, Jakub January 2008 (has links)
Current consumer GPUs can be used as high-performance stream processors and are a tempting platform for implementing ray tracing. In this thesis I briefly present ray tracing principles and the methods used to accelerate it, the modern GPU's programmable pipeline, and examples of its use. I describe stream processing in general and the available interfaces that enable the GPU to be used as a stream processor. I then present my GPU ray tracer implementation, the algorithms it uses, and the experiments I have made.
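As a rough illustration of the per-pixel parallelism a GPU ray tracer exploits, the following CUDA kernel shades one primary ray per thread against a single sphere. The thesis targeted the programmable graphics pipeline of 2008-era GPUs, so this is only a modern restatement of the idea, not the author's code; all names are illustrative.

```cuda
// Hedged illustration: the core of a ray tracer is evaluating one primary ray per
// pixel. This kernel tests each pixel's ray against a single sphere.
#include <cuda_runtime.h>

struct Ray { float3 o, d; };          // origin and (normalized) direction

__device__ bool hitSphere(const Ray &r, float3 c, float rad, float &t) {
    float3 oc = make_float3(r.o.x - c.x, r.o.y - c.y, r.o.z - c.z);
    float b = oc.x * r.d.x + oc.y * r.d.y + oc.z * r.d.z;      // dot(oc, d)
    float cc = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z - rad * rad;
    float disc = b * b - cc;                                   // quadratic discriminant
    if (disc < 0.0f) return false;
    t = -b - sqrtf(disc);                                      // nearest intersection
    return t > 0.0f;
}

__global__ void tracePrimary(uchar4 *image, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    // Simple pinhole camera: rays leave the origin through an image plane at z = -1.
    float u = 2.0f * x / w - 1.0f, v = 2.0f * y / h - 1.0f;
    float len = sqrtf(u * u + v * v + 1.0f);
    Ray r = { make_float3(0, 0, 0), make_float3(u / len, v / len, -1.0f / len) };

    float t;
    bool hit = hitSphere(r, make_float3(0, 0, -3), 1.0f, t);
    image[y * w + x] = hit ? make_uchar4(255, 255, 255, 255)   // white where the sphere is hit
                           : make_uchar4(30, 30, 30, 255);     // dark background otherwise
}
```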
13

Apply Modern Image Recognition Techniques with CUDA Implementation on Autonomous Systems

Liu, Yicong January 2017 (has links)
Computer vision has developed rapidly over the last few decades and is now used in a variety of fields such as robotics, autonomous vehicles, and traffic surveillance cameras. However, processing high-resolution raw images from these cameras places a heavy burden on the processor. Because of the CPU's architecture, the pixels of an input image are largely processed sequentially, so even though the computational capability of modern CPUs keeps increasing, they still perform poorly when a single operation must be repeated millions of times. The objective of this thesis is to provide an alternative solution that speeds up image processing by implementing popular image recognition algorithms (SURF and FREAK) on GPUs with the help of NVIDIA's CUDA platform. Experiments were made to compare the performance of a traditional CPU-only program with the CUDA program, and the results show that the algorithms running on the CUDA platform achieve a significant speedup. / Thesis / Master of Applied Science (MASc)
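The following sketch shows the kind of per-pixel stage (here a simple RGB-to-grayscale conversion that typically precedes feature extraction) where one CUDA thread per pixel replaces the CPU's sequential loop; it is illustrative only and not the thesis's SURF/FREAK implementation.

```cuda
// Sketch of a per-pixel preprocessing stage: one thread per pixel replaces the
// CPU's sequential loop. Illustrative only, not the thesis's SURF/FREAK code.
#include <cuda_runtime.h>
#include <stdint.h>

__global__ void rgbToGray(const uchar3 *rgb, uint8_t *gray, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    uchar3 p = rgb[y * w + x];
    // Standard luma weights; every pixel is handled by its own thread.
    gray[y * w + x] = (uint8_t)(0.299f * p.x + 0.587f * p.y + 0.114f * p.z);
}

// Typical launch: dim3 block(16, 16); dim3 grid((w + 15) / 16, (h + 15) / 16);
// rgbToGray<<<grid, block>>>(dRgb, dGray, w, h);
```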
14

Analyzing General-Purpose Computing Performance on GPU

Meng, Fanfu 01 December 2015 (has links) (PDF)
The Graphics Processing Unit (GPU) has become one of the most important components in modern computer systems. GPUs have evolved from single-purpose graphics rendering hardware into powerful processors capable of handling many different kinds of computing tasks. However, GPUs do not perform well on every application, and it takes considerable design effort to get good performance out of one. This thesis investigates the relative performance of a GPU versus a CPU. Design effort is kept to a minimum for both the CPU and GPU implementations. Matrix multiplication, the Advanced Encryption Standard (AES), and a 32-bit Cyclic Redundancy Check (CRC32) are implemented on both a CPU and a GPU. The input data size is varied to test the performance of each. The GPU generally outperforms the CPU for matrix multiplication and AES because of these applications' good instruction and data parallelism. CRC has very poor parallelism, so the CPU performs better. For very small inputs, the CPU generally outperforms the GPU because of the GPU's memory transfer overhead.
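A minimal, unoptimized matrix-multiplication kernel of the kind such comparisons rely on is sketched below; each thread computes one element of C = A x B independently, which is exactly the data parallelism the abstract credits for the GPU's advantage. This is a generic sketch, not the thesis's code.

```cuda
// Minimal (unoptimized) matrix-multiply kernel: each thread computes one element
// of C = A * B independently. Tiling with shared memory would be the usual next step.
#include <cuda_runtime.h>

__global__ void matMul(const float *A, const float *B, float *C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= N || col >= N) return;

    float acc = 0.0f;
    for (int k = 0; k < N; ++k)          // dot product of one row of A with one column of B
        acc += A[row * N + k] * B[k * N + col];
    C[row * N + col] = acc;
}

// Launch example for N x N matrices already resident on the device:
// dim3 block(16, 16), grid((N + 15) / 16, (N + 15) / 16);
// matMul<<<grid, block>>>(dA, dB, dC, N);
```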
15

Towards Algorithm Transformation for Temporal Data Mining on GPU

Ponce, Sean Philip 18 August 2009 (has links)
Data mining allows one to analyze large amounts of data. With increasing amounts of data being collected, more computing power is needed to mine these ever-larger data sets. The GPU is an excellent piece of hardware with a compelling price-to-performance ratio and has rapidly risen in popularity. However, this increase in speed comes at a cost: the GPU's architecture executes non-data-parallel code with either marginal speedup or even slowdown. The type of data mining we examine, temporal data mining, uses a finite state machine (FSM), which is not data parallel. We contribute the concept of algorithm transformation for increasing the data parallelism of an algorithm, and we apply it to temporal data mining, producing an algorithm that solves the same problem as the FSM-based approach but is data parallel. The new GPU implementation shows a 6x speedup over the best CPU implementation and an 11x speedup over a previous GPU implementation. / Master of Science
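As a loose illustration of what "increasing data parallelism" can mean here (not the thesis's actual transformation), the sketch below counts occurrences of a simple two-symbol episode by giving every candidate start position its own thread instead of running one state machine over the whole sequence. The episode definition and all names are hypothetical.

```cuda
// Illustrative only: one way to make an episode count data parallel is to give
// every candidate start position its own thread instead of running one finite
// state machine over the sequence. The episode ("symA followed by symB within
// `window` events") and all names here are hypothetical, not the thesis's algorithm.
#include <cuda_runtime.h>

__global__ void countEpisodes(const int *events, int n, int symA, int symB,
                              int window, unsigned int *count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // candidate start position
    if (i >= n || events[i] != symA) return;

    // Each thread independently scans only its own small window for the second symbol.
    int limit = min(n, i + 1 + window);
    for (int j = i + 1; j < limit; ++j) {
        if (events[j] == symB) {
            atomicAdd(count, 1u);                    // record one occurrence
            return;
        }
    }
}
```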
16

Efektivní komunikace v multi-GPU systémech / Efficient Communication in Multi-GPU Systems

Špeťko, Matej January 2018 (has links)
After Nvidia introduced CUDA, GPUs became devices capable of accelerating general-purpose computation. GPUs are designed as parallel processors that possess huge computational power, and modern supercomputers are often equipped with GPU accelerators. Sometimes the performance of a single GPU is not enough for a scientific application and it needs to scale over multiple GPUs. During the computation the GPUs must exchange partial results; this communication represents overhead, so it is important to research methods of efficient communication between GPUs, meaning less CPU involvement, lower latency, and shared system buffers. This thesis focuses on inter-node and intra-node GPU-to-GPU communication using Nvidia's GPUDirect technologies and CUDA-Aware MPI. Subsequently, the k-Wave toolbox for simulating the propagation of acoustic waves is introduced. This application is accelerated using CUDA-Aware MPI, and peer-to-peer transfer support is also integrated into k-Wave using CUDA Inter-Process Communication.
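A minimal sketch of the intra-node case follows: it enables peer access between two GPUs and issues a direct device-to-device copy with the CUDA runtime API. With CUDA-Aware MPI, the same device pointers could be passed straight to MPI_Send/MPI_Recv for the inter-node case. Error handling is omitted and the buffer size is arbitrary.

```cuda
// Sketch of intra-node peer-to-peer transfer between two GPUs with the CUDA
// runtime API. Assumes at least two CUDA devices are present; error handling omitted.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;
    float *buf0, *buf1;

    cudaSetDevice(0);
    cudaMalloc(&buf0, bytes);
    cudaSetDevice(1);
    cudaMalloc(&buf1, bytes);

    // Enable direct GPU-to-GPU access if the hardware topology allows it.
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 1, 0);
    if (canAccess) {
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);   // device 1 may now read device 0's memory
    }

    // Copy from device 0 to device 1 without staging through host memory.
    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
    cudaDeviceSynchronize();

    printf("peer access %s, copy issued\n", canAccess ? "enabled" : "unavailable");
    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}
```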
17

Algorithms for MARS spectral CT.

Knight, David Warwick January 2015 (has links)
This thesis reports on algorithmic design and software development completed for the Medipix All Resolution System (MARS) multi-energy CT scanner. Two areas of research are presented: the speed and usability improvements made to the post-reconstruction material decomposition software, and the development of two algorithms designed to implement a novel voxel system in the MARS image reconstruction chain. The MARS MD software package is the primary material analysis tool used by members of the MARS group. The photon-processing ability of the MARS scanner is what makes material decomposition possible. MARS MD loads the reconstructed images created after a scan and produces a new set of images, one for every individual material within the object. The software is capable of discriminating at least six different materials, plus air, within the object. A significant speed improvement to this program was attained by moving the code base from GNU Octave to MATLAB and applying well-known optimisation routines, while the creation of a graphical user interface made the software more accessible and easier to use. The changes made to MARS MD represented a significant contribution to the productivity of the entire MARS group. A drawback of the MARS image reconstruction chain is the time required to generate images of a scanned object. Compared to commercially available CT systems, the MARS system takes several orders of magnitude longer to do essentially the same job. With up to eight energy bins' worth of data to consider during reconstruction, compared to a single energy bin in most commercial scanners, it is not surprising that there is a shortfall. A major performance limitation of the reconstruction process lies in the calculation of the small distances travelled by every detected photon within individual portions of the reconstruction volume. This thesis investigates a novel volume geometry that was developed by Prof. Phil Butler and Dr. Peter Renaud, and is designed to partially mitigate this time constraint. By treating the volume as a cylinder instead of a traditional cubic structure, the number of individual path-length calculations can be drastically reduced. Two sets of algorithms are prototyped, coded in MATLAB, C++ and CUDA, and finally compared in terms of speed and visual accuracy.
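To illustrate the cylindrical-voxel idea only (the MARS algorithms compute exact intersection lengths, which this sketch does not), the kernel below bins sample points along each ray into a (radius, angle, slice) grid and accumulates an approximate path length per voxel. All names and the sampling scheme are assumptions for illustration.

```cuda
// Hedged sketch: approximate per-voxel path lengths in a cylindrical
// (radius, angle, slice) grid by sampling along each ray. This is not the exact
// intersection-length algorithm developed for MARS; names are illustrative.
#include <cuda_runtime.h>
#include <math_constants.h>

__global__ void accumulatePathLengths(const float3 *src, const float3 *dir,
                                      int nRays, float step, int nSteps,
                                      int nR, int nTheta, int nZ,
                                      float rMax, float zMin, float zMax,
                                      float *pathLen) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per ray
    if (r >= nRays) return;

    for (int s = 0; s < nSteps; ++s) {
        float t = (s + 0.5f) * step;                 // midpoint of this sample segment
        float x = src[r].x + t * dir[r].x;
        float y = src[r].y + t * dir[r].y;
        float z = src[r].z + t * dir[r].z;

        // Convert the sample point to cylindrical voxel indices.
        float rad = sqrtf(x * x + y * y);
        float ang = atan2f(y, x) + CUDART_PI_F;      // shift into [0, 2*pi)
        int ir = (int)(rad / rMax * nR);
        int it = (int)(ang / (2.0f * CUDART_PI_F) * nTheta);
        int iz = (int)((z - zMin) / (zMax - zMin) * nZ);
        if (ir < 0 || ir >= nR || it < 0 || it >= nTheta || iz < 0 || iz >= nZ) continue;

        // Each sample contributes one step length to its voxel.
        atomicAdd(&pathLen[(iz * nTheta + it) * nR + ir], step);
    }
}
```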
18

CUDA-Accelerated ORB-SLAM for UAVs

Bourque, Donald 01 June 2017 (has links)
"The use of cameras and computer vision algorithms to provide state estimation for robotic systems has become increasingly popular, particularly for small mobile robots and unmanned aerial vehicles (UAVs). These algorithms extract information from the camera images and perform simultaneous localization and mapping (SLAM) to provide state estimation for path planning, obstacle avoidance, or 3D reconstruction of the environment. High resolution cameras have become inexpensive and are a lightweight and smaller alternative to laser scanners. UAVs often have monocular camera or stereo camera setups since payload and size impose the greatest restrictions on their flight time and maneuverability. This thesis explores ORB-SLAM, a popular Visual SLAM method that is appropriate for UAVs. Visual SLAM is computationally expensive and normally offloaded to computers in research environments. However, large UAVs with greater payload capacity may carry the necessary hardware for performing the algorithms. The inclusion of general-purpose GPUs on many of the newer single board computers allows for the potential of GPU-accelerated computation within a small board profile. For this reason, an NVidia Jetson board containing an NVidia Pascal GPU was used. CUDA, NVidia’s parallel computing platform, was used to accelerate monocular ORB-SLAM, achieving onboard Visual SLAM on a small UAV. Committee members:"
19

Accelerating SRD Simulation on GPU

Chen, Zhilu 17 April 2013 (has links)
Stochastic Rotation Dynamics (SRD) is a particle-based simulation method that can be used to model complex fluids in either two or three dimensions, which is very useful in biology and physics research. Although SRD is computationally efficient compared to other simulation methods, it still takes a long time to run when the model is large, e.g. when using a large array of particles to simulate dense polymers. In some cases the simulation could take months before producing results. This research therefore focuses on accelerating the SRD simulation using a GPU. GPU acceleration can reduce the simulation time by orders of magnitude, and it is cost-effective because a GPU costs significantly less than a computer cluster. Compute Unified Device Architecture (CUDA) programming makes it possible to parallelize the program to run on hundreds or thousands of thread processors on the GPU. The program is divided into many concurrent threads, and several kernel functions are used for data synchronization. The speedup from GPU acceleration varies with the parameters of the simulation, such as the size of the model, the density of the particles, the formation of polymers, and above all the complexity of the algorithm itself. Compared to the CPU version, the GPU achieves about a 10x speedup for the particle simulation and up to a 50x speedup for polymers. Further performance improvement can be achieved by using multiple GPUs and code optimization.
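A hedged sketch of the SRD collision step in 2D is shown below: each particle's velocity is rotated by plus or minus alpha about its cell's mean velocity, with one thread per particle. Cell assignments, cell mean velocities, and per-cell rotation signs are assumed to come from earlier kernels; the names are illustrative, not taken from the thesis.

```cuda
// Hedged sketch of the 2D SRD collision step: each particle's velocity is rotated
// by +/- alpha about its cell's mean velocity. Cell means and per-cell rotation
// signs are assumed to have been computed in earlier kernels; names are illustrative.
#include <cuda_runtime.h>

__global__ void srdCollision(float2 *vel, const int *cellOfParticle,
                             const float2 *cellMeanVel, const int *cellSign,
                             int nParticles, float alpha) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per particle
    if (i >= nParticles) return;

    int c = cellOfParticle[i];
    float2 mean = cellMeanVel[c];
    float2 dv = make_float2(vel[i].x - mean.x, vel[i].y - mean.y);  // velocity relative to the cell

    // Rotate the relative velocity by +alpha or -alpha, chosen per cell.
    float s = cellSign[c] >= 0 ? sinf(alpha) : -sinf(alpha);
    float ca = cosf(alpha);
    float2 rotated = make_float2(ca * dv.x - s * dv.y, s * dv.x + ca * dv.y);

    vel[i] = make_float2(mean.x + rotated.x, mean.y + rotated.y);
}
```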
20

GPU Based Real-Time Trinocular Stereovision

Yao, Yuanbin 24 August 2012 (has links)
"Stereovision has been applied in many fields including UGV (Unmanned Ground Vehicle) navigation and surgical robotics. Traditionally most stereovision applications are binocular which uses information from a horizontal 2-camera array to perform stereo matching and compute the depth image. Trinocular stereovision with a 3-camera array has been proved to provide higher accuracy in stereo matching which could benefit application like distance finding, object recognition and detection. However, as a result of an extra camera, additional information to be processed would increase computational burden and hence not practical in many time critical applications like robotic navigation and surgical robot. Due to the nature of GPUÂ’s highly parallelized SIMD (Single Instruction Multiple Data) architecture, GPGPU (General Purpose GPU) computing can effectively be used to parallelize the large data processing and greatly accelerate the computation of algorithms used in trinocular stereovision. So the combination of trinocular stereovision and GPGPU would be an innovative and effective method for the development of stereovision application. This work focuses on designing and implementing a real-time trinocular stereovision algorithm with GPU (Graphics Processing Unit). The goal involves the use of Open Source Computer Vision Library (OpenCV) in C++ and NVidia CUDA GPGPU Solution. Algorithms were developed with many different basic image processing methods and a winner-take-all method is applied to perform fusion of disparities in different directions. The results are compared in accuracy and speed to verify the improvement."
