11

Correspondence-based pairwise depth estimation with parallel acceleration

Bartosch, Nadine January 2018 (has links)
This report covers the implementation and evaluation of a stereo vision correspondence-based depth estimation algorithm on a GPU. The results and feedback are used for a multi-view camera system combined with Jetson TK1 devices for parallelized image processing; the aim of this system is to estimate the depth of the scenery in front of it, and the performance of the algorithm plays the key role. Alongside the implementation, the objective of this study is to investigate the advantages of parallel acceleration: among them the differences from execution on a CPU, which are significant for all the functions; the overheads particular to a GPU application, such as memory transfer between the CPU and the GPU; and the challenges of real-time and concurrent execution. The study has been conducted with the aid of CUDA on three NVIDIA GPUs with different characteristics, and with knowledge gained through an extensive literature study of depth estimation algorithms, stereo vision and correspondence, and CUDA in general. Using the full set of components of the algorithm while expecting (near) real-time execution is unrealistic in this setup and implementation; the slowing factors include, among others, the semi-global matching. Investigating alternatives shows that disparity maps of a certain accuracy are also achieved by local methods such as the Hamming distance alone, combined with a filter that refines the results. Furthermore, it is demonstrated that the kernel launch configuration and the use of GPU memory types such as shared memory are crucial for GPU implementations and have an impact on the performance of the algorithm. Concurrency alone proves to be a more complicated task, especially in the desired form of realization. For future work and refinement of the algorithm it is therefore recommended to invest more time in further optimization possibilities regarding shared memory and in integrating the algorithm into the actual pipeline.
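As an illustration of the local method the report finds competitive, the CUDA sketch below computes a winner-takes-all disparity using the Hamming distance (via __popc), staging the right-image row in shared memory. It assumes census-transformed inputs; the kernel name, tile width, and disparity range are assumptions for this sketch, not the thesis's code.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

#define MAX_DISP 64
#define TILE_W   128

// Winner-takes-all disparity from precomputed 32-bit census codes.
// Launch: dim3 grid((width + TILE_W - 1) / TILE_W, height), block(TILE_W).
__global__ void wtaHammingDisparity(const uint32_t* censusL,
                                    const uint32_t* censusR,
                                    uint8_t* disparity,
                                    int width, int height)
{
    __shared__ uint32_t tileR[TILE_W + MAX_DISP];

    int x = blockIdx.x * TILE_W + threadIdx.x;
    int y = blockIdx.y;
    if (y >= height) return;

    // Cooperatively stage the right-image segment, including the
    // MAX_DISP halo to the left of the tile.
    for (int i = threadIdx.x; i < TILE_W + MAX_DISP; i += blockDim.x) {
        int rx = blockIdx.x * TILE_W + i - MAX_DISP;
        tileR[i] = (rx >= 0 && rx < width) ? censusR[y * width + rx] : 0u;
    }
    __syncthreads();

    if (x >= width) return;

    uint32_t left = censusL[y * width + x];
    int bestDisp = 0;
    int bestCost = 33;                 // larger than any 32-bit Hamming distance
    int maxD = min(MAX_DISP, x + 1);   // stay inside the image on the left edge
    for (int d = 0; d < maxD; ++d) {
        // Hamming distance between census codes via population count.
        int cost = __popc(left ^ tileR[threadIdx.x + MAX_DISP - d]);
        if (cost < bestCost) { bestCost = cost; bestDisp = d; }
    }
    disparity[y * width + x] = (uint8_t)bestDisp;
}
```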
12

Performance Metrics Analysis of GamingAnywhere with GPU accelerated NVIDIA CUDA

Sreenibha Reddy, Byreddy January 2018 (has links)
The modern world has opened the gates to many advancements in cloud computing, particularly in the field of cloud gaming. The most recent development in this area is the open-source cloud gaming system GamingAnywhere. The relationship between the CPU and the GPU is the main object of our attention in this thesis. Graphics Processing Unit (GPU) performance plays a vital role in analyzing the playing experience and enhancement of GamingAnywhere. This paper concentrates on virtualization of the GPU and suggests that acceleration of this unit using NVIDIA CUDA is the key to better performance while using GamingAnywhere. After extensive research, gVirtuS was chosen as the technique for employing NVIDIA CUDA. An experimental study was conducted to evaluate the feasibility and performance of GPU solutions by VMware in the cloud gaming scenarios provided by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter, and frame rate. Different game resolutions are considered in our empirical research, and our results show that frame rate and bitrate increased across resolutions when using the NVIDIA CUDA-enhanced GPU.
14

Performance Metrics Analysis of GamingAnywhere with GPU accelerated NVIDIA CUDA using gVirtuS

Zaahid, Mohammed January 2018 (has links)
The modern world has opened the gates to many advancements in cloud computing, particularly in the field of cloud gaming. The most recent development in this area is the open-source cloud gaming system GamingAnywhere. The relationship between the CPU and the GPU is the main object of our attention in this thesis. Graphics Processing Unit (GPU) performance plays a vital role in analyzing the playing experience and enhancement of GamingAnywhere. This paper concentrates on virtualization of the GPU and suggests that acceleration of this unit using NVIDIA CUDA is the key to better performance while using GamingAnywhere. After extensive research, gVirtuS was chosen as the technique for employing NVIDIA CUDA. An experimental study was conducted to evaluate the feasibility and performance of GPU solutions by VMware in the cloud gaming scenarios provided by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter, and frame rate. Different game resolutions are considered in our empirical research, and our results show that frame rate and bitrate increased across resolutions when using the NVIDIA CUDA-enhanced GPU.
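The two GamingAnywhere studies above report the same four metrics. As a rough illustration of how they can be derived from a captured packet trace, the host-side sketch below computes bitrate, packet loss, jitter (the RFC 3550 smoothed estimator), and frame rate; the PacketRecord layout and its field names are assumptions for this sketch, not taken from the theses.

```cuda
// Host-side C++ (no GPU work needed for the measurement itself).
#include <cstdio>
#include <cstdint>
#include <cmath>
#include <vector>

struct PacketRecord {
    double   arrivalSec;   // receive timestamp
    double   sendSec;      // sender timestamp from the stream header
    uint32_t seq;          // sequence number, for loss detection
    uint32_t bytes;        // payload size
    bool     frameEnd;     // marks the last packet of a video frame
};

int main() {
    std::vector<PacketRecord> trace = /* load capture here */ {};
    if (trace.size() < 2) return 0;

    double bytes = 0, jitter = 0;
    uint32_t lastSeq = 0, frames = 0;
    for (size_t i = 0; i < trace.size(); ++i) {
        bytes += trace[i].bytes;
        if (trace[i].frameEnd) ++frames;
        lastSeq = trace[i].seq;
        if (i > 0) {
            // RFC 3550 interarrival jitter: smoothed |transit difference|.
            double d = (trace[i].arrivalSec - trace[i].sendSec)
                     - (trace[i - 1].arrivalSec - trace[i - 1].sendSec);
            jitter += (std::fabs(d) - jitter) / 16.0;
        }
    }
    double span = trace.back().arrivalSec - trace.front().arrivalSec;
    uint32_t sent = lastSeq - trace.front().seq + 1;   // expected packets
    printf("bitrate    : %.2f kbit/s\n", bytes * 8.0 / span / 1000.0);
    printf("packet loss: %.2f %%\n", 100.0 * (sent - trace.size()) / sent);
    printf("jitter     : %.3f ms\n", jitter * 1000.0);
    printf("frame rate : %.2f fps\n", frames / span);
    return 0;
}
```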
15

Analysis of Hardware Usage of Shuffle-Instruction-Based Performance Optimization in the BLIINDS-II Image Quality Assessment Algorithm

January 2017 (has links)
With the advent of GPGPU, many applications are being accelerated using the CUDA programming paradigm. Speedups of around 10x-100x can be achieved simply by porting an application to the GPU and running the parallel portion of the code on its many-core SIMT (single instruction, multiple thread) architecture. But for optimal performance it is necessary to make sure that all GPU resources are used efficiently and that latencies in the application are minimized. For this, it is essential to monitor the hardware usage of the algorithm and thus diagnose the compute and memory bottlenecks in the implementation. In the following thesis, we analyze how the CUDA implementation of the BLIINDS-II algorithm maps onto the underlying GPU hardware, and arrive at a Kepler-architecture-specific solution that uses the shuffle instruction, via the CUB library, to tackle the two major bottlenecks in the algorithm. Experiments were conducted to demonstrate the advantage of using the shuffle instruction over using shared memory only as a buffer to global memory. With the new implementation of the BLIINDS-II algorithm using the CUB library, a speedup of around 13.7% was achieved. / Dissertation/Thesis / Masters Thesis Engineering 2017
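The optimization the thesis applies through CUB rests on a standard pattern: keep the reduction in registers with warp shuffles and touch shared memory only once per warp. The sketch below shows that general pattern with raw __shfl_down_sync intrinsics; it illustrates the technique, not the thesis's BLIINDS-II code.

```cuda
#include <cuda_runtime.h>

// Block-wide sum using warp shuffles: partial sums travel between
// registers, so only one shared-memory slot per warp is needed.
// The output pointer must be zero-initialized before launch.
__global__ void blockSumShuffle(const float* in, float* out, int n)
{
    float v = 0.0f;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v = in[i];

    // Intra-warp reduction: each step halves the number of live lanes,
    // exchanging partials directly between registers.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);

    // One shared slot per warp instead of one per thread.
    __shared__ float warpSums[32];
    int lane = threadIdx.x & 31, warp = threadIdx.x >> 5;
    if (lane == 0) warpSums[warp] = v;
    __syncthreads();

    // The first warp reduces the per-warp partials the same way.
    if (warp == 0) {
        int nWarps = (blockDim.x + 31) >> 5;
        v = (lane < nWarps) ? warpSums[lane] : 0.0f;
        for (int offset = 16; offset > 0; offset >>= 1)
            v += __shfl_down_sync(0xffffffffu, v, offset);
        if (lane == 0) atomicAdd(out, v);
    }
}
```

CUB's WarpReduce and BlockReduce wrap this same idea behind a tuned, architecture-aware interface, which is what the thesis uses in practice.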
16

GPF : a framework for general packet classification on GPU co-processors / GPU Packet Filter : framework for general packet classification on Graphics Processing Unit co-processors

Nottingham, Alastair January 2012 (has links)
This thesis explores the design and experimental implementation of GPF, a novel protocol-independent, multi-match packet classification framework. This framework is targeted and optimised for flexible, efficient execution on NVIDIA GPU platforms through the CUDA API, but should not be difficult to port to other platforms, such as OpenCL, in the future. GPF was conceived and developed in order to accelerate classification of large packet capture files, such as those collected by network telescopes. It uses a multiphase SIMD classification process which exploits both the parallelism of packet sets and the redundancy in filter programs, in order to classify packet captures against multiple filters at extremely high rates. The resultant framework, comprising classification, compilation and buffering components, efficiently leverages GPU resources to classify arbitrary protocols and return multiple filter results for each packet. The classification functions described were verified and evaluated by testing an experimental prototype implementation against several filter programs of varying complexity, on devices from three GPU platform generations. In addition to the significant speedup achieved in processing results, analysis indicates that the prototype classification functions perform predictably and scale linearly with respect to both packet count and filter complexity. Furthermore, classification throughput (packets/s) remained essentially constant regardless of the underlying packet data, so the effective data rate when classifying with a particular filter was heavily influenced by the average size of packets in the processed capture. For example, in the trivial case of classifying all IPv4 packets ranging in size from 70 bytes to 1 KB, the observed data rate achieved by the GPU classification kernels ranged from 60 Gbps to 900 Gbps on a GTX 275, and from 220 Gbps to 3.3 Tbps on a GTX 480. In the less trivial case of identifying all ARP, TCP, UDP and ICMP packets for both IPv4 and IPv6, the effective data rates ranged from 15 Gbps to 220 Gbps (GTX 275) and from 50 Gbps to 740 Gbps (GTX 480), for 70 B and 1 KB packets respectively.
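As a minimal sketch of the multi-match style of classification described here (not GPF's actual compiled filter programs), the kernel below assigns one thread per packet and records one bit per matched filter; the fixed-size packet slots and the untagged Ethernet II header offsets are assumptions.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// One thread per packet: read fixed header offsets and set a bit per
// matched filter, returning a multi-match result word per packet.
// Filter set (ARP / IPv4 / TCP / UDP / ICMP) is illustrative only.
__global__ void classifyPackets(const uint8_t* packets, // fixed-size slots
                                int slotSize, int numPackets,
                                uint32_t* results)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= numPackets) return;

    const uint8_t* pkt = packets + (size_t)p * slotSize;
    uint16_t ethType = (uint16_t)((pkt[12] << 8) | pkt[13]);
    uint32_t match = 0;

    if (ethType == 0x0806) match |= 1u << 0;        // ARP
    if (ethType == 0x0800) {                        // IPv4
        match |= 1u << 1;
        uint8_t proto = pkt[14 + 9];                // IPv4 protocol field
        if (proto == 6)  match |= 1u << 2;          // TCP
        if (proto == 17) match |= 1u << 3;          // UDP
        if (proto == 1)  match |= 1u << 4;          // ICMP
    }
    results[p] = match;
}
```

GPF itself goes further by compiling filter programs so that redundant header reads are shared across filters, rather than hard-coding the checks as above.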
17

Paralelizace výpočtů pro zpracování obrazu / Parallelized image processing library

Fuksa, Tomáš January 2011 (has links)
This work deals with parallel computing on modern processors: multi-core CPUs and GPUs. The goal is to become familiar with computing on these devices, which are suitable for parallelization, to define their advantages and disadvantages, to test their properties with examples, and to select appropriate tools to implement a library for parallel image processing. This library is going to be used for vanishing point estimation in a path-finding mobile robot.
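As an illustration of the kind of primitive such a library would offer for the stated use case, the kernel below computes a 3x3 Sobel gradient magnitude, a common preprocessing step before line detection for vanishing point estimation; it is a generic sketch, not the library's actual code.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// 3x3 Sobel gradient magnitude over a grayscale image, one thread per
// pixel. Border pixels are left untouched for simplicity.
__global__ void sobelMagnitude(const uint8_t* src, uint8_t* dst,
                               int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return;

    // Fetch the 3x3 neighbourhood.
    int idx = y * width + x;
    int tl = src[idx - width - 1], t = src[idx - width], tr = src[idx - width + 1];
    int l  = src[idx - 1],                               r  = src[idx + 1];
    int bl = src[idx + width - 1], b = src[idx + width], br = src[idx + width + 1];

    int gx = (tr + 2 * r + br) - (tl + 2 * l + bl);
    int gy = (bl + 2 * b + br) - (tl + 2 * t + tr);
    int mag = abs(gx) + abs(gy);        // L1 approximation of the gradient
    dst[idx] = (uint8_t)min(mag, 255);
}
```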
18

Překladač jazyka C# do jazyka Nvidia CUDA / Programming CUDA with C#

Zajíc, Jiří January 2012 (has links)
This master's thesis is focused on GPU-accelerated computation on NVIDIA graphics cards using CUDA technology, brought to the .NET platform. The problem is solved as a compiler from the C# programming language to the NVIDIA CUDA language, driven by expression attributes of the C# language, that preserves the same semantics of operations. The application is implemented in the C# programming language and uses NRefactory, an open-source library.
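To make the translation concrete, the sketch below pairs a hypothetical attributed C# method (shown as a comment, with an invented [Kernel] attribute and ThreadId type) with the CUDA kernel a compiler of this kind could emit; none of it is taken from the thesis's implementation.

```cuda
// Hypothetical C# input to the compiler:
//
//   [Kernel]                                  // invented attribute
//   static void Saxpy(float a, float[] x, float[] y) {
//       int i = ThreadId.X;
//       if (i < x.Length) y[i] = a * x[i] + y[i];
//   }
#include <cuda_runtime.h>

// Emitted CUDA preserving the per-element semantics of the C# body.
// ThreadId.X maps to the flattened CUDA thread index; the array length
// becomes an explicit parameter because device pointers carry no size.
extern "C" __global__ void Saxpy(float a, const float* x, float* y, int len)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len) y[i] = a * x[i] + y[i];
}
```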
19

Parallelizing Digital Signal Processing for GPU

Ekstam Ljusegren, Hannes, Jonsson, Hannes January 2020 (has links)
Because of the increasing importance of signal processing in today's society, there is a need to easily experiment with new ways to process signals. Usually, fast digital signal processing is done with special-purpose hardware that is difficult to develop for. GPUs pose an alternative for fast digital signal processing. The work in this thesis is an analysis and implementation of a GPU version of a digital signal processing chain provided by SAAB. Through an iterative process of development and testing, a final implementation was achieved. Two benchmarks, each comprising 4.2 M test samples, were made to compare the CPU implementation with the GPU implementation. The benchmarks were run on three different platforms: a desktop computer, an NVIDIA Jetson AGX Xavier, and an NVIDIA Jetson TX2. The results show that the parallelized version can reach several orders of magnitude higher throughput than the CPU implementation.
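The abstract does not disclose SAAB's actual chain, but a typical DSP stage maps to the GPU as in the sketch below: an FIR filter with one thread per output sample and the taps in constant memory. The tap count and data layout are assumptions for the sketch.

```cuda
#include <cuda_runtime.h>

#define TAPS 63

// Filter coefficients, uploaded once from the host with
// cudaMemcpyToSymbol(c_coeffs, hostTaps, sizeof(float) * TAPS).
__constant__ float c_coeffs[TAPS];

// Causal FIR convolution: each thread produces one output sample from
// the TAPS most recent input samples.
__global__ void firFilter(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc = 0.0f;
    for (int t = 0; t < TAPS; ++t) {
        int j = i - t;
        if (j >= 0) acc += c_coeffs[t] * in[j];   // skip before-start samples
    }
    out[i] = acc;
}
```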
20

Effective and Accelerated Informative Frame Filtering in Colonoscopy Videos Using Graphic Processing Units

Karri, Venkata Praveen 08 1900 (has links)
Colonoscopy is an endoscopic technique that allows a physician to inspect the mucosa of the human colon. Previous methods and software solutions for detecting informative frames in a colonoscopy video (a process called informative frame filtering, or IFF) have been largely ineffective in (1) covering the proper definition of an informative frame in the broadest sense and (2) striking an optimal balance between accuracy and speed of classification in both real-time and non-real-time medical procedures. In my thesis, I propose a more effective method and faster software solutions for IFF. The method is more effective due to the introduction of a heuristic algorithm for classification, derived from experimental analysis of typical colon features, which contributed a 5-10% boost in various performance metrics for IFF. The software modules are faster due to the incorporation of sophisticated parallel-processing-oriented coding techniques on modern microprocessors. Two IFF modules were created, one for post-procedure use and the other for real-time use. Code optimizations through NVIDIA CUDA for GPU processing and/or CPU multi-threading concepts, embedded in two significant microprocessor design philosophies (multi-core design and many-core design), resulted in a 5-fold acceleration for the post-procedure module and a 40-fold acceleration for the real-time module. Some innovative software modules, which are still in the testing phase, have recently been created to exploit the power of multiple GPUs together.
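As a hedged sketch of the kind of per-frame feature a GPU-accelerated IFF module can compute, the kernel below counts high-gradient pixels with a block-level shared counter and one atomic per block; the gradient heuristic and threshold stand in for the thesis's colon-feature heuristic, which is not spelled out in the abstract.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Per-frame edge-pixel count; `count` must be zeroed before launch.
// Host side: a frame is deemed informative when count / (width * height)
// exceeds a tuned ratio (an assumption for this sketch).
__global__ void edgePixelCount(const uint8_t* gray, int width, int height,
                               int gradThreshold, unsigned int* count)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    int isEdge = 0;
    if (x > 0 && y > 0 && x < width && y < height) {
        int idx = y * width + x;
        // Cheap gradient: horizontal plus vertical差differences.
        int g = abs(gray[idx] - gray[idx - 1])
              + abs(gray[idx] - gray[idx - width]);
        isEdge = (g > gradThreshold);
    }

    // Reduce within the block before touching global memory.
    __shared__ unsigned int blockCount;
    if (threadIdx.x == 0 && threadIdx.y == 0) blockCount = 0;
    __syncthreads();
    if (isEdge) atomicAdd(&blockCount, 1u);
    __syncthreads();
    if (threadIdx.x == 0 && threadIdx.y == 0 && blockCount > 0)
        atomicAdd(count, blockCount);
}
```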
