51

Analyzing General-Purpose Computing Performance on GPU

Meng, Fanfu, 01 December 2015
Graphics Processing Units (GPUs) have become one of the most important components in modern computer systems. They have evolved from single-purpose graphics rendering hardware into powerful processors capable of handling many different kinds of computing tasks. However, GPUs do not perform well on every application, and it takes considerable design effort to get good performance out of them. This thesis investigates the relative performance of a GPU versus a CPU when design effort is kept to a minimum for both implementations. Matrix multiplication, the Advanced Encryption Standard (AES), and the 32-bit Cyclic Redundancy Check (CRC32) are implemented on both a CPU and a GPU, and the input data size is varied to test their performance. The GPU generally outperforms the CPU on matrix multiplication and AES because of those applications' good instruction and data parallelism; CRC32 has very poor parallelism, so the CPU performs better. For very small data inputs, the CPU generally outperforms the GPU because of GPU memory transfer overhead.
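As a hedged illustration of the kind of GPU-versus-CPU comparison this abstract describes (not code from the thesis), the sketch below times a naive CUDA matrix-multiplication kernel together with its host-to-device and device-to-host copies; for small matrices those copies dominate, which is the transfer-overhead effect noted above. The matrix size and all identifiers are illustrative.

```cuda
// Minimal sketch (not from the thesis): naive GPU matrix multiply timed
// including host<->device copies, so the transfer overhead that hurts
// small inputs is visible in the measurement.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void matmul(const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float acc = 0.0f;
        for (int k = 0; k < n; ++k)
            acc += A[row * n + k] * B[k * n + col];
        C[row * n + col] = acc;
    }
}

int main() {
    const int n = 512;                       // illustrative size
    size_t bytes = size_t(n) * n * sizeof(float);
    std::vector<float> hA(n * n, 1.0f), hB(n * n, 2.0f), hC(n * n);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start); cudaEventCreate(&stop);
    cudaEventRecord(start);

    // Transfers are timed together with the kernel on purpose:
    // for small n they dominate, which is where the CPU wins.
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((n + 15) / 16, (n + 15) / 16);
    matmul<<<grid, block>>>(dA, dB, dC, n);

    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("n=%d end-to-end GPU time (incl. transfers): %.3f ms\n", n, ms);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```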
52

Apply Modern Image Recognition Techniques with CUDA Implementation on Autonomous Systems

Liu, Yicong, January 2017
Computer vision has developed rapidly over the last few decades and is now used in a variety of fields such as robotics, autonomous vehicles, and traffic surveillance cameras. However, processing high-resolution raw images from these cameras places a heavy burden on the processor. Because of the physical architecture of the CPU, the pixels of an input image are processed sequentially, so even as the computational capability of modern CPUs increases, they still perform poorly when a single operation must be repeated millions of times. The objective of this thesis is to provide an alternative solution that reduces image processing time by implementing popular image recognition algorithms (SURF and FREAK) on GPUs using NVIDIA's CUDA platform. Experiments comparing a traditional CPU-only program with its CUDA counterpart show that the algorithms running on the CUDA platform achieve a significant speedup. / Thesis / Master of Applied Science (MASc)
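To make the one-thread-per-pixel execution model concrete, here is a hedged sketch (not taken from the thesis) of how CUDA maps image work onto the GPU; the RGB-to-grayscale conversion stands in for the heavier per-pixel and per-keypoint work that SURF and FREAK perform, and all identifiers are illustrative.

```cuda
// Illustrative sketch only: one CUDA thread per pixel, replacing the
// sequential per-pixel CPU loop the abstract refers to.
#include <cuda_runtime.h>

__global__ void rgbToGray(const unsigned char* rgb, unsigned char* gray,
                          int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        int idx = (y * width + x) * 3;
        // Standard luma weights; every pixel is handled independently.
        float g = 0.299f * rgb[idx] + 0.587f * rgb[idx + 1] + 0.114f * rgb[idx + 2];
        gray[y * width + x] = static_cast<unsigned char>(g);
    }
}

// Host-side launch helper; dRgb and dGray are device pointers.
void launchRgbToGray(const unsigned char* dRgb, unsigned char* dGray,
                     int width, int height) {
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    rgbToGray<<<grid, block>>>(dRgb, dGray, width, height);
}
```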
53

GPU Based Methods for Interactive Information Visualization of Big Data

Mi, Peng, 19 January 2016
Interactive visual analysis has been a key component of gaining insights in the information visualization area. However, the amount of data has increased exponentially in the past few years, and existing information visualization techniques lack the scalability to deal with big data, such as graphs with millions of nodes or millions of multidimensional data records. Recently, the remarkable development of the Graphics Processing Unit (GPU) has made it useful for general-purpose computation, and researchers have proposed GPU-based solutions for visualizing big data in the graphics and scientific visualization areas. GPU-based big data solutions in the information visualization area, however, are not well investigated. In this thesis, I concentrate on the visualization of big data in the information visualization area; more specifically, I focus on visual exploration of large graphs and multidimensional datasets based on GPU technology. My work demonstrates that GPU-based methods are useful for sensemaking of big data in the information visualization area. / Master of Science
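One common GPU building block for interactively exploring millions of records is to bin them into a density grid on the device before drawing anything; the hedged sketch below illustrates that general pattern with atomic counters. It is an illustrative assumption, not the specific method developed in the thesis.

```cuda
// Hedged sketch, not taken from the thesis: bin millions of 2D points
// into a density grid on the GPU with atomics, a common first step for
// interactive visualization of data too large to draw point-by-point.
#include <cuda_runtime.h>

__global__ void binPoints(const float2* points, int numPoints,
                          unsigned int* grid, int gridW, int gridH,
                          float minX, float minY, float maxX, float maxY) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numPoints) {
        float2 p = points[i];
        int cx = static_cast<int>((p.x - minX) / (maxX - minX) * gridW);
        int cy = static_cast<int>((p.y - minY) / (maxY - minY) * gridH);
        cx = min(max(cx, 0), gridW - 1);
        cy = min(max(cy, 0), gridH - 1);
        // Many threads may hit the same cell, so the increment is atomic.
        atomicAdd(&grid[cy * gridW + cx], 1u);
    }
}
```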
54

Efficient fMRI Analysis and Clustering on GPUs

Talasu, Dharneesh, 16 December 2011
No description available.
55

Photon Tracing on GPU

Galacz, Roman, January 2013
The subject of this thesis is the acceleration of the photon mapping method on a graphics card. Photon mapping is a method for computing near-realistic global illumination of a scene. The computation itself is relatively time-consuming, so accelerating it is an active topic in computer graphics. Photon mapping is described in detail, from photon tracing to rendering of the scene. The thesis then focuses on spatial subdivision structures, especially the uniform grid. The design and implementation of an application that computes photon mapping on the GPU, achieved through OpenGL and CUDA interoperability, are described in the next part of the thesis. Finally, the application is thoroughly tested and the achieved results are reviewed in the conclusion of the thesis.
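The uniform grid mentioned in the abstract is the data structure that makes photon gathering fast. As a hedged illustration (not the thesis's implementation), the sketch below assigns each stored photon to a grid cell on the GPU; sorting the photons by cell index afterwards yields the spatial index used when gathering photons during rendering. The Photon struct and all parameter names are assumptions for illustration.

```cuda
// Illustrative sketch only: compute a uniform-grid cell index for each
// stored photon. Sorting photons by this key (e.g. thrust::sort_by_key)
// then gives the spatial index used during the gathering/rendering pass.
#include <cuda_runtime.h>

struct Photon {
    float3 position;
    float3 power;
};

__global__ void computeCellIndices(const Photon* photons, int numPhotons,
                                   unsigned int* cellIdx,
                                   float3 gridMin, float3 cellSize,
                                   int3 gridRes) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numPhotons) {
        float3 p = photons[i].position;
        int cx = min(max(int((p.x - gridMin.x) / cellSize.x), 0), gridRes.x - 1);
        int cy = min(max(int((p.y - gridMin.y) / cellSize.y), 0), gridRes.y - 1);
        int cz = min(max(int((p.z - gridMin.z) / cellSize.z), 0), gridRes.z - 1);
        // Flatten the 3D cell coordinate into a single sortable key.
        cellIdx[i] = (cz * gridRes.y + cy) * gridRes.x + cx;
    }
}
```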
56

Road Network Generation on the GPU

Pedro Boechat de Almeida Germano, 10 February 2015
The first stage in the pipeline of a procedural city generation system is typically the generation of the road network. This work presents a parallel algorithm for road network generation on the GPU, using a work-queue based execution model. The algorithm receives declarative parameters along with geographic and socio-statistical maps and produces a high-level representation of an urban road network.
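As a hedged sketch of what a work-queue execution model can look like on the GPU (an illustrative assumption, not the dissertation's implementation), the kernel below expands one generation of road segments: each thread consumes a segment from an input queue and appends candidate successors to an output queue through an atomic counter, with the host swapping queues between launches until no new work is produced. The growth rule is a toy stand-in for the parameter- and map-driven rules the abstract describes.

```cuda
// Hedged sketch of one work-queue expansion step (not the dissertation's
// code). The host ping-pongs inQueue/outQueue until outCount stays zero.
#include <cuda_runtime.h>

struct Segment {
    float2 start;
    float2 end;
    int depth;
};

__global__ void expandSegments(const Segment* inQueue, int inCount,
                               Segment* outQueue, int* outCount,
                               int maxOut, int maxDepth) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= inCount) return;

    Segment s = inQueue[i];
    if (s.depth >= maxDepth) return;

    // Toy growth rule: continue straight and branch to the right.
    float2 dir = make_float2(s.end.x - s.start.x, s.end.y - s.start.y);

    Segment straight;
    straight.start = s.end;
    straight.end   = make_float2(s.end.x + dir.x, s.end.y + dir.y);
    straight.depth = s.depth + 1;

    Segment branch;
    branch.start = s.end;
    branch.end   = make_float2(s.end.x - dir.y, s.end.y + dir.x);
    branch.depth = s.depth + 1;

    // Reserve two slots in the output queue atomically.
    int slot = atomicAdd(outCount, 2);
    if (slot + 1 < maxOut) {
        outQueue[slot]     = straight;
        outQueue[slot + 1] = branch;
    }
}
```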
57

A Performance Comparison of VMware GPU Virtualization Techniques in Cloud Gaming

March 2016
Cloud gaming is an application deployment scenario in which an interactive gaming application runs remotely in a cloud, driven by commands received from a thin client, and the rendered scenes are streamed back to the client as a video sequence over the Internet. It is of interest to both the research community and industry: academia has developed open-source cloud gaming systems such as GamingAnywhere for research purposes, while industrial pioneers such as OnLive and Gaikai have succeeded in gaining a large user base in the cloud gaming market. Graphics Processing Unit (GPU) virtualization plays an important role in such an environment, as it is the critical component that allows virtual machines to run 3D applications with performance guarantees. Currently, GPU pass-through and GPU sharing are the two main techniques of GPU virtualization: the former gives a single virtual machine direct and exclusive access to a physical GPU, while the latter makes a physical GPU shareable by multiple virtual machines. VMware Inc., one of the most popular virtualization solution vendors, provides concrete implementations of both: a GPU pass-through solution called Virtual Dedicated Graphics Acceleration (vDGA) and a GPU-sharing solution called Virtual Shared Graphics Acceleration (vSGA). VMware has also recently announced another GPU-sharing solution called vGPU. Nevertheless, the feasibility and performance of these solutions in cloud gaming have not yet been studied. In this work, an experimental study is conducted to evaluate the feasibility and performance of the GPU pass-through and GPU sharing solutions offered by VMware in cloud gaming scenarios. The primary results confirm that the vDGA and vGPU techniques can meet the demands of cloud gaming: both achieved good performance in the tested graphics card benchmarks and delivered acceptable image quality and response delay for the tested games.
58

Algorithms for MARS spectral CT.

Knight, David Warwick, January 2015
This thesis reports on algorithmic design and software development completed for the Medipix All Resolution System (MARS) multi-energy CT scanner. Two areas of research are presented: the speed and usability improvements made to the post-reconstruction material decomposition software, and the development of two algorithms designed to implement a novel voxel system in the MARS image reconstruction chain. The MARS MD software package is the primary material analysis tool used by members of the MARS group. The photon-processing ability of the MARS scanner is what makes material decomposition possible: MARS MD loads the reconstructed images created after a scan and creates a new set of images, one for every individual material within the object. The software is capable of discriminating at least six different materials, plus air, within the object. A significant speed improvement to this program was attained by moving the code base from GNU Octave to MATLAB and applying well-known optimisation routines, while the creation of a graphical user interface made the software more accessible and easier to use. The changes made to MARS MD represented a significant contribution to the productivity of the entire MARS group. A drawback of the MARS image reconstruction chain is the time required to generate images of a scanned object. Compared to commercially available CT systems, the MARS system takes several orders of magnitude longer to do essentially the same job. With up to eight energy bins worth of data to consider during reconstruction, compared to a single energy bin in most commercial scanners, it is not surprising that there is a shortfall. A major performance limitation of the reconstruction process lies in the calculation of the small distances travelled by every detected photon within individual portions of the reconstruction volume. This thesis investigates a novel volume geometry, developed by Prof. Phil Butler and Dr. Peter Renaud, that is designed to partially mitigate this time constraint. By treating the volume as a cylinder instead of a traditional cubic structure, the number of individual path length calculations can be drastically reduced. Two sets of algorithms are prototyped, coded in MATLAB, C++ and CUDA, and finally compared in terms of speed and visual accuracy.
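For readers unfamiliar with the bottleneck the abstract describes, the hedged sketch below gives the flavour of a per-ray path-length accumulation over a traditional cubic-voxel volume, using a simple uniform-stepping approximation. It is an illustrative assumption only, not the MARS reconstruction code and not the cylindrical geometry the thesis develops.

```cuda
// Hedged sketch, not the MARS code: accumulate an approximate path
// length per cubic voxel along each ray by uniform stepping. Doing this
// for every detected photon is the cost the cylindrical voxel geometry
// aims to reduce.
#include <cuda_runtime.h>

__global__ void accumulatePathLengths(const float3* rayOrigins,
                                      const float3* rayDirs,   // unit length
                                      int numRays,
                                      float* pathLen,          // per voxel, size nx*ny*nz
                                      int nx, int ny, int nz,
                                      float3 volMin, float voxelSize,
                                      float rayLength, float step) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= numRays) return;

    float3 o = rayOrigins[r];
    float3 d = rayDirs[r];
    for (float t = 0.0f; t < rayLength; t += step) {
        float3 p = make_float3(o.x + t * d.x, o.y + t * d.y, o.z + t * d.z);
        int ix = int((p.x - volMin.x) / voxelSize);
        int iy = int((p.y - volMin.y) / voxelSize);
        int iz = int((p.z - volMin.z) / voxelSize);
        if (ix >= 0 && ix < nx && iy >= 0 && iy < ny && iz >= 0 && iz < nz) {
            // Each sample adds one step of length to its voxel; rays
            // overlap, so the accumulation must be atomic.
            atomicAdd(&pathLen[(iz * ny + iy) * nx + ix], step);
        }
    }
}
```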
59

Compiling Data Dependent Control Flow on SIMD GPUs

Popa, Tiberiu, January 2004
Current Graphics Processing Units (GPUs) (circa 2003/2004) have programmable vertex and fragment units. Often these units are implemented as SIMD processors employing parallel pipelines. Data-dependent conditional execution on SIMD architectures implemented using processor idling is inefficient. I propose a multi-pass approach based on conditional streams, which allows dynamic load balancing of the fragment units of the GPU and better theoretical performance on programs using data-dependent conditionals and loops. The proposed system can be used to turn the fragment unit of a SIMD GPU into a stream processor with data-dependent control flow.
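As a hedged illustration of the processor-idling problem the abstract refers to (written in present-day CUDA rather than the 2004 fragment-unit setting, and not the thesis's conditional-stream system), the kernel below contains a data-dependent branch: SIMD lanes whose data takes the cheap path sit idle while the expensive path executes, and vice versa.

```cuda
// Illustrative sketch of data-dependent control flow on SIMD hardware.
// Threads in a warp whose data takes different branches serialize the
// two paths, with the inactive lanes idle -- the inefficiency that a
// multi-pass conditional-stream approach is designed to avoid.
#include <cuda_runtime.h>

__global__ void dataDependentWork(const float* input, float* output, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = input[i];
        if (x > 0.0f) {
            // Expensive path: lanes with x <= 0 sit idle while it runs.
            float acc = x;
            for (int k = 0; k < 256; ++k)
                acc = acc * 0.999f + 0.001f;
            output[i] = acc;
        } else {
            // Cheap path, executed in a second serialized pass of the warp.
            output[i] = 0.0f;
        }
    }
}
```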
60

Autotuning wavefront patterns for heterogeneous architectures

Mohanty, Siddharth, January 2015
Manual tuning of applications for heterogeneous parallel systems is tedious and complex. Optimizations are often not portable, and the whole process must be repeated when moving to a new system, or sometimes even to a different problem size. Pattern-based parallel programming models were originally designed to provide programmers with an abstract layer, hiding tedious parallel boilerplate code and allowing a focus on application-specific issues only. However, the constrained algorithmic model associated with each pattern also enables the creation of pattern-specific optimization strategies. These can capture more complex variations than would be accessible by analysis of equivalent unstructured source code, and these variations create complex optimization spaces. Machine learning offers well-established techniques for exploring such spaces. In this thesis we use machine learning to create autotuning strategies for heterogeneous parallel implementations of applications which follow the wavefront pattern. In a wavefront, computation starts from one corner of the problem grid and proceeds diagonally, like a wave, to the opposite corner in either two or three dimensions. Our framework partitions and optimizes the work created by these applications across systems comprising multicore CPUs and multiple GPU accelerators. The tuning opportunities for a wavefront include controlling the amount of computation to be offloaded onto GPU accelerators, choosing the number of CPU and GPU threads to process tasks, tiling for both CPU and GPU memory structures, and trading redundant halo computation against communication for multiple GPUs. Our exhaustive search of the problem space shows that these parameters are very sensitive to the combination of architecture, wavefront instance and problem size. We design and investigate a family of autotuning strategies, targeting single and multiple CPU + GPU systems and both two- and three-dimensional wavefront instances. These yield an average of 87% of the performance found by offline exhaustive search, with up to 99% in some cases.
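For reference, the wavefront pattern itself can be sketched as below: cells on the same anti-diagonal have no mutual dependencies, so each diagonal can be processed in parallel on the GPU while diagonals run in sequence. This is a hedged illustration of the pattern only, not the autotuning framework the thesis builds on top of it; the recurrence and identifiers are illustrative.

```cuda
// Hedged sketch of the 2D wavefront pattern: each launch processes one
// anti-diagonal, whose cells depend only on already-computed neighbours
// above and to the left, so they can all run in parallel.
#include <cuda_runtime.h>

__global__ void wavefrontDiagonal(float* grid, int n, int diag) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int row = (diag < n) ? diag - t : n - 1 - t;
    int col = diag - row;
    if (row >= 1 && row < n && col >= 1 && col < n) {
        // Toy recurrence standing in for the application-specific cell update.
        grid[row * n + col] = 0.5f * (grid[(row - 1) * n + col] +
                                      grid[row * n + (col - 1)]);
    }
}

void runWavefront(float* dGrid, int n) {
    // Diagonals must run in order; cells within a diagonal are independent.
    for (int diag = 2; diag <= 2 * (n - 1); ++diag) {
        int cells = (diag < n) ? diag + 1 : 2 * n - 1 - diag;
        int block = 256;
        int blocksPerGrid = (cells + block - 1) / block;
        wavefrontDiagonal<<<blocksPerGrid, block>>>(dGrid, n, diag);
    }
    cudaDeviceSynchronize();
}
```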
