71 |
An Asynchronous Event Communication Technique for Soft Real-Time GPGPU Applications. Vestman, Alexander. January 2015.
Context: Interactive GPGPU applications require low-latency feedback from events such as user input in order to provide a positive user experience. Communication of these events must be performed asynchronously so as not to incur significant performance penalties.
Objectives: This study explores the use of CPU/GPU shared virtual memory to perform asynchronous communication. Previous studies have shown that shared virtual memory can increase computational performance compared to other types of memory.
Methods: A communication technique that aimed to exploit the performance-increasing properties of shared virtual memory was developed and implemented. The implementation was then compared to one using explicitly transferred memory, in an experiment measuring the performance of the various stages involved in the technique.
Results: The experiment revealed that using shared virtual memory for asynchronous communication was in general slightly slower than, or comparable to, using explicitly transferred memory. In some cases, where the memory access pattern was favorable, shared virtual memory led to a 50% reduction in execution time compared to explicitly transferred memory.
Conclusions: It was concluded that shared virtual memory can be used for asynchronous communication, and that it can yield a performance increase over explicitly transferred memory. Careful consideration of data size and access pattern, however, is required to realize that benefit.
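The event hand-off such a technique performs can be sketched with a double-buffering pattern. This is a hypothetical CPU-only Python model, not code from the thesis: in the actual technique the buffers would live in CPU/GPU shared virtual memory so the consumer (the GPU) reads events without an explicit transfer.

```python
class DoubleBufferedEventQueue:
    """Hypothetical sketch of asynchronous event hand-off via double buffering.

    In the thesis setting the buffers would sit in CPU/GPU shared virtual
    memory; here both buffers are plain Python lists and the 'GPU' is
    simply the caller of swap().
    """
    def __init__(self):
        self._write = []  # producer (CPU) fills this during the frame

    def push(self, event):
        # The producer never waits on the consumer.
        self._write.append(event)

    def swap(self):
        # Frame boundary: hand the filled buffer to the consumer and
        # start the producer on a fresh (empty) one.
        ready, self._write = self._write, []
        return ready

q = DoubleBufferedEventQueue()
q.push(("key_down", "W"))
q.push(("mouse_move", (10, 4)))
frame_events = q.swap()
print(frame_events)  # -> [('key_down', 'W'), ('mouse_move', (10, 4))]
print(q.swap())      # -> [] (no events arrived since the last swap)
```

The point of the pattern is that `push` never blocks on the consumer, which is the asynchrony the abstract's measurements are about.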
|
72 |
Addressing software-managed cache development effort in GPGPUs. Lashgar, Ahmad. 29 August 2017.
GPU Computing promises very high performance per watt for highly-parallelizable workloads.
Nowadays, there are various programming models developed to utilize the computational power of GPGPUs.
Low-level programming models provide full control over GPU resources and allow programmers to achieve peak performance of the chip.
In contrast, high-level programming models hide GPU-specific programming details and allow programmers to mainly express parallelism.
The compiler then parses these parallelization annotations and translates them into a low-level programming model.
This saves tremendous development effort and improves productivity, though often at the cost of performance.
In this dissertation, we investigate the limitations of high-level programming models in achieving performance close to that of low-level models.
Specifically, we study the performance and productivity gap between high-level OpenACC and low-level CUDA programming models and aim at reducing the performance gap, while maintaining the productivity advantages.
We start this study by developing our in-house OpenACC compiler.
Our compiler, called IPMACC, translates OpenACC for C into CUDA and uses the system compiler to generate GPU binaries.
We develop various micro-benchmarks to understand GPU structure and implement a more efficient OpenACC compiler.
By using IPMACC, we evaluate the performance and productivity gap between a wide set of OpenACC and CUDA kernels.
From our findings, we conclude that one of the major reasons behind the big performance gap between OpenACC and CUDA is CUDA’s flexibility in exploiting the GPU software-managed cache.
Identifying this key benefit in low-level CUDA, we follow three effective paths in utilizing software-managed cache similar to CUDA, but at a lower development effort (e.g. using OpenACC instead).
In the first path, we explore the possibility of employing existing OpenACC directives in utilizing software-managed cache.
Specifically, the cache directive is devised in OpenACC API standard to allow the use of software-managed cache in GPUs.
We introduce an efficient implementation of OpenACC cache directive that performs very close to CUDA.
However, we show that the use of the cache directive is limited and that the directive may not offer the full functionality associated with the software-managed cache as it exists in CUDA.
In the second path, we build on our observation on the limitations of the cache directive and propose a new OpenACC directive, called the fcw directive, to address the shortcomings of the cache directive, while maintaining OpenACC productivity advantages.
We show that the fcw directive overcomes the cache directive limitations and narrows down the performance gap between CUDA and OpenACC significantly.
In the third path, we propose a fully automated hardware/software approach, called TELEPORT, for software-managed cache programming.
On the software side, TELEPORT statically analyzes CUDA kernels and identifies opportunities in utilizing the software-managed cache.
The required information is passed to the GPU via API calls.
Based on this information, on the hardware side, TELEPORT prefetches the data to the software-managed cache at runtime.
We show that TELEPORT can improve performance by 32% on average, while lowering the development effort by 2.5X, compared to the hand-written CUDA equivalent.
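The staging idiom that the cache directive, the proposed fcw directive, and TELEPORT all automate can be illustrated with a CPU-only sketch. Python stands in for CUDA here, and the per-block `cache` list plays the role of the GPU's software-managed (shared) memory; the block size and stencil are illustrative choices, not details from the dissertation.

```python
def stencil_tiled(data, block_size=4):
    """3-point average stencil computed block by block.

    Each block first copies its tile plus a one-element halo into a
    local 'cache' list, standing in for the software-managed shared
    memory that CUDA exposes directly and OpenACC's cache directive
    exposes at a higher level.
    """
    n = len(data)
    out = [0.0] * n
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        # stage the tile plus halo into the software-managed cache
        lo, hi = max(start - 1, 0), min(end + 1, n)
        cache = data[lo:hi]
        # compute the block entirely from the staged copy
        for i in range(start, end):
            left = cache[i - lo - 1] if i > 0 else data[i]
            mid = cache[i - lo]
            right = cache[i - lo + 1] if i < n - 1 else data[i]
            out[i] = (left + mid + right) / 3.0
    return out

print(stencil_tiled([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
```

On a GPU the payoff is that each element of the tile is fetched from slow global memory once and then reused from fast on-chip memory by neighboring threads.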
|
73 |
gcn.MOPS: accelerating cn.MOPS with GPU. Alkhamis, Mohammad. 16 June 2017.
cn.MOPS is a model-based algorithm used to quantitatively detect copy-number variations in next-generation DNA-sequencing data. The algorithm is implemented as an R package and can speed up processing with multi-CPU parallelism. However, the maximum achievable speedup is limited by the overhead of multi-CPU parallelism, which increases with the number of CPU cores used. In this thesis, an alternative acceleration mechanism is proposed. Using one CPU core and a GPU device, the proposed solution, gcn.MOPS, achieved a speedup factor of 159× and decreased memory usage by more than half. This speedup was substantially higher than the maximum achievable speedup in cn.MOPS, which was ∼20×.
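The saturation of multi-CPU speedup that motivates the GPU port has the familiar Amdahl's-law shape: once serial overhead dominates, adding cores stops helping. A small sketch; the 95% parallel fraction is an illustrative assumption, not a figure measured in the thesis.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: overall speedup when only part of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)

# With 95% of the runtime parallelizable, speedup saturates near
# 1/0.05 = 20x no matter how many CPU cores are added, mirroring the
# ~20x ceiling the abstract reports for multi-CPU cn.MOPS:
for workers in (4, 16, 64, 1024):
    print(workers, round(amdahl_speedup(0.95, workers), 2))
```

A GPU port sidesteps the ceiling not by escaping Amdahl's law but by shrinking the per-worker overhead and raising the parallel fraction itself.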
|
74 |
General Purpose Programming on Modern Graphics Hardware. Fleming, Robert. 05 1900.
I start with a brief introduction to the graphics processing unit (GPU) as well as general-purpose computation on modern graphics hardware (GPGPU). Next, I explore the motivations for GPGPU programming, and the capabilities of modern GPUs (including advantages and disadvantages). Also, I give the background required for further exploring GPU programming, including the terminology used and the resources available. Finally, I include a comprehensive survey of previous and current GPGPU work, and end with a look at the future of GPU programming.
|
75 |
Porovnání metod pro rozklad křehkých těles na GPU pomocí 3D Voroného diagramu / Comparison of Brittle Body Decomposition GPU Based Methods Using Voronoi Diagram. Ončo, Michael. January 2020.
This thesis deals with the creation of 3D Voronoi diagrams on a graphics card. It compares selected algorithms that construct the diagram from a given set of points in space. Two algorithms were implemented for this purpose. The first creates a Delaunay tetrahedralization by splitting and flipping tetrahedra in parallel, then transforms it into a Voronoi diagram. The second uses planes to cut a mesh until the required cell shapes are produced. Testing shows the advantages, disadvantages, and relative performance of these algorithms. The main takeaway is the second method's sensitivity to an initial shape that is poorly suited to the given point set; the first algorithm starts more slowly and is less suitable for small point sets, but is well optimized for large ones.
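Whichever construction method is used, the output must respect the nearest-site rule that defines a Voronoi cell. That gives a simple brute-force reference for validating either GPU algorithm on small inputs; the sketch below is hypothetical Python, not code from the thesis.

```python
def voronoi_labels(points, sites):
    """Assign each query point to its nearest site, i.e. its Voronoi cell.

    A brute-force reference: any correct construction, whether built via
    Delaunay flipping or plane cutting, must agree with this nearest-site
    rule, which makes the sketch useful for checking GPU output.
    """
    def dist2(p, s):
        # squared Euclidean distance (no sqrt needed for comparisons)
        return sum((a - b) ** 2 for a, b in zip(p, s))
    return [min(range(len(sites)), key=lambda i: dist2(p, sites[i]))
            for p in points]

sites = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
queries = [(1.0, 1.0, 0.0), (3.5, 0.5, 0.0), (0.5, 3.0, 0.0)]
print(voronoi_labels(queries, sites))  # -> [0, 1, 2]
```

The brute-force check is O(points × sites), which is exactly why the thesis needs smarter GPU constructions for large inputs, but it is trivially correct and therefore a good oracle.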
|
76 |
Realizace vybraných výpočtů pomocí grafických karet / The realization of selected mathematical computations using graphical cards. Schreiber, Petr. January 2010.
This work discusses available approaches to programming graphics hardware as a platform for executing parallel computations. The text focuses on the new OpenCL technology, which allows the same high-level code to exploit the full potential of multicore CPUs and GPUs without being explicitly bound to a hardware vendor or operating system. The author introduces libraries and tools based on OpenCL, along with practical examples and his own observations on the current state of the technology.
|
77 |
Detekce objektů na GPU / Object Detection on GPU. Jurák, Martin. January 2015.
This thesis focuses on accelerating Random Forest object detection in images. A Random Forest detector is an ensemble of independently evaluated random decision trees, a property that lends itself to acceleration on a graphics unit. The development and increasing performance of graphics processing units have enabled their use for general-purpose computing (GPGPU). The goal of this thesis is to describe how to implement the Random Forest method on a GPU using the OpenCL standard.
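The independent-tree property is what maps onto GPU threads. A common GPU-friendly encoding (illustrative here, not necessarily the layout the thesis uses) flattens each tree into structure-of-arrays form so traversal is just index chasing with no pointers:

```python
from collections import Counter

# Structure-of-arrays tree: node i tests feature[i] against thresh[i];
# a negative child entry encodes a leaf with class c as -(c + 1).
def eval_tree(feature, thresh, left, right, sample):
    i = 0
    while True:
        nxt = left[i] if sample[feature[i]] < thresh[i] else right[i]
        if nxt < 0:          # reached a leaf
            return -nxt - 1  # decode the class label
        i = nxt

def eval_forest(trees, sample):
    # Each tree is evaluated independently (conceptually, one GPU work
    # item per tree); the forest answer is the majority vote.
    votes = Counter(eval_tree(*t, sample) for t in trees)
    return votes.most_common(1)[0][0]

# A single depth-2 tree: the root splits on feature 0, children on feature 1.
tree = ([0, 1, 1], [0.5, 0.3, 0.7], [1, -1, -1], [2, -2, -2])
print(eval_tree(*tree, [0.2, 0.9]))                 # -> 1
print(eval_tree(*tree, [0.8, 0.1]))                 # -> 0
print(eval_forest([tree, tree, tree], [0.2, 0.9]))  # -> 1
```

In an OpenCL port these flat arrays translate directly to device buffers, and the `while` loop becomes the kernel body.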
|
78 |
Programová knihovna pro práci s umělými neuronovými sítěmi s akcelerací na GPU / Software Library for Artificial Neural Networks with Acceleration Using GPU. Trnkóci, Andrej. January 2013.
Artificial neural networks place heavy demands on a computer's processing power. Increasing their training speed could open new possibilities for research on, and application of, the algorithm, and that is the purpose of this thesis. Using graphics processing units for neural network training is one way to achieve this goal. This thesis offers a survey of the theoretical background and, subsequently, an implementation of a software library for training neural networks with the backpropagation algorithm, accelerated on a graphics processing unit.
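Why backpropagation accelerates well on a GPU becomes clear when one training step is written out: every stage is a matrix/vector product or an outer-product update. The pure-Python sketch below is hypothetical (a tiny two-layer sigmoid network, not the library's actual code) but shows the shapes of work a GPU would batch into parallel kernels.

```python
import math

def train_step(x, target, W1, W2, lr=0.5):
    """One backpropagation step for a tiny two-layer sigmoid network."""
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    # forward pass: two matrix-vector products plus elementwise sigmoids
    h = [sig(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = [sig(sum(w * hi for w, hi in zip(row, h))) for row in W2]
    # backward pass: deltas for the output and hidden layers
    d_out = [(yi - t) * yi * (1.0 - yi) for yi, t in zip(y, target)]
    d_hid = [hi * (1.0 - hi) * sum(d * W2[k][j] for k, d in enumerate(d_out))
             for j, hi in enumerate(h)]
    # gradient-descent weight updates (outer products)
    W2 = [[w - lr * d * hi for w, hi in zip(row, h)]
          for row, d in zip(W2, d_out)]
    W1 = [[w - lr * d * xi for w, xi in zip(row, x)]
          for row, d in zip(W1, d_hid)]
    return W1, W2, y

W1 = [[0.1, -0.2], [0.4, 0.3]]
W2 = [[0.2, -0.1]]
x, target = [1.0, 0.5], [1.0]
for _ in range(200):
    W1, W2, y = train_step(x, target, W1, W2)
print(round(y[0], 2))  # the output has risen well toward the target 1.0
```

On a GPU, the same step is run for whole batches at once, turning each list comprehension into one large matrix multiply, which is where the speedup comes from.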
|
79 |
Použití OpenCl v AVG na platformě Windows / Using of OpenCl at AVG in Windows Platform. Bajcar, Martin. January 2012.
The main topic of this thesis is the practical use of OpenCL at the company AVG. AVG is looking for ways to decrease the hardware requirements of its security products and to reduce the computation time of some algorithms; using OpenCL is one way to meet these requirements. A significant part of the thesis deals with optimization strategies for AMD and NVIDIA graphics cards, as these are the most common cards among users. The practical part describes the parallelization of two algorithms, their analysis, and their implementation. The obtained results are then presented, and the cases in which the use of OpenCL is beneficial are identified. As part of the implementation, a library containing various utility functions that help programmers write OpenCL-based code was developed.
|
80 |
Akcelerace mikroskopické simulace dopravy za použití OpenCL / Acceleration of Microscopic Urban Traffic Simulation Using OpenCL. Urminský, Andrej. January 2011.
As the number of vehicles on our roads increases, the problems related to this growth emerge ever more dramatically: car accidents, congestion, and CO2 emissions that raise the concentration of CO2 in the atmosphere. To minimize these impacts and use the road infrastructure effectively, traffic simulators can come in handy. Thanks to these tools, it is possible to evaluate how traffic flow evolves from various initial states, and thus to know how to react in different real-world traffic situations. This thesis deals with accelerating microscopic urban traffic simulation using OpenCL. When large traffic networks have to be simulated, accelerating the simulation becomes necessary. For this purpose it is possible to use graphics processing units (GPUs) and the GPGPU technique for general-purpose computation, which is the approach taken in this work. The results show that the performance gains of GPUs are significant compared to a parallel implementation on the CPU.
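Microscopic traffic models parallelize well because each vehicle's update depends only on local state. A standard textbook example is the Nagel-Schreckenberg cellular model, sketched below purely for illustration; the thesis's own model may differ.

```python
import random

def traffic_step(road, v_max=5, p_slow=0.3, rng=random.Random(42)):
    """One update of a Nagel-Schreckenberg-style cellular traffic model.

    Each cell holds -1 (empty) or the speed of the vehicle occupying it.
    A vehicle's new speed depends only on the gap to the car ahead, so
    every vehicle can be updated independently; that independence is
    the property that maps onto GPU work items.
    """
    n = len(road)
    cars = [i for i, v in enumerate(road) if v >= 0]
    new_road = [-1] * n
    for idx, i in enumerate(cars):
        nxt = cars[(idx + 1) % len(cars)]    # next car ahead (periodic road)
        gap = (nxt - i - 1) % n              # free cells in between
        v = min(road[i] + 1, v_max, gap)     # accelerate, but keep a safe gap
        if v > 0 and rng.random() < p_slow:  # random braking
            v -= 1
        new_road[(i + v) % n] = v
    return new_road

road = [-1] * 20
road[0], road[5], road[11] = 2, 1, 0
road = traffic_step(road)
print(sum(v >= 0 for v in road))  # -> 3 (vehicles are conserved)
```

On a GPU each work item would update one vehicle, with the per-step synchronization handled by launching one kernel per simulation step.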
|