271 |
Optimizing Lempel-Ziv Factorization for the GPU ArchitectureChing, Bryan 01 June 2014 (has links) (PDF)
Lossless data compression is used to reduce storage requirements, allowing for the relief of I/O channels and better utilization of bandwidth. The Lempel-Ziv lossless compression algorithms form the basis for many of the most commonly used compression schemes. General purpose computing on graphic processing units (GPGPUs) allows us to take advantage of the massively parallel nature of GPUs for computations other that their original purpose of rendering graphics. Our work targets the use of GPUs for general lossless data compression. Specifically, we developed and ported an algorithm that constructs the Lempel-Ziv factorization directly on the GPU. Our implementation bypasses the sequential nature of the LZ factorization and attempts to compute the factorization in parallel. By breaking down the LZ factorization into what we call the PLZ, we are able to outperform the fastest serial CPU implementations by up to 24x and perform comparatively to a parallel multicore CPU implementation. To achieve these speeds, our implementation outputted LZ factorizations that were on average only 0.01 percent greater than the optimal solution that what could be computed sequentially.
We are also able to reevaluate the fastest GPU suffix array construction algorithm, which is needed to compute the LZ factorization. We are able to find speedups of up to 5x over the fastest CPU implementations.
|
272 |
Physics Engine on the GPU with OpenGL Compute ShadersBui, Quan Huy Minh 01 March 2021 (has links) (PDF)
Any kind of graphics simulation can be thought of like a fancy flipbook. This notion is, of course, nothing new. For instance, in a game, the central computing unit (CPU) needs to process frame by frame, figuring out what is happening, and then finally issues draw calls to the graphics processing unit (GPU) to render the frame and display it onto the monitor. Traditionally, the CPU has to process a lot of things: from the creation of the window environment for the processed frames to be displayed, handling game logic, processing artificial intelligence (AI) for non-player characters (NPC), to the physics, and issuing draw calls; and all of these have to be done within roughly 0.0167 seconds to maintain real-time performance of 60 frames per second (fps). The main goal of this thesis is to move the physics pipeline of any kind of simulation to the GPU instead of the CPU. The main tool to make this possible would be the usage of OpenGL Compute Shaders. OpenGL is a high-performance graphics application programming interface (API), used as an abstraction layer for the CPU to communicate with the GPU. OpenGL was created by the Khronos Group primarily for graphics, or drawing frames only. In the later versions of OpenGL, the Khronos Group has introduced Compute Shader, which can be used for general-purpose computing on the GPU (GPGPU). This means the GPU can be used to process any arbitrary math computations, and not limited to only process the vertices and fragments of polygons. This thesis features Broad Phase and Narrow Phase collision detection stages, and a collision Resolution Phase with Sequential Impulses entirely on the GPU with real-time performance.
|
273 |
Millipyde: A Cross-Platform Python Framework for Transparent GPU AccelerationAsbury, James B 01 December 2021 (has links) (PDF)
The prevalence of general-purpose GPU computing continues to grow and tackle a wider variety of problems that benefit from GPU-acceleration. This acceleration often suffers from a high barrier to entry, however, due to the complexity of software tools that closely map to the underlying GPU hardware, the fast-changing landscape of GPU environments, and the fragmentation of tools and languages that only support specific platforms. Because of this, new solutions will continue to be needed to make GPGPU acceleration more accessible to the developers that can benefit from it. AMD’s new cross-platform development ecosystem ROCm provides promise for developing applications and solutions that work across systems running both AMD and non-AMD GPU computing hardware.
This thesis presents Millipyde, a framework for GPU acceleration in Python using AMD’s ROCm. Millipyde includes two new types, the gpuarray and gpuimage, as well as three new constructs for building GPU-accelerated applications – the Operation, Pipeline, and Generator. Using these tools, Millipyde hopes to make it easier for engineers and researchers to write GPU-accelerated code in Python. Millipyde also has the potential to schedule work across many GPUs in complex multi-device environments. These capabilities will be demonstrated in a sample application of augmenting images on-device for machine learning applications. Our results showed that Millipyde is capable of making individual image-related transformations up to around 200 times faster than their CPU-only equivalents. Constructs such as the Millipyde’s Pipeline was also able to additionally improve performance in certain situations, and it performed best when it was allowed to transparently schedule work across multiple devices.
|
274 |
Parallelising High OrderTransform of Point SpreadFunction and TemplateSubtraction for AstronomicImage Subtraction : The implementation of BACHWång, Annie, Lells, Victor January 2023 (has links)
This thesis explores possible improvements, using parallel computing, to the PSF-alignment and image subtraction algorithm found in HOTPANTS. In time-domain astronomy the PSF-alignment and image subtraction algorithm OIS is used to identify transient events. hotpants is a software package based on OIS, the software package ISIS, and other subsequent research done to improve OIS. A parallel GPU implementation of the algorithm from HOTPANTS – henceforth known as BACH –was created for this thesis. The goal of this thesis is to answer the questions: “what parts of HOTPANTS are most suited for parallelisation?” and “how does bach perform compared to HOTPANTS and SFFT?”, another PSF-alignment and image subtraction tool. The authors found that the parts most susceptible to parallelisation were the convolution and subtraction steps. However, the subtraction did not display a significant improvement to its sequential counterpart. The other parts of HOTPANTS were deemed too complex to implement in parallel on the GPU. However, some parts could probably either be partly parallelised on the GPU or parallelised usingthe CPU. BACH was always as fast as or faster than HOTPANTS; it was generally 2 times faster, but was up to 4.5 times faster in some test cases. It was also faster than SFFT, but this result was not equivalent to the result presented in [15], which is why the authors of this thesis believe something was wrong with either the installation of SFFT or the hardware used to test it.
|
275 |
Autonomous Path-Following by Approximate Inverse Dynamics and Vector Field PredictionGerlach, Adam R. 23 October 2014 (has links)
No description available.
|
276 |
A Fast Poisson Solver with Periodic Boundary Conditions for GPU Clusters in Various ConfigurationsRattermann, Dale N. 27 October 2014 (has links)
No description available.
|
277 |
Sparse Matrix-Vector Multiplication on GPUAshari, Arash January 2014 (has links)
No description available.
|
278 |
Architectural Solutions For Mitigating Voltage Noise in GPUsThomas, Renji George George January 2015 (has links)
No description available.
|
279 |
Graphic-Processing-Units Based Adaptive Parameter Estimation of a Visual Psychophysical ModelGu, Hairong 17 December 2012 (has links)
No description available.
|
280 |
Application of machine learning potential to predict grain boundary properties and development of its performant implementation / 機械学習原子間ポテンシャルの結晶粒界構造探索への応用と高速化手法開発Nishiyama, Takayuki 23 March 2022 (has links)
京都大学 / 新制・課程博士 / 博士(工学) / 甲第23899号 / 工博第4986号 / 新制||工||1778(附属図書館) / 京都大学大学院工学研究科材料工学専攻 / (主査)教授 田中 功, 教授 中村 裕之, 教授 奥田 浩司 / 学位規則第4条第1項該当 / Doctor of Philosophy (Engineering) / Kyoto University / DFAM
|
Page generated in 0.0164 seconds