Global ETD Search

271	Optimizing Lempel-Ziv Factorization for the GPU Architecture Ching, Bryan 01 June 2014 (has links) (PDF) Lossless data compression is used to reduce storage requirements, relieving I/O channels and making better use of bandwidth. The Lempel-Ziv lossless compression algorithms form the basis for many of the most commonly used compression schemes. General-purpose computing on graphics processing units (GPGPU) allows us to take advantage of the massively parallel nature of GPUs for computations other than their original purpose of rendering graphics. Our work targets the use of GPUs for general lossless data compression. Specifically, we developed and ported an algorithm that constructs the Lempel-Ziv factorization directly on the GPU. Our implementation bypasses the sequential nature of the LZ factorization and attempts to compute the factorization in parallel. By breaking down the LZ factorization into what we call the PLZ, we are able to outperform the fastest serial CPU implementations by up to 24x and perform comparably to a parallel multicore CPU implementation. To achieve these speeds, our implementation output LZ factorizations that were on average only 0.01 percent larger than the optimal solution that could be computed sequentially. We are also able to reevaluate the fastest GPU suffix array construction algorithm, which is needed to compute the LZ factorization. We are able to find speedups of up to 5x over the fastest CPU implementations. GPU CUDA Parallelism Compression LZ77 Lempel-Ziv Computer and Systems Architecture
272	Physics Engine on the GPU with OpenGL Compute Shaders Bui, Quan Huy Minh 01 March 2021 (has links) (PDF) Any kind of graphics simulation can be thought of as a fancy flipbook. This notion is, of course, nothing new. For instance, in a game, the central processing unit (CPU) needs to process frame by frame, figuring out what is happening, and then finally issue draw calls to the graphics processing unit (GPU) to render the frame and display it on the monitor. Traditionally, the CPU has to handle many tasks: creating the window environment in which the processed frames are displayed, handling game logic, processing artificial intelligence (AI) for non-player characters (NPCs), computing the physics, and issuing draw calls. All of these have to be done within roughly 0.0167 seconds to maintain real-time performance of 60 frames per second (fps). The main goal of this thesis is to move the physics pipeline of any kind of simulation from the CPU to the GPU. The main tool that makes this possible is OpenGL Compute Shaders. OpenGL is a high-performance graphics application programming interface (API), used as an abstraction layer for the CPU to communicate with the GPU. OpenGL, now maintained by the Khronos Group, was created primarily for graphics, that is, for drawing frames only. In later versions of OpenGL, the Khronos Group introduced the Compute Shader, which can be used for general-purpose computing on the GPU (GPGPU). This means the GPU can be used to process arbitrary math computations and is not limited to processing the vertices and fragments of polygons. This thesis features Broad Phase and Narrow Phase collision detection stages, and a collision Resolution Phase with Sequential Impulses, running entirely on the GPU with real-time performance. Physics Simulation Engine 3D OpenGL GPU Graphics and Human Computer Interfaces
273	Millipyde: A Cross-Platform Python Framework for Transparent GPU Acceleration Asbury, James B 01 December 2021 (has links) (PDF) The prevalence of general-purpose GPU computing continues to grow and tackle a wider variety of problems that benefit from GPU acceleration. This acceleration often suffers from a high barrier to entry, however, due to the complexity of software tools that closely map to the underlying GPU hardware, the fast-changing landscape of GPU environments, and the fragmentation of tools and languages that only support specific platforms. Because of this, new solutions will continue to be needed to make GPGPU acceleration more accessible to the developers who can benefit from it. AMD’s new cross-platform development ecosystem ROCm shows promise for developing applications and solutions that work across systems running both AMD and non-AMD GPU computing hardware. This thesis presents Millipyde, a framework for GPU acceleration in Python using AMD’s ROCm. Millipyde includes two new types, the gpuarray and gpuimage, as well as three new constructs for building GPU-accelerated applications: the Operation, Pipeline, and Generator. Using these tools, Millipyde hopes to make it easier for engineers and researchers to write GPU-accelerated code in Python. Millipyde also has the potential to schedule work across many GPUs in complex multi-device environments. These capabilities will be demonstrated in a sample application of augmenting images on-device for machine learning applications. Our results showed that Millipyde is capable of making individual image-related transformations up to around 200 times faster than their CPU-only equivalents. Constructs such as Millipyde’s Pipeline were also able to further improve performance in certain situations, and the Pipeline performed best when it was allowed to transparently schedule work across multiple devices. Parallel Programming GPGPU GPU Systems Programming Systems Architecture
274	Parallelising High Order Transform of Point Spread Function and Template Subtraction for Astronomic Image Subtraction : The implementation of BACH Wång, Annie, Lells, Victor January 2023 (has links) This thesis explores possible improvements, using parallel computing, to the PSF-alignment and image subtraction algorithm found in HOTPANTS. In time-domain astronomy, the PSF-alignment and image subtraction algorithm OIS is used to identify transient events. HOTPANTS is a software package based on OIS, the software package ISIS, and other subsequent research done to improve OIS. A parallel GPU implementation of the algorithm from HOTPANTS, henceforth known as BACH, was created for this thesis. The goal of this thesis is to answer two questions: “what parts of HOTPANTS are most suited for parallelisation?” and “how does BACH perform compared to HOTPANTS and SFFT, another PSF-alignment and image subtraction tool?” The authors found that the parts most suited to parallelisation were the convolution and subtraction steps. However, the subtraction did not display a significant improvement over its sequential counterpart. The other parts of HOTPANTS were deemed too complex to implement in parallel on the GPU. However, some parts could probably either be partly parallelised on the GPU or parallelised using the CPU. BACH was always as fast as or faster than HOTPANTS; it was generally 2 times faster, but was up to 4.5 times faster in some test cases. It was also faster than SFFT, but this result was not consistent with the result presented in [15], which is why the authors of this thesis believe something was wrong with either the installation of SFFT or the hardware used to test it. Image Subtraction HOTPANTS SFFT GPU Parallelisation Computer Engineering Datorteknik
275	Autonomous Path-Following by Approximate Inverse Dynamics and Vector Field Prediction Gerlach, Adam R. 23 October 2014 (has links) No description available. Aerospace Materials UAV Navigation Control Prediction GPU Vector Field
276	A Fast Poisson Solver with Periodic Boundary Conditions for GPU Clusters in Various Configurations Rattermann, Dale N. 27 October 2014 (has links) No description available. Aerospace Materials GPU Poisson Incompressible CFD FFT CUDA
277	Sparse Matrix-Vector Multiplication on GPU Ashari, Arash January 2014 (has links) No description available. Computer Engineering Computer Science GPU CUDA Sparse SpMV BRC ACSR
278	Architectural Solutions For Mitigating Voltage Noise in GPUs Thomas, Renji George George January 2015 (has links) No description available. Computer Engineering Computer Science
279	Graphic-Processing-Units Based Adaptive Parameter Estimation of a Visual Psychophysical Model Gu, Hairong 17 December 2012 (has links) No description available. Psychology Psychophysics Adaptive Design Optimization GPU computing parameter estimation
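280	Application of machine learning potential to predict grain boundary properties and development of its performant implementation / 機械学習原子間ポテンシャルの結晶粒界構造探索への応用と高速化手法開発 Nishiyama, Takayuki 23 March 2022 (has links) Kyoto University / New system, course-based doctorate / Doctor of Engineering / Kō No. 23899 / Kōhaku No. 4986 / 新制||工||1778 (University Library) / Department of Materials Science and Engineering, Graduate School of Engineering, Kyoto University / (Chief examiner) Professor Isao Tanaka, Professor Hiroyuki Nakamura, Professor Hiroshi Okuda / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Philosophy (Engineering) / Kyoto University / DFAM machine learning interatomic potential grain boundary GPU 500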
