1 |
High-Performance Network Data Transfers to GPU: A Study of Nvidia GPU Direct RDMA and GPUNetIO
Gao, Yuchen, January 2023
This study investigates high-performance network data transfers, focusing on Nvidia GPUDirect Remote Direct Memory Access (RDMA) and GPUNetIO. These methods have emerged as promising strategies for improving data communication between Graphics Processing Units (GPUs) and network interfaces, but harnessing their potential requires meticulous configuration and optimization. This research aims to clarify these architectures and to achieve optimal performance in this context. The study begins by analyzing the source code of both architectures, explaining their underlying principles and how they improve on earlier designs. A user-oriented testing tool is also developed to provide users with a simplified interface for conducting tests and system configuration requirements. The research methodology consists of reviewing the literature and analyzing the source code of GPUDirect RDMA and GPUNetIO. Additionally, experiments are designed to evaluate various performance aspects, ranging from Central Processing Unit (CPU)-related factors to GPU metrics and network card performance. The results indicate a significant acceleration in data copying when it is based on GPUDirect RDMA technology. The introduction of GPUNetIO leads to a substantial decrease in CPU utilization. Furthermore, the user interface is designed for simple deployment on hosts and easy access by users. The interface is equipped with the recommended configuration settings. / This study examines high-performance network data transfers with a focus on Nvidia GPUDirect RDMA and GPUNetIO. GPUDirect RDMA has proven to be a promising method for improving data communication between GPUs and network interfaces, but exploiting its potential requires careful configuration and optimization. This research aims to clarify the complexity and difficulties of achieving optimal performance in this context.
The study begins with an analysis of the source code of both architectures, which explains their underlying principles and what they have improved compared with the earlier structures. In addition, a user-oriented testing tool is developed that aims to give users a simplified interface for running tests. The research method consists of a literature review and an analysis of the source code of GPUDirect RDMA and GPUNetIO. In addition, a set of experiments has been designed to evaluate various performance aspects, ranging from CPU-related factors to GPU metrics and network card performance. The results indicate a significant acceleration of data copying when it is based on GPUDirect RDMA technology. The introduction of GPUNetIO leads to a significant reduction in CPU utilization. In addition, the user interface is designed for simple deployment on hosts and easy access for users. The interface is equipped with recommended configuration settings.
|
2 |
Efektivní komunikace v multi-GPU systémech / Efficient Communication in Multi-GPU Systems
Špeťko, Matej, January 2018
After the introduction of CUDA by Nvidia, GPUs became devices capable of accelerating any general-purpose computation. GPUs are designed as parallel processors that possess huge computational power. Modern supercomputers are often equipped with GPU accelerators. Sometimes the performance or the memory capacity of a single GPU is not enough for a scientific application, and the application needs to be scaled across multiple GPUs. During the computation, the GPUs need to exchange partial results. This communication represents computational overhead. For this reason, it is important to research methods of effective communication between GPUs. This means less CPU involvement, lower latency, and shared system buffers. Inter-node and intra-node communication is examined. The main focus is on GPUDirect technologies from Nvidia and CUDA-Aware MPI. Subsequently, the k-Wave toolbox for simulating the propagation of acoustic waves is introduced. This application is accelerated by using CUDA-Aware MPI.
|
3 |
CPU/GPU Code Acceleration on Heterogeneous Systems and Code Verification for CFD Applications
Xue, Weicheng, 25 January 2021
Computational Fluid Dynamics (CFD) applications usually involve intensive computations, which can be accelerated by using open accelerators, especially GPUs, given their common use in the scientific computing community. In addition to code acceleration, it is important to ensure that the code and algorithm are implemented numerically correctly, which is called code verification. This dissertation focuses on accelerating research CFD codes on multi-CPUs/GPUs using MPI and OpenACC, as well as on code verification for turbulence model implementation using the method of manufactured solutions and code-to-code comparisons. First, a variety of performance optimizations, both agnostic and specific to applications and platforms, are developed in order to 1) improve the heterogeneous CPU/GPU compute utilization; 2) improve the memory bandwidth to the main memory; 3) reduce communication overhead between the CPU host and the GPU accelerator; and 4) reduce the tedious manual tuning work for GPU scheduling. Both finite difference and finite volume CFD codes and multiple platforms with different architectures are utilized to evaluate the performance optimizations used. A maximum speedup of over 70 is achieved on 16 V100 GPUs over 16 Xeon E5-2680v4 CPUs for multi-block test cases. In addition, systematic studies of code verification are performed for a second-order accurate finite volume research CFD code. Cross-term sinusoidal manufactured solutions are applied to verify the Spalart-Allmaras and k-omega SST model implementation, both in 2D and 3D. This dissertation shows that the spatial and temporal schemes are implemented numerically correctly. / Doctor of Philosophy / Computational Fluid Dynamics (CFD) is a numerical method for solving fluid problems, which usually requires a large amount of computation. A large CFD problem can be decomposed into smaller sub-problems which are stored in discrete memory locations and accelerated by a large number of compute units.
In addition to code acceleration, it is important to ensure that the code and algorithm are implemented correctly, which is called code verification. This dissertation focuses on CFD code acceleration as well as on code verification for turbulence model implementation. In this dissertation, multiple Graphics Processing Units (GPUs) are utilized to accelerate two CFD codes, considering that the GPU has high computational power and high memory bandwidth. A variety of optimizations are developed and applied to improve the performance of CFD codes on different parallel computing systems. The program execution time can be reduced significantly, especially when multiple GPUs are used. In addition, code-to-code comparisons with some NASA CFD codes and the method of manufactured solutions are utilized to verify the correctness of a research CFD code.
|
4 |
Efektivní komunikace v multi-GPU systémech / Efficient Communication in Multi-GPU Systems
Špeťko, Matej, January 2018
After the introduction of CUDA by Nvidia, GPUs became devices capable of accelerating any general-purpose computation. GPUs are designed as parallel processors that possess huge computational power. Modern supercomputers are often equipped with GPU accelerators. Sometimes single-GPU performance is not enough for a scientific application, and it needs to scale over multiple GPUs. During the computation, the GPUs need to exchange partial results. This communication represents computational overhead, so it is important to research methods of effective communication between GPUs. This means less CPU involvement, lower latency, and shared system buffers. This thesis focuses on inter-node and intra-node GPU-to-GPU communication using GPUDirect technologies from Nvidia and CUDA-Aware MPI. Subsequently, the k-Wave toolbox for simulating the propagation of acoustic waves is introduced. This application is accelerated by using CUDA-Aware MPI. Peer-to-peer transfer support is also integrated into k-Wave using CUDA Inter-process Communication.
|