• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 22
  • 3
  • Tagged with
  • 28
  • 17
  • 15
  • 11
  • 10
  • 6
  • 6
  • 5
  • 5
  • 4
  • 4
  • 4
  • 4
  • 4
  • 4
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Efektivní komunikace v multi-GPU systémech / Efficient Communication in Multi-GPU Systems

Špeťko, Matej January 2018 (has links)
After the introduction of CUDA by Nvidia, the GPUs became devices capable of accelerating any general purpose computation. GPUs are designed as parallel processors which posses huge computation power. Modern supercomputers are often equipped with GPU accelerators. Sometimes single GPU performance is not enough for a scientific application and it needs to scale over multiple GPUs. During the computation, there is a need for the GPUs to exchange partial results. This communication represents computation overhead and it is important to research methods of the effective communication between GPUs. This means less CPU involvement, lower latency and shared system buffers. This thesis is focused on inter-node and intra-node GPU-to-GPU communication using GPUDirect technologies from Nvidia and CUDA-Aware MPI. Subsequently, k-Wave toolbox for simulating the propagation of acoustic waves is introduced. This application is accelerated by using CUDA-Aware MPI. Peer-to-peer transfer support is also integrated to k-Wave using CUDA Inter-process Communication.
22

Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InifiniBand HCA

Mittenzwey, Nico 31 March 2009 (has links)
This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA can outperform the Mellanox InfiniHost III Lx HCA in latency and bandwidth terms on our test system in various test scenarios. The benchmarks showed, that sending messages with multiple threads in parallel can increase the bandwidth greatly while bi-directional sends cut the effective bandwidth for one HCA by up to 30%. Different all-reduce algorithms where evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI for different scenarios. The thesis also demonstrated, that one can implement multicast algorithms for InfiniBand easily by using the RDMA-CM API.
23

Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects

Potluri, Sreeram 18 September 2014 (has links)
No description available.
24

Designing High Performance and Scalable Unified Communication Runtime (UCR) for HPC and Big Data Middleware

Jose, Jithin 30 December 2014 (has links)
No description available.
25

High Performance Network I/O in Virtual Machines over Modern Interconnects

Huang, Wei 12 September 2008 (has links)
No description available.
26

Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-­scale Cluster-Systems with special Attention on InfiniBand Networks

Hoefler, Torsten 28 June 2005 (has links) (PDF)
The MPI_Barrier-collective operation, as a part of the MPI-1.1 standard, is extremely important for all parallel applications using it. The latency of this operation increases the application run time and can not be overlaid. Thus, the whole MPI performance can be decreased by unsatisfactory barrier latency. The main goals of this work are to lower the barrier latency for InfiniBand networks by analyzing well known barrier algorithms with regards to their suitability within InfiniBand networks, to enhance the barrier operation by utilizing standard InfiniBand operations as much as possible, and to design a constant time barrier for InfiniBand with special hardware support. This partition into three main steps is retained throughout the whole thesis. The first part evaluates publicly known models and proposes a new more accurate model (LoP) for InfiniBand. All barrier algorithms are evaluated within the well known LogP and this new model. Two new algorithms which promise a better performance have been developed. A constant time barrier integrated into InfiniBand as well as a cheap separate barrier network is proposed in the hardware section. All results have been implemented inside the Open MPI framework. This work led to three new Open MPI collective modules. The first one implements different barrier algorithms which are dynamically benchmarked and selected during the startup phase to maximize the performance. The second one offers a special barrier implementation for InfiniBand with RDMA and performs up to 40% better than the best solution that has been published so far. The third implementation offers a constant time barrier in a separate network, leveraging commodity components, with a latency of only 2.5 microseconds. All components have their specialty and can be used to enhance the barrier performance significantly.
27

Software-defined Buffer Management and Robust Congestion Control for Modern Datacenter Networks

Danushka N Menikkumbura (12208121) 20 April 2022 (has links)
<p>  Modern datacenter network applications continue to demand ultra low latencies and very high throughputs. At the same time, network infrastructure keeps achieving higher speeds and larger bandwidths. We still need better network management solutions to keep these two demand and supply fronts go hand-in-hand. There are key metrics that define network performance such as flow completion time (the lower the better), throughput (the higher the better), and end-to-end latency (the lower the better) that are mainly governed by how effectively network application get their fair share of network resources. We observe that buffer utilization on network switches gives a very accurate indication of network performance. Therefore, network buffer management is important in modern datacenter networks, and other network management solutions can be efficiently built around buffer utilization. This dissertation presents three solutions based on buffer use on network switches.</p> <p>  This dissertation consists of three main sections. The first section is on a specification language for buffer management in modern programmable switches. The second section is on a congestion control solution for Remote Direct Memory Access (RDMA) networks. The third section is on a solution to head-of-the-line blocking in modern datacenter networks.</p>
28

Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-­scale Cluster-Systems with special Attention on InfiniBand Networks

Hoefler, Torsten 01 April 2005 (has links)
The MPI_Barrier-collective operation, as a part of the MPI-1.1 standard, is extremely important for all parallel applications using it. The latency of this operation increases the application run time and can not be overlaid. Thus, the whole MPI performance can be decreased by unsatisfactory barrier latency. The main goals of this work are to lower the barrier latency for InfiniBand networks by analyzing well known barrier algorithms with regards to their suitability within InfiniBand networks, to enhance the barrier operation by utilizing standard InfiniBand operations as much as possible, and to design a constant time barrier for InfiniBand with special hardware support. This partition into three main steps is retained throughout the whole thesis. The first part evaluates publicly known models and proposes a new more accurate model (LoP) for InfiniBand. All barrier algorithms are evaluated within the well known LogP and this new model. Two new algorithms which promise a better performance have been developed. A constant time barrier integrated into InfiniBand as well as a cheap separate barrier network is proposed in the hardware section. All results have been implemented inside the Open MPI framework. This work led to three new Open MPI collective modules. The first one implements different barrier algorithms which are dynamically benchmarked and selected during the startup phase to maximize the performance. The second one offers a special barrier implementation for InfiniBand with RDMA and performs up to 40% better than the best solution that has been published so far. The third implementation offers a constant time barrier in a separate network, leveraging commodity components, with a latency of only 2.5 microseconds. All components have their specialty and can be used to enhance the barrier performance significantly.

Page generated in 0.0204 seconds