Global ETD Search

1	Fast Barrier Synchronization for InfiniBand Hoefler, Torsten 04 January 2006 (has links) (PDF) Barrier Synchronization is crucial for many parallel systems. This talk introduces different synchronization mechanisms and demonstrates new approaches to leverage special hardware properties of InfiniBand to lower the Barrier latency. Barrier InfiniBand MPI_Barrier Open MPI ddc:004 MPI <Schnittstelle> Parallelrechner
2	Communication/Computation Overlap in MPI Hoefler, Torsten 04 January 2006 (has links) (PDF) This talk discusses optimized collective algorithms and the benefits of leveraging independent hardware entities in a pipelined manner. The resulting approach overlaps computation and communication to achieve this. Different examples are given. MPI_BARRIER Non blocking collective operations kollektive Operationen ddc:004 MPI <Schnittstelle> Parallelrechner
3	Entwicklung einer optimierten kollektiven Komponente [Development of an Optimized Collective Component] Mosch, Marek 24 September 2007 (has links) (PDF) This diploma thesis deals with the development of a collective component for the MPI-2 implementation Open MPI. The component is intended to contain optimized algorithms for the Myrinet network, based on the low-level communication protocol GM. MPI_ALLTOALL MPI_BARRIER MPI_BCAST MPI_GATHER MPI_SCATTER Myrinet Open MPI ddc:004 Hochleistungsrechnen Netzwerk
4	A Survey of Barrier Algorithms for Coarse Grained Supercomputers Hoefler, Torsten, Mehlan, Torsten, Mietke, Frank, Rehm, Wolfgang 28 June 2005 (has links) (PDF) There are several different algorithms available to perform a synchronization of multiple processors. Some of them support only shared memory architectures or very fine grained supercomputers. This work gives an overview of all currently known algorithms which are suitable for distributed shared memory architectures and message passing based computer systems (loosely coupled or coarse grained supercomputers). No absolute decision can be made for choosing a barrier algorithm for a machine. Several architectural aspects have to be taken into account. The overview of known barrier algorithms given in this work is mostly targeted to implementors of libraries supporting collective communication (such as MPI). Barrier Collective Communication Kollektive Operationen MPI_Barrier ddc:004 MPI <Schnittstelle> Mpi-Sprache Netzwerk <Graphentheorie> Supercomputer
5	Fast Barrier Synchronization for InfiniBand Hoefler, Torsten 04 January 2006 (has links) Barrier Synchronization is crucial for many parallel systems. This talk introduces different synchronization mechanisms and demonstrates new approaches to leverage special hardware properties of InfiniBand to lower the Barrier latency. info:eu-repo/classification/ddc/004 ddc:004 MPI <Schnittstelle> Parallelrechner Barrier InfiniBand MPI_Barrier Open MPI
6	Communication/Computation Overlap in MPI Hoefler, Torsten 04 January 2006 (has links) This talk discusses optimized collective algorithms and the benefits of leveraging independent hardware entities in a pipelined manner. The resulting approach overlaps computation and communication to achieve this. Different examples are given. info:eu-repo/classification/ddc/004 ddc:004 MPI <Schnittstelle> Parallelrechner MPI_BARRIER Non blocking collective operations kollektive Operationen
7	Entwicklung einer optimierten kollektiven Komponente [Development of an Optimized Collective Component] Mosch, Marek 31 July 2007 (has links) This diploma thesis deals with the development of a collective component for the MPI-2 implementation Open MPI. The component is intended to contain optimized algorithms for the Myrinet network, based on the low-level communication protocol GM. info:eu-repo/classification/ddc/004 ddc:004 Hochleistungsrechnen Netzwerk MPI_ALLTOALL MPI_BARRIER MPI_BCAST MPI_GATHER MPI_SCATTER Myrinet Open MPI
8	Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-scale Cluster-Systems with special Attention on InfiniBand Networks Hoefler, Torsten 28 June 2005 (has links) (PDF) The MPI_Barrier-collective operation, as a part of the MPI-1.1 standard, is extremely important for all parallel applications using it. The latency of this operation increases the application run time and cannot be overlaid. Thus, the whole MPI performance can be decreased by unsatisfactory barrier latency. The main goals of this work are to lower the barrier latency for InfiniBand networks by analyzing well-known barrier algorithms with regard to their suitability within InfiniBand networks, to enhance the barrier operation by utilizing standard InfiniBand operations as much as possible, and to design a constant time barrier for InfiniBand with special hardware support. This partition into three main steps is retained throughout the whole thesis. The first part evaluates publicly known models and proposes a new, more accurate model (LoP) for InfiniBand. All barrier algorithms are evaluated within the well-known LogP and this new model. Two new algorithms which promise a better performance have been developed. A constant time barrier integrated into InfiniBand as well as a cheap separate barrier network is proposed in the hardware section. All results have been implemented inside the Open MPI framework. This work led to three new Open MPI collective modules. The first one implements different barrier algorithms which are dynamically benchmarked and selected during the startup phase to maximize the performance. The second one offers a special barrier implementation for InfiniBand with RDMA and performs up to 40% better than the best solution that has been published so far. The third implementation offers a constant time barrier in a separate network, leveraging commodity components, with a latency of only 2.5 microseconds. All components have their specialty and can be used to enhance the barrier performance significantly. Barrier Collective Operations Collectives InfiniBand Kollektive Operationen LoP Modell LogGP LogGPC LogP MPI_Barrier Open MPI RDMA ddc:004 Cluster Server MPI <Schnittstelle> Netzwerk <Graphentheorie>
9	A Survey of Barrier Algorithms for Coarse Grained Supercomputers Hoefler, Torsten, Mehlan, Torsten, Mietke, Frank, Rehm, Wolfgang 28 June 2005 (has links) There are several different algorithms available to perform a synchronization of multiple processors. Some of them support only shared memory architectures or very fine grained supercomputers. This work gives an overview of all currently known algorithms which are suitable for distributed shared memory architectures and message passing based computer systems (loosely coupled or coarse grained supercomputers). No absolute decision can be made for choosing a barrier algorithm for a machine. Several architectural aspects have to be taken into account. The overview of known barrier algorithms given in this work is mostly targeted to implementors of libraries supporting collective communication (such as MPI). info:eu-repo/classification/ddc/004 ddc:004 MPI <Schnittstelle> Mpi-Sprache Netzwerk <Graphentheorie> Supercomputer Barrier Collective Communication Kollektive Operationen MPI_Barrier
10	Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-scale Cluster-Systems with special Attention on InfiniBand Networks Hoefler, Torsten 01 April 2005 (has links) The MPI_Barrier-collective operation, as a part of the MPI-1.1 standard, is extremely important for all parallel applications using it. The latency of this operation increases the application run time and cannot be overlaid. Thus, the whole MPI performance can be decreased by unsatisfactory barrier latency. The main goals of this work are to lower the barrier latency for InfiniBand networks by analyzing well-known barrier algorithms with regard to their suitability within InfiniBand networks, to enhance the barrier operation by utilizing standard InfiniBand operations as much as possible, and to design a constant time barrier for InfiniBand with special hardware support. This partition into three main steps is retained throughout the whole thesis. The first part evaluates publicly known models and proposes a new, more accurate model (LoP) for InfiniBand. All barrier algorithms are evaluated within the well-known LogP and this new model. Two new algorithms which promise a better performance have been developed. A constant time barrier integrated into InfiniBand as well as a cheap separate barrier network is proposed in the hardware section. All results have been implemented inside the Open MPI framework. This work led to three new Open MPI collective modules. The first one implements different barrier algorithms which are dynamically benchmarked and selected during the startup phase to maximize the performance. The second one offers a special barrier implementation for InfiniBand with RDMA and performs up to 40% better than the best solution that has been published so far. The third implementation offers a constant time barrier in a separate network, leveraging commodity components, with a latency of only 2.5 microseconds. All components have their specialty and can be used to enhance the barrier performance significantly. info:eu-repo/classification/ddc/004 ddc:004 Cluster Server MPI <Schnittstelle> Netzwerk <Graphentheorie> Barrier Collective Operations Collectives InfiniBand Kollektive Operationen LoP Modell LogGP LogGPC LogP MPI_Barrier Open MPI RDMA
