1 |
Communication/Computation Overlap in MPI
Hoefler, Torsten 04 January 2006 (has links) (PDF)
This talk discusses optimized collective algorithms and the benefits of using independent hardware entities in a pipelined manner. The resulting approach overlaps computation and communication to achieve this. Several examples are given.
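The benefit of pipelining across independent hardware entities can be illustrated with a simple timing model. This is a minimal sketch and not taken from the talk: it assumes a network unit and a CPU that work independently, a message split into equal chunks, and fixed per-chunk costs. The function names and parameters are hypothetical.

```python
def sequential_time(chunks: int, comm: float, comp: float) -> float:
    """Total time when each chunk is first transferred, then processed."""
    return chunks * (comm + comp)


def pipelined_time(chunks: int, comm: float, comp: float) -> float:
    """Total time when the transfer of chunk i+1 overlaps the processing of chunk i.

    Assumes two independent units (network and CPU), modeled as a
    two-stage pipeline: fill and drain cost comm + comp, and every
    further chunk costs the slower of the two stages.
    """
    if chunks == 0:
        return 0.0
    return comm + (chunks - 1) * max(comm, comp) + comp


if __name__ == "__main__":
    n, comm, comp = 8, 2.0, 3.0
    print("sequential:", sequential_time(n, comm, comp))
    print("pipelined: ", pipelined_time(n, comm, comp))
```

Under this model the gain is largest when the two per-chunk costs are similar, because neither unit then sits idle for long.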
|
2 |
Communication/Computation Overlap in MPI
Hoefler, Torsten 04 January 2006 (has links)
This talk discusses optimized collective algorithms and the benefits of using independent hardware entities in a pipelined manner. The resulting approach overlaps computation and communication to achieve this. Several examples are given.
|
3 |
Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-scale Cluster-Systems with special Attention on InfiniBand Networks
Hoefler, Torsten 28 June 2005 (has links) (PDF)
The MPI_Barrier collective operation, part of the MPI-1.1
standard, is critical to every parallel application that uses it.
The latency of this operation adds to the application run time and
cannot be overlapped with computation. Thus, overall MPI performance can
suffer from unsatisfactory barrier latency. The main goals of this work are to
lower the barrier latency for InfiniBand networks by analyzing well-known
barrier algorithms with regard to their suitability for
InfiniBand networks, to enhance the barrier operation by utilizing
standard InfiniBand operations as much as possible, and to design a
constant time barrier for InfiniBand with special hardware support.
This three-part division is retained throughout the
thesis. The first part evaluates publicly known models and proposes a
new, more accurate model (LoP) for InfiniBand. All barrier algorithms are
evaluated within the well-known LogP model and this new model. Two new
algorithms that promise better performance have been developed. A
constant-time barrier integrated into InfiniBand and an inexpensive
separate barrier network are proposed in the hardware section. All
results have been implemented inside the Open MPI framework. This work
led to three new Open MPI collective modules. The first one implements
different barrier algorithms which are dynamically benchmarked and
selected during the startup phase to maximize the performance. The
second one offers a special barrier implementation for InfiniBand with RDMA
and performs up to 40% better than the best solution that has been
published so far. The third implementation offers a constant-time
barrier in a separate network built from commodity components, with a
latency of only 2.5 microseconds. Each module targets a different scenario, and all can
be used to improve barrier performance significantly.
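The abstract does not name the barrier algorithms it evaluates. As an illustration only, the sketch below simulates the dissemination barrier, one well-known software barrier: in round k, rank r notifies rank (r + 2^k) mod P, so after ceil(log2 P) rounds every rank has heard, directly or transitively, from every other rank. The function name is hypothetical, and the code is not the Open MPI implementation.

```python
def simulate_dissemination(p: int) -> tuple[int, bool]:
    """Simulate a dissemination barrier among p ranks.

    Returns (rounds, complete), where complete is True if every rank
    has transitively heard from all p ranks when the rounds finish.
    """
    # known[r] is the set of ranks whose arrival rank r has learned of.
    known = [{r} for r in range(p)]
    rounds = 0
    dist = 1
    while dist < p:
        # All messages in a round carry the state from the start of the round.
        snapshot = [set(k) for k in known]
        for r in range(p):
            known[(r + dist) % p] |= snapshot[r]
        dist *= 2
        rounds += 1
    return rounds, all(len(k) == p for k in known)


if __name__ == "__main__":
    for p in (1, 5, 8, 342):
        print(p, simulate_dissemination(p))
```

The round count grows logarithmically with the number of ranks, which is the latency a constant-time hardware barrier aims to avoid.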
|
4 |
Efficient Broadcast for Multicast-Capable Interconnection Networks
Siebert, Christian 20 November 2006 (has links) (PDF)
The broadcast function MPI_Bcast() from the
MPI-1.1 standard is one of the most heavily
used collective operations for the message
passing programming paradigm.
This diploma thesis makes use of a feature called
"Multicast", which is supported by several
network technologies (like Ethernet or
InfiniBand), to create an efficient MPI_Bcast()
implementation, especially for large communicators
and small-sized messages.
A preceding analysis of existing real-world
applications leads to an algorithm that performs
well not only in synthetic benchmarks
but even better in a wide class of
parallel applications. The resulting
broadcast has been implemented for the
open source MPI library "Open MPI" using
IP multicast.
The results show that
the new broadcast almost always outperforms
existing point-to-point implementations
once the number of MPI processes exceeds
eight nodes. The performance gain reaches
a factor of 4.9 on 342 nodes, because the
new algorithm scales practically independently
of the number of involved processes.
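To see why a point-to-point broadcast scales with the process count while a multicast ideally does not, consider a binomial tree, a common point-to-point broadcast schedule. The sketch below is an assumption-laden illustration, not the thesis code: the abstract does not say which point-to-point algorithms were compared, and the function name is hypothetical. It lists the (sender, receiver) pairs of each round for a broadcast rooted at rank 0.

```python
def binomial_bcast_schedule(p: int) -> list[list[tuple[int, int]]]:
    """Return the per-round (src, dst) pairs of a binomial-tree broadcast.

    In the round with distance d (1, 2, 4, ...), every rank below d
    already holds the data and forwards it to rank + d, if that rank exists.
    """
    rounds = []
    dist = 1
    while dist < p:
        rounds.append([(src, src + dist) for src in range(dist) if src + dist < p])
        dist *= 2
    return rounds


if __name__ == "__main__":
    schedule = binomial_bcast_schedule(342)
    # The number of sequential rounds grows with log2 of the process count.
    print("rounds:", len(schedule))
    print("messages:", sum(len(r) for r in schedule))
```

A hardware multicast replaces these sequential rounds with, ideally, a single send from the root. Because IP multicast is unreliable, a real implementation also has to handle lost packets.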
|
5 |
Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-scale Cluster-Systems with special Attention on InfiniBand Networks
Hoefler, Torsten 01 April 2005 (has links)
The MPI_Barrier collective operation, part of the MPI-1.1
standard, is critical to every parallel application that uses it.
The latency of this operation adds to the application run time and
cannot be overlapped with computation. Thus, overall MPI performance can
suffer from unsatisfactory barrier latency. The main goals of this work are to
lower the barrier latency for InfiniBand networks by analyzing well-known
barrier algorithms with regard to their suitability for
InfiniBand networks, to enhance the barrier operation by utilizing
standard InfiniBand operations as much as possible, and to design a
constant time barrier for InfiniBand with special hardware support.
This three-part division is retained throughout the
thesis. The first part evaluates publicly known models and proposes a
new, more accurate model (LoP) for InfiniBand. All barrier algorithms are
evaluated within the well-known LogP model and this new model. Two new
algorithms that promise better performance have been developed. A
constant-time barrier integrated into InfiniBand and an inexpensive
separate barrier network are proposed in the hardware section. All
results have been implemented inside the Open MPI framework. This work
led to three new Open MPI collective modules. The first one implements
different barrier algorithms which are dynamically benchmarked and
selected during the startup phase to maximize the performance. The
second one offers a special barrier implementation for InfiniBand with RDMA
and performs up to 40% better than the best solution that has been
published so far. The third implementation offers a constant-time
barrier in a separate network built from commodity components, with a
latency of only 2.5 microseconds. Each module targets a different scenario, and all can
be used to improve barrier performance significantly.
|
6 |
Efficient Broadcast for Multicast-Capable Interconnection Networks
Siebert, Christian 30 September 2006 (has links)
The broadcast function MPI_Bcast() from the
MPI-1.1 standard is one of the most heavily
used collective operations for the message
passing programming paradigm.
This diploma thesis makes use of a feature called
"Multicast", which is supported by several
network technologies (like Ethernet or
InfiniBand), to create an efficient MPI_Bcast()
implementation, especially for large communicators
and small-sized messages.
A preceding analysis of existing real-world
applications leads to an algorithm that performs
well not only in synthetic benchmarks
but even better in a wide class of
parallel applications. The resulting
broadcast has been implemented for the
open source MPI library "Open MPI" using
IP multicast.
The results show that
the new broadcast almost always outperforms
existing point-to-point implementations
once the number of MPI processes exceeds
eight nodes. The performance gain reaches
a factor of 4.9 on 342 nodes, because the
new algorithm scales practically independently
of the number of involved processes.
|