The Global ETD Search service is a free service for researchers to find electronic theses and dissertations, provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Metadata is collected from universities around the world; if you manage a university, consortium, or country archive and would like to be added, details can be found on the NDLTD website.
1

Extension of an Existing InfiniBand Benchmark

Viertel, Carsten (01 June 2006)
InfiniBand is increasingly used as the interconnect for clusters, which makes it necessary to adapt existing libraries for parallel programming languages to the new network as well as possible. An important part of parallel programming languages are collective operations, which require sending a message from one node to many others, or from many nodes to a single one. To find out which transport types and operations are best suited for these collective operations, a benchmark was developed. The goal of this student research project is to extend this program, test it on a cluster, and evaluate the results.
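
The benchmark itself is not reproduced in this listing; as a rough, hedged illustration of the kind of measurement such a benchmark performs, the following C sketch times MPI_Bcast() over a range of message sizes. All names and parameters are illustrative, not taken from the thesis.

    /* Minimal collective micro-benchmark sketch (illustrative only).
     * Compile: mpicc -O2 bcast_bench.c -o bcast_bench
     * Run:     mpirun -np 8 ./bcast_bench                            */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int iters = 100;
        for (size_t size = 4; size <= (1u << 20); size *= 4) {
            char *buf = malloc(size);
            MPI_Barrier(MPI_COMM_WORLD);            /* align start times */
            double t0 = MPI_Wtime();
            for (int i = 0; i < iters; i++)
                MPI_Bcast(buf, (int)size, MPI_CHAR, 0, MPI_COMM_WORLD);
            double avg = (MPI_Wtime() - t0) / iters;
            if (rank == 0)
                printf("%8zu bytes: %10.2f us\n", size, avg * 1e6);
            free(buf);
        }
        MPI_Finalize();
        return 0;
    }
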
2

Communication/Computation Overlap in MPI

Hoefler, Torsten (04 January 2006)
This talk discusses optimized collective algorithms and the benefits of leveraging independent hardware entities in a pipelined manner. The resulting approach overlaps computation and communication to achieve this. Several examples are given.
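
A minimal C sketch of the overlap technique the talk describes, assuming a standard MPI installation and at least two ranks (illustrative only, not code from the talk): the transfer is posted with a non-blocking call, independent computation proceeds while it is in flight, and completion is enforced only afterwards.

    /* Communication/computation overlap sketch (illustrative only).
     * Rank 0 streams a buffer to rank 1; both ranks do independent
     * arithmetic while the transfer is in flight and synchronize with
     * MPI_Wait only afterwards. Needs at least two ranks.            */
    #include <mpi.h>
    #include <stdio.h>

    #define N (1 << 20)
    static double sendbuf[N], recvbuf[N], work[N];

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) { MPI_Finalize(); return 1; }

        MPI_Request req = MPI_REQUEST_NULL;
        if (rank == 0)
            MPI_Isend(sendbuf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        else if (rank == 1)
            MPI_Irecv(recvbuf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);

        /* Independent work overlapped with the transfer; it must not
         * touch sendbuf/recvbuf before MPI_Wait completes the request. */
        double sum = 0.0;
        for (int i = 0; i < N; i++)
            sum += work[i] * work[i];

        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* no-op for ranks > 1 */
        if (rank == 0)
            printf("overlap done, checksum %f\n", sum);
        MPI_Finalize();
        return 0;
    }
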
3

Optimized Implementation of Selected Collective Operations Exploiting the Hardware Parallelism of the InfiniBand Network

Franke, Maik (24 September 2007)
The goal of this thesis is an optimized implementation of the reduction operations MPI_Reduce(), MPI_Allreduce(), MPI_Scan(), and MPI_Reduce_scatter() defined in the MPI-1 standard for the InfiniBand network, with particular emphasis on InfiniBand-specific operations and hardware parallelism. InfiniBand makes it possible to separate communication cleanly from computation, which allows the two to be overlapped within a reduction. The performance of collective communication operations is one of the deciding factors in the overall performance of an MPI application, yet current MPI implementations use the point-to-point components to access the InfiniBand network. This work therefore attempts to improve the performance of a collective component by accessing the InfiniBand network directly, avoiding overhead and making it possible to tune the algorithms to this specific network. Various algorithms for the four reduction operations are presented, and their theoretical performance is analyzed with the LogfP and LogGP models. Selected algorithms are implemented as part of an Open MPI collective component, and the result is compared with existing implementations such as MVAPICH.
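
As a hedged, plain-MPI illustration of the four reduction operations the thesis optimizes (usage only; the thesis works below this level, inside an Open MPI collective component):

    /* The four MPI-1 reduction operations, shown at the plain MPI level
     * (usage illustration only, not the thesis's implementation).     */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int x = rank + 1, root_sum = 0, all_sum = 0, prefix = 0;
        MPI_Reduce(&x, &root_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD); /* result at root    */
        MPI_Allreduce(&x, &all_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);  /* result everywhere */
        MPI_Scan(&x, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);        /* inclusive prefix  */

        /* Reduce_scatter: each rank contributes a vector of `size` ints
         * and receives one element of the element-wise sum.            */
        int *contrib = malloc(size * sizeof(int));
        int *counts  = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) { contrib[i] = rank + i; counts[i] = 1; }
        int mine = 0;
        MPI_Reduce_scatter(contrib, &mine, counts, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("root_sum=%d\n", root_sum);
        printf("rank %d: all_sum=%d prefix=%d scattered=%d\n", rank, all_sum, prefix, mine);
        free(contrib); free(counts);
        MPI_Finalize();
        return 0;
    }
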
4

Improving the Performance of Selected MPI Collective Communication Operations on InfiniBand Networks

Viertel, Carsten (23 September 2007)
The performance of collective communication operations is one of the deciding factors in the overall performance of an MPI application. Open MPI's component architecture offers an easy way to implement new algorithms for collective operations, but current implementations use the point-to-point components to access the InfiniBand network. This work therefore attempts to improve the performance of a collective component by accessing the InfiniBand network directly, which avoids overhead and makes it possible to tune the algorithms to this specific network. The first part of this work gives a short overview of the InfiniBand Architecture and Open MPI. The next part analyzes several models for parallel computation. Afterwards, various algorithms for the MPI_Scatter, MPI_Gather, and MPI_Allgather operations are presented. The theoretical performance of the algorithms is analyzed with the LogfP and LogGP models, and selected algorithms are implemented as part of an Open MPI collective component. Finally, the performance of different algorithms and different MPI implementations is compared. The test results show that the performance of the operations could be improved for several message and communicator size ranges.
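
A short, hedged usage sketch of the three operations this thesis targets, at the plain MPI level (the thesis itself implements them inside an Open MPI collective component accessing InfiniBand directly):

    /* MPI_Scatter / MPI_Gather / MPI_Allgather usage sketch
     * (illustrative only, not the thesis's implementation). */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int *root_data = NULL;
        if (rank == 0) {                    /* root prepares one int per rank */
            root_data = malloc(size * sizeof(int));
            for (int i = 0; i < size; i++) root_data[i] = 100 + i;
        }

        int mine = 0;
        MPI_Scatter(root_data, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);
        mine *= 2;                          /* local work on the scattered piece */

        int *all = malloc(size * sizeof(int));
        MPI_Gather(&mine, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* result at root  */
        MPI_Allgather(&mine, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);  /* result everywhere */

        printf("rank %d: first=%d last=%d\n", rank, all[0], all[size - 1]);
        free(all); free(root_data);
        MPI_Finalize();
        return 0;
    }
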
5

A Survey of Barrier Algorithms for Coarse Grained Supercomputers

Hoefler, Torsten; Mehlan, Torsten; Mietke, Frank; Rehm, Wolfgang (28 June 2005)
There are several different algorithms available to synchronize multiple processors. Some of them support only shared-memory architectures or very fine-grained supercomputers. This work gives an overview of all currently known algorithms that are suitable for distributed shared-memory architectures and message-passing-based computer systems (loosely coupled or coarse-grained supercomputers). No single barrier algorithm is best for every machine; several architectural aspects have to be taken into account. The overview of known barrier algorithms given in this work is targeted mainly at implementers of libraries supporting collective communication (such as MPI).
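
One widely known algorithm in the class this survey covers is the dissemination barrier; the following simplified C sketch over MPI point-to-point messages illustrates the general technique and is not code from the survey.

    /* Dissemination barrier over MPI point-to-point (simplified sketch).
     * Round k: send a token 2^k ranks ahead and receive one from 2^k
     * ranks behind; after ceil(log2 P) rounds every process is known
     * to have arrived.                                                */
    #include <mpi.h>
    #include <stdio.h>

    static void dissemination_barrier(MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        char out = 0, in = 0;   /* token payloads; content is irrelevant */
        for (int dist = 1; dist < size; dist <<= 1) {
            int to   = (rank + dist) % size;
            int from = (rank - dist + size) % size;
            MPI_Sendrecv(&out, 1, MPI_CHAR, to, 0,
                         &in,  1, MPI_CHAR, from, 0,
                         comm, MPI_STATUS_IGNORE);
        }
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        dissemination_barrier(MPI_COMM_WORLD);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("rank %d passed the barrier\n", rank);
        MPI_Finalize();
        return 0;
    }
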
6

Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-scale Cluster-Systems with special Attention on InfiniBand Networks

Hoefler, Torsten (28 June 2005)
The MPI_Barrier collective operation, part of the MPI-1.1 standard, is extremely important for all parallel applications using it. The latency of this operation adds to the application run time and cannot be overlapped with computation, so overall MPI performance can be degraded by unsatisfactory barrier latency. The main goals of this work are to lower the barrier latency for InfiniBand networks by analyzing well-known barrier algorithms with regard to their suitability within InfiniBand networks, to enhance the barrier operation by utilizing standard InfiniBand operations as much as possible, and to design a constant-time barrier for InfiniBand with special hardware support. This partition into three main steps is retained throughout the whole thesis. The first part evaluates publicly known models and proposes a new, more accurate model (LoP) for InfiniBand. All barrier algorithms are evaluated within the well-known LogP model and this new model, and two new algorithms that promise better performance have been developed. The hardware section proposes both a constant-time barrier integrated into InfiniBand and an inexpensive separate barrier network. All results have been implemented inside the Open MPI framework, leading to three new Open MPI collective modules. The first implements different barrier algorithms that are dynamically benchmarked and selected during the startup phase to maximize performance. The second offers a special barrier implementation for InfiniBand with RDMA and performs up to 40% better than the best previously published solution. The third offers a constant-time barrier in a separate network, leveraging commodity components, with a latency of only 2.5 microseconds. Each component has its specialty, and all can be used to enhance barrier performance significantly.
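
For orientation, standard LogP reasoning (not a formula quoted from the thesis) estimates a dissemination barrier over P processes as ceil(log2 P) rounds, each dominated by one message latency L plus send and receive overheads o; in LaTeX notation:

    T_{\mathrm{barrier}} \approx \lceil \log_2 P \rceil \, (L + 2o)
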
7

Efficient Broadcast for Multicast-Capable Interconnection Networks

Siebert, Christian (20 November 2006)
The broadcast function MPI_Bcast() from the MPI-1.1 standard is one of the most heavily used collective operations in the message-passing programming paradigm. This diploma thesis makes use of a feature called multicast, which is supported by several network technologies (such as Ethernet or InfiniBand), to create an efficient MPI_Bcast() implementation, especially for large communicators and small messages. A preceding analysis of existing real-world applications leads to an algorithm that not only performs well in synthetic benchmarks but does even better for a wide class of parallel applications. The resulting broadcast has been implemented for the open-source MPI library Open MPI using IP multicast. The results show that the new broadcast is consistently better than existing point-to-point implementations as soon as the number of MPI processes exceeds eight nodes. The performance gain reaches a factor of 4.9 on 342 nodes, because the new algorithm scales almost independently of the number of involved processes.
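
The point-to-point baseline that a multicast broadcast competes with is typically a binomial tree, which needs ceil(log2 P) message rounds; a simplified C sketch of that baseline follows (root fixed at rank 0, illustrative only, not the thesis's implementation).

    /* Binomial-tree broadcast over point-to-point messages: the classic
     * baseline that multicast-based broadcast competes with.           */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    static void tree_bcast(void *buf, int count, MPI_Datatype type, MPI_Comm comm) {
        int rank, size, mask = 1;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        /* Receive phase: a non-root rank receives from the rank that
         * differs in its lowest set bit.                              */
        while (mask < size) {
            if (rank & mask) {
                MPI_Recv(buf, count, type, rank - mask, 0, comm, MPI_STATUS_IGNORE);
                break;
            }
            mask <<= 1;
        }
        /* Send phase: forward to ranks that differ in the remaining
         * lower bits, halving the stride each round.                 */
        for (mask >>= 1; mask > 0; mask >>= 1)
            if (rank + mask < size)
                MPI_Send(buf, count, type, rank + mask, 0, comm);
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        char msg[64] = "";
        if (rank == 0) strcpy(msg, "hello from the root");
        tree_bcast(msg, 64, MPI_CHAR, MPI_COMM_WORLD);
        printf("rank %d got: \"%s\"\n", rank, msg);
        MPI_Finalize();
        return 0;
    }
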
