1. Fast Barrier Synchronization for InfiniBand
Hoefler, Torsten, 04 January 2006
Barrier synchronization is crucial for many parallel systems. This talk introduces different synchronization mechanisms and demonstrates new approaches that leverage special hardware properties of InfiniBand to lower barrier latency.
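The InfiniBand-specific mechanisms are not spelled out in this abstract, but the round structure that such hardware-aware barriers build on is well illustrated by the classic dissemination barrier. Below is a minimal sketch of that algorithm over plain MPI point-to-point calls; the function name and the use of MPI_Sendrecv are illustrative choices, not the talk's implementation.

    /* Classic dissemination barrier: after ceil(log2(size)) rounds every
     * process has transitively heard from every other one. Sketch only,
     * expressed over portable MPI point-to-point calls. */
    #include <mpi.h>

    void dissemination_barrier(MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        char token = 0;
        for (int dist = 1; dist < size; dist <<= 1) {
            int to   = (rank + dist) % size;
            int from = (rank - dist + size) % size;
            /* Each round doubles the set of processes we have
             * (transitively) synchronized with. */
            MPI_Sendrecv(&token, 1, MPI_CHAR, to,   0,
                         &token, 1, MPI_CHAR, from, 0,
                         comm, MPI_STATUS_IGNORE);
        }
    }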

2. Analysis and Optimization of the Packet Scheduler in Open MPI
Lichei, Andre, 13 November 2006
We compare well-known measurement methods for the LogGP parameters and discuss their accuracy and the network contention they cause. Based on this, a new, theoretically exact measurement method that does not saturate the network is derived and explained in detail. The applicability of our method is shown for the low-level communication API of Open MPI across several interconnection networks.
Based on the LogGP model, we developed a low-overhead packet scheduling algorithm. It can handle different types of interconnects with different characteristics and produces schedules that are very close to the optimum for both small and large messages. The efficiency of the algorithm for small messages is shown for an Open MPI implementation. The implementation uses the LogGP benchmark to obtain the LogGP parameters of the available interconnects and can thus adapt to any given system.
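As one illustration of how measured LogGP parameters can drive a scheduling decision: under LogGP, sending s bytes costs roughly L + 2o + (s - 1)G, so for large messages the per-byte Gap G dominates and a message striped across several interconnects should be split in inverse proportion to each one's G. The sketch below is a simplified stand-in for such a scheduler, not the thesis's algorithm; all names are assumptions.

    #include <stddef.h>

    /* Measured LogGP parameters of one interconnect:
     * latency, overhead, gap between messages, Gap per byte. */
    struct loggp { double L, o, g, G; };

    /* Split `total` bytes across n interconnects proportionally to 1/G,
     * so faster links carry correspondingly more of the payload. */
    void stripe_by_gap(const struct loggp *nic, size_t n,
                       size_t total, size_t *share)
    {
        double inv_sum = 0.0;
        for (size_t i = 0; i < n; i++)
            inv_sum += 1.0 / nic[i].G;

        size_t assigned = 0;
        for (size_t i = 0; i < n; i++) {
            share[i] = (size_t)(total * (1.0 / nic[i].G) / inv_sum);
            assigned += share[i];
        }
        share[0] += total - assigned;   /* give rounding remainder to link 0 */
    }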

3. Entwicklung einer optimierten kollektiven Komponente (Development of an Optimized Collective Component)
Mosch, Marek, 24 September 2007
This diploma thesis deals with the development of a collective component for the MPI-2 implementation Open MPI. The component is to provide optimized algorithms for the Myrinet network on the basis of the low-level communication protocol GM.

4. Optimierte Implementierung ausgewählter kollektiver Operationen unter Ausnutzung der Hardwareparallelität des InfiniBand Netzwerkes (Optimized Implementation of Selected Collective Operations Exploiting the Hardware Parallelism of the InfiniBand Network)
Franke, Maik, 24 September 2007
The goal of this thesis is an optimized implementation of the reduction operations defined in the MPI-1 standard, MPI_Reduce(), MPI_Allreduce(), MPI_Scan(), and MPI_Reduce_scatter(), for the InfiniBand network, placing particular emphasis on special InfiniBand operations and on hardware parallelism.
InfiniBand makes it possible to separate communication operations cleanly from computation, which allows the two types of operation to overlap during a reduction. The potential of this method is to be evaluated both theoretically, in a model, and practically, in a prototype implementation within the Open MPI framework; the end result is to be compared with existing implementations (e.g. MVAPICH). / The performance of collective communication operations is one of the deciding factors in the overall performance of an MPI application. Current implementations of MPI use the point-to-point components to access the InfiniBand network; we therefore attempt to improve the performance of a collective component by accessing the InfiniBand network directly, which avoids overhead and makes it possible to tune the algorithms to this specific network. Various algorithms for the MPI_Reduce, MPI_Allreduce, MPI_Scan, and MPI_Reduce_scatter operations are presented. The theoretical performance of the algorithms is analyzed with the LogfP and LogGP models. Selected algorithms are implemented as part of an Open MPI collective component. Finally, the performance of different algorithms and different MPI implementations is compared.
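For orientation, the sketch below shows a generic binomial-tree reduction over MPI point-to-point calls, i.e. the kind of software baseline that a direct InfiniBand implementation with communication/computation overlap is measured against; it is not the thesis's code. A commutative sum on a single double, rooted at rank 0, keeps it short.

    #include <mpi.h>

    double binomial_reduce_sum(double value, MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        for (int dist = 1; dist < size; dist <<= 1) {
            if (rank & dist) {
                /* Hand the partial result to the parent; this rank is done. */
                MPI_Send(&value, 1, MPI_DOUBLE, rank - dist, 0, comm);
                break;
            } else if (rank + dist < size) {
                double partner;
                MPI_Recv(&partner, 1, MPI_DOUBLE, rank + dist, 0, comm,
                         MPI_STATUS_IGNORE);
                value += partner;       /* combine partial results */
            }
        }
        return value;                   /* complete sum is valid on rank 0 */
    }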

5. Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InfiniBand HCA
Mittenzwey, Nico, 30 June 2009
This thesis analysed the QLogic InfiniPath QLE7140 HCA with its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA, which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA outperforms the Mellanox InfiniHost III Lx HCA in terms of latency and bandwidth on our test system in various test scenarios. The benchmarks showed that sending messages with multiple threads in parallel can increase the bandwidth greatly, while bi-directional sends cut the effective bandwidth for one HCA by up to 30%.
Different all-reduce algorithms were evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI in different scenarios.
The thesis also demonstrated that multicast algorithms for InfiniBand can be implemented easily using the RDMA-CM API.
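The RDMA-CM claim refers to librdmacm's multicast support, where a group is joined with a short sequence of calls. The sketch below shows that call order only; queue pair and completion queue setup, error cleanup, and the group address are elided or placeholders.

    /* Minimal RDMA-CM multicast join sketch (link with -lrdmacm).
     * A UD queue pair must be created on `id` before the join;
     * that setup is omitted here for brevity. */
    #include <rdma/rdma_cma.h>
    #include <arpa/inet.h>
    #include <string.h>

    int join_mcast_group(const char *group_ip)
    {
        struct rdma_event_channel *ch = rdma_create_event_channel();
        struct rdma_cm_id *id = NULL;
        struct rdma_cm_event *ev;
        struct sockaddr_in mcast;

        if (!ch || rdma_create_id(ch, &id, NULL, RDMA_PS_UDP))
            return -1;

        memset(&mcast, 0, sizeof mcast);
        mcast.sin_family = AF_INET;
        inet_pton(AF_INET, group_ip, &mcast.sin_addr);

        /* Resolve the group address to a local RDMA device. */
        if (rdma_resolve_addr(id, NULL, (struct sockaddr *)&mcast, 2000) ||
            rdma_get_cm_event(ch, &ev) ||
            ev->event != RDMA_CM_EVENT_ADDR_RESOLVED)
            return -1;
        rdma_ack_cm_event(ev);

        /* Join; sends posted to the group address then fan out in hardware. */
        if (rdma_join_multicast(id, (struct sockaddr *)&mcast, NULL) ||
            rdma_get_cm_event(ch, &ev) ||
            ev->event != RDMA_CM_EVENT_MULTICAST_JOIN)
            return -1;
        rdma_ack_cm_event(ev);
        return 0;
    }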

6. Low Overhead Ethernet Communication for Open MPI on Linux Clusters
Hoefler, Torsten; Reinhardt, Mirko; Mietke, Frank; Mehlan, Torsten; Rehm, Wolfgang, 20 July 2006
This paper describes the basic concepts of our solution for improving the performance of Ethernet communication in a Linux cluster environment by introducing Reliable Low Latency Ethernet Sockets. We show that about 25% of the socket latency can be saved by using our simplified protocol. In particular, we demonstrate that this performance benefit speeds up MPI-level communication. To this end, we developed a new BTL component for Open MPI, an open-source MPI-2 implementation whose Modular Component Architecture offers a nearly ideal environment for implementing our changes. Microbenchmarks of MPI collective and point-to-point operations were performed. We see a performance improvement of 8% to 16% for the LU and SP implementations of the NAS parallel benchmark suite, which spend a significant amount of time in MPI. Practical application tests with Abinit, an electronic structure calculation program, show that the runtime can be nearly halved on a 4-node system. We thus show evidence that our new Ethernet communication protocol can increase the speedup of parallel applications considerably.
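The protocol itself is not detailed in this abstract; the sketch below only shows the Linux mechanism a protocol of this kind can sit on, an AF_PACKET socket exchanging raw Ethernet frames beneath the TCP/IP stack (root or CAP_NET_RAW required). The EtherType value and the function name are placeholders.

    #include <sys/socket.h>
    #include <linux/if_packet.h>
    #include <net/if.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <unistd.h>

    #define MY_ETHERTYPE 0x8822     /* placeholder protocol number */

    int open_raw_eth(const char *ifname)
    {
        /* Raw socket: frames bypass IP/TCP processing entirely. */
        int fd = socket(AF_PACKET, SOCK_RAW, htons(MY_ETHERTYPE));
        if (fd < 0)
            return -1;

        /* Bind to one NIC so the kernel delivers only frames of our
         * EtherType that arrive on this interface. */
        struct sockaddr_ll sll;
        memset(&sll, 0, sizeof sll);
        sll.sll_family   = AF_PACKET;
        sll.sll_protocol = htons(MY_ETHERTYPE);
        sll.sll_ifindex  = if_nametoindex(ifname);
        if (bind(fd, (struct sockaddr *)&sll, sizeof sll) < 0) {
            close(fd);
            return -1;
        }
        return fd;   /* read()/write() now move whole Ethernet frames */
    }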

7. Improving the Performance of Selected MPI Collective Communication Operations on InfiniBand Networks
Viertel, Carsten, 23 September 2007
The performance of collective communication operations is one of the deciding factors in the overall performance of an MPI application. Open MPI's component architecture offers an easy way to implement new algorithms for collective operations, but current implementations use the point-to-point components to access the InfiniBand network. We therefore attempt to improve the performance of a collective component by accessing the InfiniBand network directly, which avoids overhead and makes it possible to tune the algorithms to this specific network.
The first part of this work gives a short overview of the InfiniBand Architecture and Open MPI. The next part analyzes several models for parallel computation. Afterwards, various algorithms for the MPI_Scatter, MPI_Gather, and MPI_Allgather operations are presented. The theoretical performance of the algorithms is analyzed with the LogfP and LogGP models. Selected algorithms are implemented as part of an Open MPI collective component. Finally, the performance of different algorithms and different MPI implementations is compared. The test results show that the performance of the operations could be improved for several message and communicator size ranges.
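As an illustration of the kind of model-driven comparison used here, the sketch below estimates LogGP costs for a linear and a binomial-tree gather. The formulas are textbook first-order approximations under assumed parameter values, not the thesis's exact models.

    #include <stdio.h>

    /* LogGP parameters: latency, overhead, gap, Gap per byte. */
    struct loggp { double L, o, g, G; };

    /* Linear gather: the root receives P-1 messages of s bytes back to
     * back; arrivals are separated by the gap g or the byte cost s*G,
     * whichever is larger. */
    double gather_linear(struct loggp p, int P, double s)
    {
        double per_msg = (p.g > s * p.G) ? p.g : s * p.G;
        return p.L + 2.0 * p.o + (P - 1) * per_msg;
    }

    /* Binomial-tree gather: ceil(log2 P) rounds, the payload doubling
     * each round as subtree results are forwarded upward. */
    double gather_binomial(struct loggp p, int P, double s)
    {
        double t = 0.0, bytes = s;
        for (int reached = 1; reached < P; reached <<= 1) {
            t += p.L + 2.0 * p.o + bytes * p.G;
            bytes *= 2.0;
        }
        return t;
    }

    int main(void)
    {
        struct loggp ib = { 4.0, 1.0, 2.0, 0.001 };  /* placeholder values, us */
        for (int P = 2; P <= 64; P *= 2)
            printf("P=%2d  linear=%8.2f us  binomial=%8.2f us\n",
                   P, gather_linear(ib, P, 1024), gather_binomial(ib, P, 1024));
        return 0;
    }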