Many parallel applications from scientific computing use MPI collective communication operations to collect or distribute data. Since the execution times of these communication operations increase with the number of participating processors, scalability problems might occur. In this article, we show for different MPI implementations how the execution time of collective communication operations can be significantly improved by a restructuring based on orthogonal processor structures with two or more levels. As platform, we consider a dual Xeon cluster, a Beowulf cluster and a Cray T3E with different MPI implementations. We show that the execution time of operations like MPI Bcast or MPI Allgather can be reduced by 40% and 70% on the dual Xeon cluster and the Beowulf cluster. But also on a Cray T3E a significant improvement can be obtained by a careful selection of the processor groups. We demonstrate that the optimized communication operations can be used to reduce the execution time of data parallel implementations of complex application programs without any other change of the computation and communication structure. Furthermore, we investigate how the execution time of orthogonal realization can be modeled using runtime functions. In particular, we consider the modeling of two-phase realizations of communication operations. We present runtime functions for the modeling and verify that these runtime functions can predict the execution time both for communication operations in isolation and in the context of application programs.
Identifer | oai:union.ndltd.org:DRESDEN/oai:qucosa.de:swb:ch1-200701061 |
Date | 28 June 2007 |
Creators | Kühnemann, Matthias, Rauber, Thomas, Rünger, Gudula |
Contributors | TU Chemnitz, Fakultät für Informatik |
Publisher | Universitätsbibliothek Chemnitz |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | doc-type:workingPaper |
Format | application/pdf, text/plain, application/zip |
Relation | dcterms:isPartOf:Chemnitzer Informatik-Berichte ; CSR-03-03 |
Page generated in 0.0025 seconds