Efficient Broadcast for Multicast-Capable Interconnection Networks
Siebert, Christian, 30 September 2006
The broadcast function MPI_Bcast() from the
MPI-1.1 standard is one of the most heavily
used collective operations in the message-passing
programming paradigm.
This diploma thesis makes use of a feature called
"multicast", which is supported by several
network technologies (such as Ethernet or
InfiniBand), to create an efficient MPI_Bcast()
implementation, especially for large communicators
and small messages.
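For reference, a minimal MPI_Bcast() usage sketch in C for the case targeted
here, a small message broadcast across a communicator (the payload contents
and sizes are illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* A small payload, typical of the message sizes targeted here. */
        int payload[4] = {0, 0, 0, 0};
        if (rank == 0)
            payload[0] = 42;   /* the root fills the buffer before broadcasting */

        /* Every process in the communicator ends up with the root's buffer. */
        MPI_Bcast(payload, 4, MPI_INT, 0, MPI_COMM_WORLD);

        printf("rank %d received %d\n", rank, payload[0]);
        MPI_Finalize();
        return 0;
    }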
A preceding analysis of existing real-world
applications leads to an algorithm that not
only performs well in synthetic benchmarks
but performs even better for a wide class of
parallel applications. The resulting
broadcast has been implemented for the
open-source MPI library "Open MPI" using
IP multicast.
The results show that the new broadcast
consistently outperforms existing point-to-point
implementations as soon as the number of MPI
processes exceeds eight nodes. The performance
gain reaches a factor of 4.9 on 342 nodes,
because the new algorithm scales almost
independently of the number of participating
processes. / The broadcast function MPI_Bcast() from the
MPI-1.1 standard is one of the most heavily used
collective communication operations of the
message-passing programming paradigm.
This diploma thesis uses the multicast capability
provided by several network technologies (such as
Ethernet or InfiniBand) to create an efficient
MPI_Bcast() implementation, especially for large
communicators and small message sizes.
A preceding analysis of existing parallel
applications showed that the new algorithm not
only performs well in synthetic benchmarks but
can unfold its potential even better in real
applications. The resulting broadcast was
developed for the open-source MPI library
"Open MPI" and is based on IP multicast.
The results show that the new broadcast
consistently outperforms any point-to-point
implementation as soon as the number of MPI
processes exceeds eight nodes. The speedup
reaches a factor of 4.9 on 342 nodes, since the
new algorithm scales practically independently
of the number of nodes.
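Since the implementation is based on IP multicast, a minimal sketch of how a
process can join an IPv4 multicast group with the standard POSIX sockets API
(group address and port are illustrative; the actual Open MPI component is
more involved, and error handling is omitted):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        /* UDP socket bound to the port on which the group traffic arrives. */
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(12345);                          /* illustrative port */
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));

        /* Join the multicast group so datagrams sent to it are delivered here. */
        struct ip_mreq mreq;
        mreq.imr_multiaddr.s_addr = inet_addr("239.255.0.1");  /* illustrative group */
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
        setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

        char buf[256];
        ssize_t n = recv(fd, buf, sizeof(buf), 0);             /* one multicast datagram */
        if (n > 0)
            printf("received %zd bytes\n", n);
        close(fd);
        return 0;
    }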
ESPGOAL: A Dependency Driven Communication Framework
Schneider, Timo; Eckelmann, Sven, 01 June 2011
Optimized implementations of blocking and nonblocking collective operations are most important for scalable high-performance applications. Offloading such collective operations into the communication layer can improve performance and the asynchronous progression of the operations. However, such offloading schemes must remain flexible in order to support user-defined (sparse neighbor) collective communications. In this work, we describe an operating-system-kernel-based architecture for implementing an interpreter for the flexible Group Operation Assembly Language (GOAL) framework to offload collective communications. We describe an optimized scheme to store the schedules that define the collective operations and show an extension to profile the performance of the kernel layer. Our microbenchmarks demonstrate the effectiveness of the approach, and we show performance improvements over traditional progression in user space. We also discuss complications with the design and with offloading strategies in general. (A toy dependency-driven schedule sketch follows the chapter outline below.)

1 Introduction
1.1 Related Work
2 The GOAL API
2.1 API Conventions
2.2 Basic GOAL Functionality
2.2.1 Initialization
2.2.2 Graph Creation
2.2.3 Adding Operations
2.2.4 Adding Dependencies
2.2.5 Scratchpad Buffer
2.2.6 Schedule Compilation
2.2.7 Schedule Execution
2.3 GOAL-Extensions
3 ESP Transport Layer
3.1 Receive Handling
3.2 Transfer Management
3.2.1 Known Problems
4 The Architecture of ESPGOAL
4.1 Control Flow
4.1.1 Loading the Kernel Module
4.1.2 Adding a Communicator
4.1.3 Starting a Schedule
4.1.4 Schedule Progression
4.1.5 Progression by ESP
4.1.6 Unloading the Kernel Module
4.2 Data Structures
4.2.1 Starting a Schedule
4.2.2 Transfer Management
4.2.3 Stack Overflow Avoidance
4.3 Interpreting a GOAL Schedule
5 Implementing Collectives in GOAL
5.1 Recursive Doubling
5.2 Bruck's Algorithm
5.3 Binomial Trees
5.4 MPI_Barrier
5.5 MPI_Gather
6 Benchmarks
6.1 Testbed
6.2 Interrupt coalescing parameters
6.3 Benchmarking Point to Point Latency
6.4 Benchmarking Local Operations
6.5 Benchmarking Collective Communication Latency
6.6 Benchmarking Collective Communication Host Overhead
6.7 Comparing Different Ways to use Ethernet NICs
7 Conclusions and Future Work
8 Acknowledgments
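Purely as an illustration of the dependency-driven idea behind such schedules
(the names and the trivial "execute" step below are hypothetical, not the
actual GOAL API), a self-contained toy in C that runs each operation once all
operations it depends on have completed:

    #include <stdio.h>

    enum { MAX_DEPS = 2 };

    /* One schedule entry: what to do and which earlier entries it waits for. */
    struct op {
        const char *label;      /* stand-in for a send/recv/local operation */
        int deps[MAX_DEPS];     /* indices of operations this one depends on */
        int ndeps;
        int done;
    };

    int main(void) {
        /* A non-root rank of a binomial-tree broadcast as a tiny schedule:
         * both sends depend on the receive from the parent having finished. */
        struct op sched[] = {
            { "recv chunk from rank 0", { 0 }, 0, 0 },  /* op 0: no dependencies */
            { "send chunk to rank 3",   { 0 }, 1, 0 },  /* op 1: waits for op 0  */
            { "send chunk to rank 5",   { 0 }, 1, 0 },  /* op 2: waits for op 0  */
        };
        int n = sizeof(sched) / sizeof(sched[0]);

        int remaining = n;
        while (remaining > 0) {
            for (int i = 0; i < n; i++) {
                if (sched[i].done)
                    continue;
                int ready = 1;
                for (int d = 0; d < sched[i].ndeps; d++)
                    if (!sched[sched[i].deps[d]].done)
                        ready = 0;
                if (ready) {                /* all dependencies satisfied */
                    printf("executing: %s\n", sched[i].label);
                    sched[i].done = 1;
                    remaining--;
                }
            }
        }
        return 0;
    }

In the same spirit, the collectives of Chapter 5 become dependency graphs: a
binomial-tree broadcast at a non-root rank, for example, is one receive from
the parent followed by sends to the children, each send depending only on
that receive.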