Return to search

Concepts and Prototype for a Collective Offload Unit

Optimized implementations of blocking and nonblocking collective operations are most important for scalable high-performance applications. Offloading such collective operations into the communication layer can improve performance and asynchronous progression of the operations. However, it is most important that such offloading schemes remain flexible in order to support user-defined (sparse neighbor) collective communications. In this work we propose a design for a collective offload unit.
Our hardware design is able to execute dependency graph based representations of collective functions. To cope with the scarcity of memory resources we designed a new point to point messaging protocol which does not need to store information about unexpected messages. The offload unit proposed in this thesis could be integrated into high performance networks such as EXTOLL. Our design achieves a clock frequency of 212 MHz on a Xilinx Virtex6 FPGA, while using less than 10% of the available logic slices and less than 30% of the available memory blocks. Due to the specialization of our design we can accelerate important tasks of the message passing framework, such as message matching by a factor of two, compared to a software implementation running on a CPU with a ten times higher clock speed.

Identiferoai:union.ndltd.org:DRESDEN/oai:qucosa.de:bsz:ch1-qucosa-89006
Date18 June 2012
CreatorsSchneider, Timo, Eckelmann, Sven
ContributorsTU Chemnitz, Fakultät für Informatik, Prof. Dr Wolfgang Rehm, Prof. Dr. Torsten Hoefler, Dipl. Inf. Jochen Strunk, Prof. Dr Wolfgang Rehm
PublisherUniversitätsbibliothek Chemnitz
Source SetsHochschulschriftenserver (HSSS) der SLUB Dresden
LanguageEnglish
Detected LanguageEnglish
Typedoc-type:masterThesis
Formatapplication/pdf, text/plain, application/zip

Page generated in 0.0021 seconds