• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • 1
  • 1
  • Tagged with
  • 4
  • 4
  • 4
  • 4
  • 4
  • 4
  • 4
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Optimierte Implementierung ausgewählter kollektiver Operationen unter Ausnutzung der Hardwareparallelität des InfiniBand Netzwerkes

Franke, Maik 24 September 2007 (has links) (PDF)
Ziel der Arbet ist eine optimierte Implementierung der im MPI-1 Standard definierten Reduktionsoperationen MPI_Reduce(), MPI_Allreduce(), MPI_Scan(), MPI_Reduce_scatter() für das InfiniBand Netzwerk. Hierbei soll besonderer Wert auf spezielle InfiniBand Operationen und die Hardwareparallelität gelegt werden. InfiniBand ermöglicht es Kommunikationsoperationen klar von Berechnungen zu trennen, was eine Überlappung beider Operationstypen in der Reduktion ermöglicht. Das Potential dieser Methode soll modelltheoretisch als auch praktisch in einer prototypischen Implementierung im Rahmen des Open MPI Frameworks erfolgen. Das Endresultat soll mit vorhandenen Implementierungen (z.B. MVAPICH) verglichen werden. / The performance of collective communication operations is one of the deciding factors in the overall performance of a MPI application. Current implementations of MPI use the point-to-point components to access the InfiniBand network. Therefore it is tried to improve the performance of a collective component by accessing the InfiniBand network directly. This should avoid overhead and make it possible to tune the algorithms to this specific network. Various algorithms for the MPI_Reduce, MPI_Allreduce, MPI_Scan and MPI_Reduce_scatter operations are presented. The theoretical performance of the algorithms is analyzed with the LogfP and LogGP models. Selected algorithms are implemented as part of an Open MPI collective component. Finally the performance of different algorithms and different MPI implementations is compared.
2

Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InifiniBand HCA

Mittenzwey, Nico 30 June 2009 (has links) (PDF)
This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA can outperform the Mellanox InfiniHost III Lx HCA in latency and bandwidth terms on our test system in various test scenarios. The benchmarks showed, that sending messages with multiple threads in parallel can increase the bandwidth greatly while bi-directional sends cut the effective bandwidth for one HCA by up to 30%. Different all-reduce algorithms where evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI for different scenarios. The thesis also demonstrated, that one can implement multicast algorithms for InfiniBand easily by using the RDMA-CM API.
3

Optimierte Implementierung ausgewählter kollektiver Operationen unter Ausnutzung der Hardwareparallelität des InfiniBand Netzwerkes

Franke, Maik 30 April 2007 (has links)
Ziel der Arbet ist eine optimierte Implementierung der im MPI-1 Standard definierten Reduktionsoperationen MPI_Reduce(), MPI_Allreduce(), MPI_Scan(), MPI_Reduce_scatter() für das InfiniBand Netzwerk. Hierbei soll besonderer Wert auf spezielle InfiniBand Operationen und die Hardwareparallelität gelegt werden. InfiniBand ermöglicht es Kommunikationsoperationen klar von Berechnungen zu trennen, was eine Überlappung beider Operationstypen in der Reduktion ermöglicht. Das Potential dieser Methode soll modelltheoretisch als auch praktisch in einer prototypischen Implementierung im Rahmen des Open MPI Frameworks erfolgen. Das Endresultat soll mit vorhandenen Implementierungen (z.B. MVAPICH) verglichen werden. / The performance of collective communication operations is one of the deciding factors in the overall performance of a MPI application. Current implementations of MPI use the point-to-point components to access the InfiniBand network. Therefore it is tried to improve the performance of a collective component by accessing the InfiniBand network directly. This should avoid overhead and make it possible to tune the algorithms to this specific network. Various algorithms for the MPI_Reduce, MPI_Allreduce, MPI_Scan and MPI_Reduce_scatter operations are presented. The theoretical performance of the algorithms is analyzed with the LogfP and LogGP models. Selected algorithms are implemented as part of an Open MPI collective component. Finally the performance of different algorithms and different MPI implementations is compared.
4

Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InifiniBand HCA

Mittenzwey, Nico 31 March 2009 (has links)
This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA can outperform the Mellanox InfiniHost III Lx HCA in latency and bandwidth terms on our test system in various test scenarios. The benchmarks showed, that sending messages with multiple threads in parallel can increase the bandwidth greatly while bi-directional sends cut the effective bandwidth for one HCA by up to 30%. Different all-reduce algorithms where evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI for different scenarios. The thesis also demonstrated, that one can implement multicast algorithms for InfiniBand easily by using the RDMA-CM API.

Page generated in 0.032 seconds