Global ETD Search

1	Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InifiniBand HCA Mittenzwey, Nico 30 June 2009 (has links) (PDF) This thesis analysed the QLogic InﬁniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InﬁniHost III Lx HCA which uses an oﬄoad architecture. As expected, the QLogic InﬁniPath QLE7140 HCA can outperform the Mellanox InﬁniHost III Lx HCA in latency and bandwidth terms on our test system in various test scenarios. The benchmarks showed, that sending messages with multiple threads in parallel can increase the bandwidth greatly while bi-directional sends cut the eﬀective bandwidth for one HCA by up to 30%. Diﬀerent all-reduce algorithms where evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI for diﬀerent scenarios. The thesis also demonstrated, that one can implement multicast algorithms for InﬁniBand easily by using the RDMA-CM API. InfiniBand MPI_Allreduce Netzwerk OFED Open MPI PSM RDMA-CM ddc:004 Hochleistungsrechnen Parallelrechner
2	Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InifiniBand HCA Mittenzwey, Nico 31 March 2009 (has links) This thesis analysed the QLogic InﬁniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InﬁniHost III Lx HCA which uses an oﬄoad architecture. As expected, the QLogic InﬁniPath QLE7140 HCA can outperform the Mellanox InﬁniHost III Lx HCA in latency and bandwidth terms on our test system in various test scenarios. The benchmarks showed, that sending messages with multiple threads in parallel can increase the bandwidth greatly while bi-directional sends cut the eﬀective bandwidth for one HCA by up to 30%. Diﬀerent all-reduce algorithms where evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI for diﬀerent scenarios. The thesis also demonstrated, that one can implement multicast algorithms for InﬁniBand easily by using the RDMA-CM API. info:eu-repo/classification/ddc/004 ddc:004 Hochleistungsrechnen Parallelrechner InfiniBand MPI_Allreduce Netzwerk OFED Open MPI PSM RDMA-CM

Search results

Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InifiniBand HCA

Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InifiniBand HCA