Global ETD Search

1	Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InifiniBand HCA Mittenzwey, Nico 30 June 2009 This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA, which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA can outperform the Mellanox InfiniHost III Lx HCA in latency and bandwidth on our test system in various test scenarios. The benchmarks showed that sending messages with multiple threads in parallel can increase the bandwidth greatly, while bi-directional sends cut the effective bandwidth for one HCA by up to 30%. Different all-reduce algorithms were evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI in different scenarios. The thesis also demonstrated that multicast algorithms for InfiniBand can be implemented easily using the RDMA-CM API. InfiniBand; MPI_Allreduce; Network; OFED; Open MPI; PSM; RDMA-CM; ddc:004; High-performance computing; Parallel computer
2	Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InifiniBand HCA Mittenzwey, Nico 31 March 2009 This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA, which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA can outperform the Mellanox InfiniHost III Lx HCA in latency and bandwidth on our test system in various test scenarios. The benchmarks showed that sending messages with multiple threads in parallel can increase the bandwidth greatly, while bi-directional sends cut the effective bandwidth for one HCA by up to 30%. Different all-reduce algorithms were evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI in different scenarios. The thesis also demonstrated that multicast algorithms for InfiniBand can be implemented easily using the RDMA-CM API. info:eu-repo/classification/ddc/004; ddc:004; High-performance computing; Parallel computer; InfiniBand; MPI_Allreduce; Network; OFED; Open MPI; PSM; RDMA-CM
3	Monitoring of large-scale Cluster Computers Worm, Stefan 13 April 2007 Constant monitoring of a computer is essential for staying up to date about its state. This may seem trivial if one is sitting right in front of it, but monitoring a computer from a distance is no longer as simple. It becomes even more difficult if a large number of computers need to be monitored. Because monitoring always causes some load on the network and on the monitored computer itself, it is important to keep these influences as low as possible. Especially for a high-performance cluster built from many computers, the monitoring approach must work as efficiently as possible and must not interfere with the actual operation of the supercomputer. Thus, the main goals of this work were analyses to ensure the scalability of the monitoring solution for a large computer cluster and a demonstration of its functionality in practice. To achieve this, monitoring activities were first classified in terms of the overall operation of a large computer system. Thereafter, methods and solutions were presented that are suitable, in a general scenario, for carrying out the monitoring process as efficiently and scalably as possible. In the course of this work, conclusions were drawn from the operation of an existing cluster for the operation of a new, more powerful system, in order to ensure that it functions as well as possible. On that basis, an application was selected from an existing pool of solutions as the one most suitable for monitoring the new cluster. The selection took into account the special situation of the system, such as the use of InfiniBand as the network interconnect. In addition, software was developed that can read and process the various status information of the InfiniBand ports, independently of the hardware vendor. 
This functionality, which so far had not been available in free monitoring applications, was implemented as an example for the chosen monitoring software. Finally, the influence of monitoring activities on the actual tasks of the cluster was of interest. To examine the influence on the CPU and the network, the self-developed plugin and a selection of typical monitoring values were used as examples. It was shown that no impact on the production application is to be expected for typical monitoring intervals, and that a minor influence could be detected only for atypically short intervals. ABINIT; CHiC; Chemnitz High-Performance Linux Cluster; Cluster Computer; Computer Cluster; InfiniBand; OFED; Plugin; error counters; local monitoring; netgauge; network performance; performance counters; port counters; remote monitoring; scalability; ddc:004; Chemnitz; Cluster; Cluster <data analysis>; Computer; Performance evaluation; Performance measurement; Management; Management information system; Monitoring; Monitoring <computer science>; Nagios; Network; Plug-in; Computer network; Scalable multiprocessor system; Scalability
4	Monitoring of large-scale Cluster Computers Worm, Stefan 12 February 2007 Constant monitoring of a computer is essential for staying up to date about its state. This may seem trivial if one is sitting right in front of it, but monitoring a computer from a distance is no longer as simple. It becomes even more difficult if a large number of computers need to be monitored. Because monitoring always causes some load on the network and on the monitored computer itself, it is important to keep these influences as low as possible. Especially for a high-performance cluster built from many computers, the monitoring approach must work as efficiently as possible and must not interfere with the actual operation of the supercomputer. Thus, the main goals of this work were analyses to ensure the scalability of the monitoring solution for a large computer cluster and a demonstration of its functionality in practice. To achieve this, monitoring activities were first classified in terms of the overall operation of a large computer system. Thereafter, methods and solutions were presented that are suitable, in a general scenario, for carrying out the monitoring process as efficiently and scalably as possible. In the course of this work, conclusions were drawn from the operation of an existing cluster for the operation of a new, more powerful system, in order to ensure that it functions as well as possible. On that basis, an application was selected from an existing pool of solutions as the one most suitable for monitoring the new cluster. The selection took into account the special situation of the system, such as the use of InfiniBand as the network interconnect. In addition, software was developed that can read and process the various status information of the InfiniBand ports, independently of the hardware vendor. 
This functionality, which so far had not been available in free monitoring applications, was implemented as an example for the chosen monitoring software. Finally, the influence of monitoring activities on the actual tasks of the cluster was of interest. To examine the influence on the CPU and the network, the self-developed plugin and a selection of typical monitoring values were used as examples. It was shown that no impact on the production application is to be expected for typical monitoring intervals, and that a minor influence could be detected only for atypically short intervals. info:eu-repo/classification/ddc/004; ddc:004; Chemnitz; Cluster; Cluster <data analysis>; Computer; Performance evaluation; Performance measurement; Management; Management information system; Monitoring; Monitoring <computer science>; Nagios; Network; Plug-in; Computer network; Scalable multiprocessor system; Scalability; ABINIT; CHiC; Chemnitz High-Performance Linux Cluster; Cluster Computer; Computer Cluster; InfiniBand; OFED; Plugin; error counters; local monitoring; netgauge; network performance; performance counters; port counters; remote monitoring; scalability
