Global ETD Search

1	Linear Programming Based Resource Management for Heterogeneous Computing Systems Al-Azzoni, Issam 05 1900
An emerging trend in computing is to use distributed heterogeneous computing (HC) systems to execute a set of tasks. Cluster computer systems, grids, and Desktop Grids are three popular kinds of HC systems. An important component of an HC system is its resource management system (RMS), whose main responsibility is to assign resources to tasks so as to satisfy given performance requirements. For cluster computer systems, we propose a new mapping heuristic that requires less state information than current heuristics. For Desktop Grids, we propose a new scheduling policy that exploits knowledge of the effective computing power delivered by the machines and the distribution of their fault times in order to improve performance. Finally, for grids, we propose a new decentralized load balancing policy that dramatically reduces the communication overhead incurred by state-information updates. The proposed resource management policies use the solution of a linear programming (LP) problem that maximizes the system capacity. Our simulation experiments show that these policies perform very competitively, especially in highly heterogeneous systems. / Thesis / Doctor of Philosophy (PhD)
Keywords: distributed heterogeneous computing; resource management system; linear programming problem; Desktop Grids; cluster computer system; performance
2	Improving The Communication Performance Of I/O Intensive And Communication Intensive Application In Cluster Computer Systems Kumar, V Santhosh 10 1900
Cluster computer systems assembled from commodity off-the-shelf components have emerged as a viable and cost-effective alternative to high-end custom parallel computer systems. In this thesis, we investigate how scalable performance can be achieved for database systems on clusters. In this context, we specifically consider database query processing, evaluate its bottlenecks, and propose optimization techniques for obtaining scalable application performance. First, we systematically demonstrate that in a large cluster with high disk bandwidth, the processing capability and the I/O bus bandwidth are the two major performance bottlenecks in database systems. To identify and assess these bottlenecks, we developed a Petri net model of parallel query execution on a cluster. We then address the two bottlenecks by offloading certain application-related tasks to the processor in the network interface card. Offloading application tasks to the network interface card's processor shifts the bottleneck from the host processor to the I/O bus. In addition, we propose a hardware scheme, the network-attached disk, and a software scheme to achieve balanced utilization of resources such as the host processor, the I/O bus, and the network interface card's processor. The proposed schemes achieve a speedup of up to 1.47 over the base scheme and ensure scalable performance up to 64 processors. Encouraged by the benefits of offloading application tasks to network processors, we explore performing Bloom filter operations in network processors. Combining offloaded Bloom filter operations with the proposed hardware schemes reduces execution time by up to 50%.
The latter part of the thesis presents preliminary experiments with the Community Atmospheric Model (CAM), a large-scale parallel application used for global weather and climate prediction. CAM is a communication-intensive application that involves collective communication of large messages. In our limited experiments, we chose CAM to study the effect of compression and offloading techniques (as formulated for the database case) on the performance of communication-intensive applications. Due to time constraints, we considered only compression as a means of improving application performance; offloading could be taken up as a full-fledged research problem for further investigation. We found that compressing messages reduces message latencies and hence improves the execution time and scalability of the application. Without compression, the measured speedup on a 64-processor cluster was only 15.6. Lossless compression preserves the accuracy and correctness of the program, but it does not achieve a high compression ratio. We therefore propose a lossy compression technique that achieves higher compression while retaining the accuracy and numerical stability of the application and delivering scalable performance. This yields a speedup of 31.7 on 64 processors, compared to 15.6 without message compression. We establish that, under lossy compression, CAM's accuracy remains within the prescribed limit of variation and its numerical stability is retained.
Keywords: Computer Communication; Input/output Communication; Database Query Processing; Community Atmospheric Model (CAM); Network Processors; Bloom Filter Processing; Cluster Computer Systems; Offloading Application; Computer Science
3	Monitoring of large-scale Cluster Computers Worm, Stefan 13 April 2007
Constant monitoring is essential for staying up to date on a computer's state. This may seem trivial when sitting right in front of the machine, but monitoring a computer from a distance is no longer as simple, and it becomes even more difficult when a large number of computers must be monitored. Because monitoring always places some load on the network and on the monitored computer itself, these effects must be kept as low as possible. Especially for a high-performance cluster built from many computers, the monitoring approach must work as efficiently as possible and must not interfere with the actual operation of the supercomputer. The main goals of this work were therefore to analyse the scalability of the monitoring solution for a large computer cluster and to demonstrate its functionality in practice. To this end, monitoring activities were first classified in terms of the overall operation of a large computer system. Thereafter, methods and solutions were presented that are suitable, in a general scenario, for carrying out monitoring as efficiently and scalably as possible. In the course of this work, lessons from operating an existing cluster were applied to the operation of a new, more powerful system in order to ensure its functionality as well as possible. On this basis, the monitoring application best suited to the new cluster was selected from a pool of existing solutions. The selection took into account the particular situation of the system, such as the use of InfiniBand as the network interconnect. In addition, software was developed that reads and processes the various status information of the InfiniBand ports, independently of the hardware vendor.
This functionality, which had not previously been available in free monitoring applications, was implemented as an example for the chosen monitoring software. Finally, the influence of monitoring activities on the cluster's actual workload was investigated, using the self-developed plugin and a selection of typical monitoring values to measure the effect on the CPU and the network. It was shown that no impact on production applications is to be expected at typical monitoring intervals; only atypically short intervals produced a minor measurable influence.
Keywords: ABINIT; CHiC; Chemnitz High-Performance Linux Cluster; Cluster Computer; Computer Cluster; InfiniBand; OFED; Plugin; error counters; local monitoring; netgauge; network performance; performance counters; port counters; remote monitoring; scalability; ddc:004; Chemnitz; Cluster; Cluster <data analysis>; Computer; Performance evaluation; Performance measurement; Management; Management information system; Monitoring; Monitoring <computer science>; Nagios; Network; Plug-in; Computer network; Scalable multiprocessor system; Scalability
4	Monitoring of large-scale Cluster Computers Worm, Stefan 12 February 2007
Same thesis and abstract as entry 3 (separate repository record).
Keywords: info:eu-repo/classification/ddc/004; ddc:004; Chemnitz; Cluster; Cluster <data analysis>; Computer; Performance evaluation; Performance measurement; Management; Management information system; Monitoring; Monitoring <computer science>; Nagios; Network; Plug-in; Computer network; Scalable multiprocessor system; Scalability; ABINIT; CHiC; Chemnitz High-Performance Linux Cluster; Cluster Computer; Computer Cluster; InfiniBand; OFED; Plugin; error counters; local monitoring; netgauge; network performance; performance counters; port counters; remote monitoring; scalability
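Entry 1 states that the proposed policies use the solution of an LP that maximizes system capacity, but the record does not give the program itself. For orientation, one common way to write such a capacity-maximizing LP (the "allocation LP" used in the queueing literature) is shown below; the notation is an assumption for illustration and is not quoted from the thesis. There are N task classes arriving at rates α_i, M machines, μ_ij is the rate at which machine j executes class-i tasks, and δ_ij is the fraction of machine j's time allotted to class i:

```latex
\begin{aligned}
\max_{\lambda,\;\delta}\quad & \lambda\\
\text{subject to}\quad & \sum_{j=1}^{M} \delta_{ij}\,\mu_{ij} \ge \lambda\,\alpha_i, && i = 1,\dots,N,\\
& \sum_{i=1}^{N} \delta_{ij} \le 1, && j = 1,\dots,M,\\
& \delta_{ij} \ge 0, && \text{for all } i, j.
\end{aligned}
```

Under this formulation, an optimal value λ* > 1 means every task class can be given more service capacity than its arrival rate, i.e. the offered load can be sustained with spare capacity, and the optimal δ*_ij could be used to decide which machines each task class is preferentially mapped to.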
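Entry 2 mentions offloading Bloom filter operations to network processors. A Bloom filter is a bit array, set at k hashed positions per inserted key, that answers either "definitely absent" or "possibly present"; in parallel query processing it is commonly used to discard tuples that cannot find a join partner before they are shipped across the interconnect. The Python sketch below illustrates only that filtering step; the `BloomFilter` class, the `bloom_semijoin` helper, the SHA-256 double-hashing scheme, and the default sizes are hypothetical choices, not the thesis's implementation (which targeted network-interface processors).

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hashed bit positions per key."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions from two 64-bit values taken from one
        # SHA-256 digest (double hashing: h1 + i * h2), a common way to
        # simulate k independent hash functions.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def bloom_semijoin(build_keys, probe_rows, key_of, num_bits=1024, num_hashes=3):
    """Drop probe rows whose join key cannot match any build-side key.

    Surviving rows may include false positives, so an exact join must follow.
    """
    bf = BloomFilter(num_bits, num_hashes)
    for k in build_keys:
        bf.add(k)
    return [row for row in probe_rows if bf.might_contain(key_of(row))]
```

Rows that pass the filter still need an exact join, since false positives are possible; the false-positive rate falls as `num_bits` grows relative to the number of inserted keys.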
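Entry 3 describes software that reads InfiniBand port status information independently of the hardware vendor and was implemented for the chosen monitoring software (Nagios appears among the record's keywords). A minimal Python sketch of that idea follows, assuming the per-port counter files the Linux InfiniBand stack exposes under /sys/class/infiniband/<device>/ports/<port>/counters/ and the standard Nagios plugin exit codes. The function names, the selection of counters, and the warning/critical thresholds are hypothetical and are not taken from the thesis's plugin.

```python
import os
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# A selection of error counters found under <root>/<device>/ports/<port>/counters/
# on Linux; which counters to check is an assumption of this sketch.
ERROR_COUNTERS = ("symbol_error", "port_rcv_errors", "link_downed",
                  "link_error_recovery", "port_xmit_discards")


def read_port_counters(root, device, port, names=ERROR_COUNTERS):
    """Return {counter_name: int} for the counters readable on this port."""
    counter_dir = os.path.join(root, device, "ports", str(port), "counters")
    values = {}
    for name in names:
        try:
            with open(os.path.join(counter_dir, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # counter missing or unreadable on this adapter/driver
    return values


def check_port(values, warn=1, crit=100):
    """Map the largest error counter to a Nagios status code and output line."""
    if not values:
        return UNKNOWN, "IB UNKNOWN - no counters readable"
    worst = max(values.values())
    # Performance data: label=value with the "c" (counter) unit of measure.
    perfdata = " ".join(f"{k}={v}c" for k, v in sorted(values.items()))
    if worst >= crit:
        return CRITICAL, f"IB CRITICAL - max error counter {worst} | {perfdata}"
    if worst >= warn:
        return WARNING, f"IB WARNING - max error counter {worst} | {perfdata}"
    return OK, f"IB OK - no errors | {perfdata}"


def main(argv, root="/sys/class/infiniband"):
    device, port = argv[1], argv[2]
    status, message = check_port(read_port_counters(root, device, port))
    print(message)
    return status


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Invoked as, for example, `python check_ib_port.py mlx4_0 1` (device name hypothetical), it prints one status line with performance data and exits with the matching code. Because these counters accumulate until they are reset, a production check would more likely compare successive readings than absolute values.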
