Global ETD Search

11	Monitoring of large-scale Cluster Computers. Worm, Stefan, 13 April 2007. Continuous monitoring of a computer is essential for staying up to date with its state. This may seem trivial when sitting right in front of it, but monitoring a computer remotely is considerably harder. It gets even more difficult when a large number of computers must be monitored. Because monitoring always places some load on the network and on the monitored computer itself, this overhead must be kept as low as possible. This is especially true for a high-performance cluster built from many computers, where the monitoring approach must work as efficiently as possible and must not interfere with the actual work of the supercomputer. The main goals of this work were therefore to analyse and ensure the scalability of a monitoring solution for a large computer cluster and to demonstrate its functionality in practice. To this end, monitoring activities were first classified within the overall operation of a large computer system. Next, methods and solutions were presented that are suitable, in a general scenario, for carrying out monitoring as efficiently and scalably as possible. Lessons learned from operating an existing cluster were then applied to the operation of a new, more powerful system to ensure that it functions as well as possible. Building on this, existing monitoring applications were evaluated to select the one best suited to monitoring the new cluster. The selection took the particular characteristics of the system into account, such as its use of InfiniBand as the network interconnect. In addition, software was developed that reads and processes the various status information of the InfiniBand ports independently of the hardware vendor.
This functionality, previously unavailable in free monitoring applications, was implemented as an example for the chosen monitoring software. Finally, the work examined how monitoring activities affect the actual workload of the cluster. To measure the impact on CPU and network, the self-developed plugin and a selection of typical monitoring values were used as examples. The results showed that no impact on production applications is to be expected at typical monitoring intervals; only at atypically short intervals was a minor influence observed. / Continuous monitoring of a computer is one of the essential tasks for always staying informed about the machine's current state. This is trivial when sitting directly in front of it, but it is no longer so simple when a computer has to be observed remotely. It becomes more difficult when a large number of machines must be monitored. Since monitoring always causes some network load and load on the monitored machine itself, it is important to keep these effects as small as possible. Especially when many computers have been combined into a powerful cluster, the monitoring solution must work as efficiently as possible and must not disturb the actual work of the supercomputer. The main goals of this thesis are therefore analyses to ensure the scalability of the monitoring solution for a large computer cluster and a practical demonstration of its functionality. To this end, monitoring was first placed in the context of the overall operation of a large computer system. Methods and solutions were then presented that are suitable, in a general scenario, for carrying out the monitoring process as a whole as efficiently and scalably as possible.
The thesis then discussed which lessons from operating an existing cluster can be applied to operating a new, more powerful system in order to ensure its proper functioning as far as possible. Building on this, an application particularly well suited to monitoring the new cluster was selected from a set of existing solutions. The selection took the specific situation into account, for example the use of InfiniBand as the interconnect. In the course of this, additional software was developed that can read and process a wide range of status information from the InfiniBand ports, independently of the hardware vendor. This functionality, not previously available among free monitoring applications, was implemented as an example for the chosen monitoring software. Finally, the influence of the monitoring activities on the cluster's actual applications was of interest. For this purpose, the self-developed plugin and a selection of typical monitoring values were used as examples to investigate the influence on the CPU and the network. It was shown that no restrictions on the actual application are to be expected at typical monitoring intervals, and that a minor influence could be detected only at atypically short intervals. Keywords: ABINIT; CHiC; Chemnitz High-Performance Linux Cluster; Cluster Computer; Computer Cluster; InfiniBand; OFED; Plugin; error counters; local monitoring; netgauge; network performance; performance counters; port counters; remote monitoring; scalability; ddc:004; Chemnitz; Cluster; Cluster <data analysis>; Computer; Performance evaluation; Performance measurement; Management; Management information system; Monitoring; Monitoring <computer science>; Nagios; Network; Plug-in; Computer network; Scalable multiprocessor system; Scalability
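The abstract describes a plugin that reads InfiniBand port status information independently of the hardware vendor for the chosen monitoring software (Nagios, judging by the keywords). The thesis's plugin is not reproduced here; the following is only a minimal Python sketch under stated assumptions. It assumes a Linux host whose InfiniBand stack exposes per-port counters as one file per counter, typically under `/sys/class/infiniband/<device>/ports/<port>/counters/`, and the selected counter names, thresholds, and function names are hypothetical choices. It follows the usual Nagios plugin convention of exit codes 0 to 3 and performance data after `|`.

```python
from pathlib import Path

# Nagios plugin exit-code convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical selection of error counters; one file per counter is assumed.
ERROR_COUNTERS = ("symbol_error", "port_rcv_errors", "link_downed")


def read_counters(counter_dir, names=ERROR_COUNTERS):
    """Read each named counter as an integer from <counter_dir>/<name>."""
    return {name: int((Path(counter_dir) / name).read_text().strip()) for name in names}


def evaluate(values, warn=1, crit=100):
    """Map counter values to a Nagios-style (exit_code, status_line) pair.

    Thresholds apply to the largest absolute counter value; warn and crit
    are placeholder defaults, not values taken from the thesis.
    """
    worst = max(values.values())
    if worst >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif worst >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Performance data after '|' lets the monitoring server graph each counter;
    # the unit suffix 'c' marks a continuous (cumulative) counter.
    perfdata = " ".join(f"{name}={value}c" for name, value in sorted(values.items()))
    return code, f"{label} - max error counter {worst} | {perfdata}"


def check_port(counter_dir, warn=1, crit=100):
    """Read one port's counters and evaluate them; UNKNOWN if they cannot be read."""
    try:
        values = read_counters(counter_dir)
    except (OSError, ValueError) as exc:
        return UNKNOWN, f"UNKNOWN - cannot read counters: {exc}"
    return evaluate(values, warn, crit)
```

A wrapper script would print the returned line and exit with the returned code, for example after calling `check_port("/sys/class/infiniband/mlx4_0/ports/1/counters")`, where `mlx4_0` is only an example device name. Because these counters are cumulative, a production check would more likely compare the change since the previous run than the absolute values used here.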
12	Monitoring of large-scale Cluster Computers. Worm, Stefan, 12 February 2007. Continuous monitoring of a computer is essential for staying up to date with its state. This may seem trivial when sitting right in front of it, but monitoring a computer remotely is considerably harder. It gets even more difficult when a large number of computers must be monitored. Because monitoring always places some load on the network and on the monitored computer itself, this overhead must be kept as low as possible. This is especially true for a high-performance cluster built from many computers, where the monitoring approach must work as efficiently as possible and must not interfere with the actual work of the supercomputer. The main goals of this work were therefore to analyse and ensure the scalability of a monitoring solution for a large computer cluster and to demonstrate its functionality in practice. To this end, monitoring activities were first classified within the overall operation of a large computer system. Next, methods and solutions were presented that are suitable, in a general scenario, for carrying out monitoring as efficiently and scalably as possible. Lessons learned from operating an existing cluster were then applied to the operation of a new, more powerful system to ensure that it functions as well as possible. Building on this, existing monitoring applications were evaluated to select the one best suited to monitoring the new cluster. The selection took the particular characteristics of the system into account, such as its use of InfiniBand as the network interconnect. In addition, software was developed that reads and processes the various status information of the InfiniBand ports independently of the hardware vendor.
This functionality, previously unavailable in free monitoring applications, was implemented as an example for the chosen monitoring software. Finally, the work examined how monitoring activities affect the actual workload of the cluster. To measure the impact on CPU and network, the self-developed plugin and a selection of typical monitoring values were used as examples. The results showed that no impact on production applications is to be expected at typical monitoring intervals; only at atypically short intervals was a minor influence observed. / Continuous monitoring of a computer is one of the essential tasks for always staying informed about the machine's current state. This is trivial when sitting directly in front of it, but it is no longer so simple when a computer has to be observed remotely. It becomes more difficult when a large number of machines must be monitored. Since monitoring always causes some network load and load on the monitored machine itself, it is important to keep these effects as small as possible. Especially when many computers have been combined into a powerful cluster, the monitoring solution must work as efficiently as possible and must not disturb the actual work of the supercomputer. The main goals of this thesis are therefore analyses to ensure the scalability of the monitoring solution for a large computer cluster and a practical demonstration of its functionality. To this end, monitoring was first placed in the context of the overall operation of a large computer system. Methods and solutions were then presented that are suitable, in a general scenario, for carrying out the monitoring process as a whole as efficiently and scalably as possible.
The thesis then discussed which lessons from operating an existing cluster can be applied to operating a new, more powerful system in order to ensure its proper functioning as far as possible. Building on this, an application particularly well suited to monitoring the new cluster was selected from a set of existing solutions. The selection took the specific situation into account, for example the use of InfiniBand as the interconnect. In the course of this, additional software was developed that can read and process a wide range of status information from the InfiniBand ports, independently of the hardware vendor. This functionality, not previously available among free monitoring applications, was implemented as an example for the chosen monitoring software. Finally, the influence of the monitoring activities on the cluster's actual applications was of interest. For this purpose, the self-developed plugin and a selection of typical monitoring values were used as examples to investigate the influence on the CPU and the network. It was shown that no restrictions on the actual application are to be expected at typical monitoring intervals, and that a minor influence could be detected only at atypically short intervals. Keywords: info:eu-repo/classification/ddc/004; ddc:004; Chemnitz; Cluster; Cluster <data analysis>; Computer; Performance evaluation; Performance measurement; Management; Management information system; Monitoring; Monitoring <computer science>; Nagios; Network; Plug-in; Computer network; Scalable multiprocessor system; Scalability; ABINIT; CHiC; Chemnitz High-Performance Linux Cluster; Cluster Computer; Computer Cluster; InfiniBand; OFED; Plugin; error counters; local monitoring; netgauge; network performance; performance counters; port counters; remote monitoring; scalability
13	Multiple Constant Multiplication Optimization Using Common Subexpression Elimination and Redundant Numbers. Al-Hasani, Firas Ali Jawad, January 2014. The multiple constant multiplication (MCM) operation is a fundamental operation in digital signal processing (DSP) and digital image processing (DIP). MCM appears, for example, in finite impulse response (FIR) and infinite impulse response (IIR) filters, matrix multiplication, and transforms. The aim of this work is to minimize the complexity of the MCM operation using the common subexpression elimination (CSE) technique and redundant number representations. The CSE technique searches for and eliminates common digit patterns (subexpressions) among the MCM coefficients. More common subexpressions can be found by writing the MCM coefficients in redundant number representations. A CSE algorithm is proposed that works on a type of redundant number representation called the zero-dominant set (ZDS). The ZDS extends the set of representations with the minimum number of non-zero digits, called minimum Hamming weight (MHW) representations. Using the ZDS improves the performance of CSE algorithms compared with using MHW representations. The disadvantage of the ZDS is that it increases the likelihood of overlapping patterns (digit collisions), in which one or more digits are shared among several patterns. Eliminating one pattern then destroys the others, because their shared digits are eliminated with it. A pattern preservation algorithm (PPA) is developed to resolve overlapping patterns in the representations. Tree and graph encoders are proposed to generate a larger space of number representations. These algorithms generate the redundant representations of a value for a given digit set, radix, and wordlength. The tree encoder is modified to search for common subexpressions while generating the representation tree. A complexity measure is proposed to compare the subexpressions at each node.
The algorithm stops generating the rest of the representation tree once it finds subexpressions with maximum sharing. This reduces the search space while minimizing the hardware complexity. A combinatorial model of the MCM problem is also proposed. The model is obtained by enumerating all possible solutions of the MCM problem in the form of a graph called the demand graph. Arc routing on this graph yields the solutions of the MCM problem. Similar arc routing occurs in capacitated arc routing problems such as the winter salting problem. An ant colony optimization (ACO) metaheuristic is proposed to traverse the demand graph. The ACO is simulated on a PC in Python to verify the correctness of the model and the operation of the ACO. A parallel simulation of the ACO is carried out on a multi-core supercomputer using the C++ Boost Graph Library. Keywords: common subexpression elimination (CSE); multiple constant multiplication (MCM); multiplier block (MB); adder step; logic depth (LD); logic operator (LO); lower bound and optimality; graph dependent method; radix number system; redundant number representations; pattern preservation algorithm (PPA); zero-dominant set (ZDS); polynomial ring; radix polynomials; complete residue system modulo radix; congruent relation; tree encoder; graph encoder; subexpression tree algorithm (STA); A-operation; demand graph; dynamic capacitated arc routing problem; metaheuristics; ant colony optimization (ACO); max-min ant system (MMAS); parallel computing; computer cluster

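The CSE step in the MCM abstract searches for digit patterns shared among coefficients. The Python sketch below shows a simple, generic form of that idea; it is not the thesis's ZDS-based algorithm or its pattern preservation algorithm. It assumes the coefficients are already written as signed-digit lists (least-significant digit first, digits in {-1, 0, 1}) and counts two-digit patterns x + s·(x << k) across them. The function names and the cost model (w - 1 adders for a coefficient with w non-zero digits when nothing is shared) are illustrative assumptions.

```python
from collections import Counter


def pattern_counts(coefficients):
    """Count two-digit patterns (k, s) across signed-digit coefficients.

    Coefficients are lists of digits in {-1, 0, 1}, least-significant first.
    A pair of non-zero digits d_i at position i and d_j at position j > i
    equals d_i * 2**i * (1 + s * 2**k) with k = j - i and s = d_i * d_j,
    so every such pair is an occurrence of the subexpression x + s*(x << k).
    Overlapping occurrences (digit collisions) are all counted.
    """
    counts = Counter()
    for digits in coefficients:
        nonzero = [(i, d) for i, d in enumerate(digits) if d != 0]
        for a, (i, di) in enumerate(nonzero):
            for j, dj in nonzero[a + 1:]:
                counts[(j - i, di * dj)] += 1
    return counts


def best_pattern(coefficients):
    """Most frequent pattern and its count, or None if nothing occurs twice."""
    counts = pattern_counts(coefficients)
    if not counts:
        return None
    pattern, n = counts.most_common(1)[0]
    return (pattern, n) if n >= 2 else None


def adder_count(coefficients):
    """Adders/subtractors without sharing: (non-zero digits - 1) per coefficient."""
    return sum(max(sum(1 for d in c if d != 0) - 1, 0) for c in coefficients)


if __name__ == "__main__":
    # 5 = 4 + 1, 21 = 16 + 4 + 1, 3 = 4 - 1 (NAF digits, LSB first)
    coefficients = [[1, 0, 1], [1, 0, 1, 0, 1], [-1, 0, 1]]
    print(pattern_counts(coefficients))
    print(best_pattern(coefficients))  # ((2, 1), 3)
    print(adder_count(coefficients))   # 4
```

In this example, pattern (2, +1), that is x + (x << 2) = 5x, is counted three times, but two of those occurrences lie in 21 and share the digit at position 2, a digit collision, so only one of them can be used. Computing 5x once and reusing it gives 5x with no further adder, 21x = 5x + (x << 4), and 3x = (x << 2) - x, i.e. 3 adders instead of 4. Resolving such collisions systematically is the problem the abstract's PPA addresses.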