Global ETD Search

11	IMPROVING MESSAGE-PASSING PERFORMANCE AND SCALABILITY IN HIGH-PERFORMANCE CLUSTERS RASHTI, Mohammad Javad 26 January 2011 High Performance Computing (HPC) is the key to solving many scientific, financial, and engineering problems. Computer clusters are now the dominant architecture for HPC. The scale of clusters, both in the number of processors per node and in the number of nodes, is increasing rapidly: it has reached petascale and will soon reach exascale. Inter-process communication plays a significant role in the overall performance of HPC applications. With continuous enhancements in interconnection technologies and node architectures, the Message Passing Interface (MPI) needs to be improved to make effective use of modern technologies for higher performance. After providing background, I present an in-depth analysis of user-level and MPI libraries over modern cluster interconnects: InfiniBand, iWARP Ethernet, and Myrinet. Using novel techniques, I assess characteristics such as overlap and communication progress ability, the effect of buffer reuse on latency, and multiple-connection scalability. The outcome highlights some of the inefficiencies that exist in the communication libraries. To improve communication progress and overlap in large message transfers, I propose a method that uses speculative communication to overlap communication with computation in the MPI Rendezvous protocol. The results show up to 100% communication progress and more than 80% overlap ability over iWARP Ethernet. An adaptation mechanism is employed to avoid overhead for applications that do not benefit from the method because of their timing specifications. To reduce MPI communication latency, I have proposed a technique that exploits application buffer reuse for small messages and eliminates the sender-side copy in both the two-sided and one-sided MPI small-message transfer protocols. The implementation over InfiniBand improves small-message latency by up to 20%.
The implementation adaptively falls back to the current method if the application does not benefit from the proposed technique. Finally, to improve the scalability of MPI applications on ultra-scale clusters, I have proposed an extension to the current iWARP standard. The extension improves performance and memory usage for large-scale clusters by equipping Ethernet with an efficient zero-copy, connectionless datagram transport. The software-level evaluation shows more than 40% performance benefit and 30% memory usage reduction for MPI applications on a 64-core cluster. / Thesis (Ph.D, Electrical & Computer Engineering) -- Queen's University, 2010-10-16 High Performance Computing ; Message Passing ; Computer Clusters ; Interconnection Networks
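The small-message technique described above (detect buffer reuse, skip the sender-side copy, fall back adaptively) can be illustrated with a toy model. This is a minimal sketch under stated assumptions: the class `BufferReuseSender`, the reuse threshold, the hit-ratio test and the 100-send warm-up window are all hypothetical, and "registration" is simulated with a set rather than real InfiniBand memory registration.

```python
from collections import Counter


class BufferReuseSender:
    """Toy model of a small-message send path with buffer-reuse detection.

    Hypothetical illustration, not the thesis implementation. By default each
    small message is copied into a pre-registered bounce buffer (the usual
    eager path). Once the same application buffer has been sent from
    `reuse_threshold` times, it is treated as registered, and later sends
    from it skip the sender-side copy. At the end of a warm-up window the
    optimisation is switched off if too few sends benefited (adaptive fallback).
    """

    def __init__(self, reuse_threshold=3, min_hit_ratio=0.5, window=100):
        self.reuse_threshold = reuse_threshold
        self.min_hit_ratio = min_hit_ratio
        self.window = window
        self.seen = Counter()     # buffer id -> copy-path sends observed
        self.registered = set()   # buffers treated as registered (zero-copy)
        self.sends = 0
        self.zero_copy_sends = 0
        self.enabled = True

    def send(self, buf_id):
        """Return the path ('zero-copy' or 'copy') used for one small message."""
        self.sends += 1
        if self.enabled and buf_id in self.registered:
            self.zero_copy_sends += 1
            path = "zero-copy"
        else:
            path = "copy"
            if self.enabled:
                self.seen[buf_id] += 1
                if self.seen[buf_id] >= self.reuse_threshold:
                    self.registered.add(buf_id)
        # Adaptive fallback: evaluate the benefit once, at the end of the window.
        if self.enabled and self.sends == self.window:
            if self.zero_copy_sends / self.sends < self.min_hit_ratio:
                self.enabled = False
        return path
```

An application that sends repeatedly from one buffer pays for three copies and then stays on the zero-copy path, while one that uses a fresh buffer for every message is switched back to the plain copy path after 100 sends.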
12	DESIGN ENHANCEMENT AND INTEGRATION OF A PROCESSOR-MEMORY INTERCONNECT NETWORK INTO A SINGLE-CHIP MULTIPROCESSOR ARCHITECTURE Bhide, Kanchan P. 01 January 2004 This thesis involves modeling, design, Hardware Description Language (HDL) design capture, synthesis, implementation and HDL virtual prototype simulation validation of an interconnect network for a Hybrid Data/Command Driven Computer Architecture (HDCA) system. The HDCA is a single-chip shared-memory multiprocessor architecture. Various candidate processor-memory interconnect topologies that may meet the requirements of the HDCA system are studied and evaluated for use within it. The crossbar topology is found to best meet the HDCA system requirements, and it is therefore used as the processor-memory interconnect network of the HDCA system. Design capture, synthesis, implementation and HDL simulation are done in VHDL using the XILINX ISE 6.2.3i and ModelSim 5.7g CAD tools. The design is validated by testing it individually against a set of possible test cases, and then by integrating it into the HDCA system and validating it against two different applications. Including the crossbar switch in the HDCA architecture required major modifications to the HDCA system and some minor changes to the design of the switch. Virtual prototype testing of the HDCA executing applications over the crossbar interconnect showed that both the interconnect and the HDCA function correctly. Including the interconnect in the HDCA now allows it to implement dynamic node-level reconfigurability and multiple forking functionality.
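What a crossbar buys the HDCA can be shown with a small cycle-level model: any processor can reach any memory module in one cycle, and requests conflict only when two processors target the same module. This is an illustrative sketch only; the class `Crossbar` and its round-robin arbitration per memory module are assumptions, since the abstract does not describe the arbitration used in the VHDL design.

```python
class Crossbar:
    """Cycle-level behavioural model of an N x M processor-memory crossbar.

    Illustrative only: round-robin arbitration per memory module is an
    assumption, not the arbitration used in the HDCA VHDL design.
    """

    def __init__(self, n_proc, n_mem):
        self.n_proc = n_proc
        self.n_mem = n_mem
        # Per memory module, the processor index that has highest priority next.
        self.next_priority = [0] * n_mem

    def cycle(self, requests):
        """Arbitrate one cycle.

        `requests` maps processor index -> requested memory module, so each
        processor issues at most one request. Returns processor -> module for
        the granted requests; each module connects to at most one processor.
        """
        grants = {}
        for mem in range(self.n_mem):
            requesters = [p for p, m in requests.items() if m == mem]
            if not requesters:
                continue
            start = self.next_priority[mem]
            # First requester at or after the priority pointer, wrapping around.
            winner = min(requesters, key=lambda p: (p - start) % self.n_proc)
            grants[winner] = mem
            self.next_priority[mem] = (winner + 1) % self.n_proc
        return grants
```

If processors 0 and 1 both request module 1 while processor 2 requests module 3, the first cycle grants 0→1 and 2→3, and repeating the same requests in the next cycle grants 1→1 and 2→3, so the conflicting processors alternate. Requests to distinct modules are all granted in the same cycle.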
13	Performance analysis and improvement of InfiniBand networks : modelling and effective Quality-of-Service mechanisms for interconnection networks in cluster computing systems Yan, Shihang January 2012 The InfiniBand Architecture (IBA) network has been proposed as a new industry standard offering high bandwidth and low latency, suitable for building high-performance cluster computing systems. This architecture replaces the traditional bus-based interconnection with a switch-based network for server Input-Output (I/O) and inter-processor communication. An efficient Quality-of-Service (QoS) mechanism is fundamental to guaranteeing important QoS metrics, such as maximum throughput and minimum latency, as well as other aspects such as delay, blocking probability and mean queue length. Performance modelling and analysis has been, and continues to be, of great theoretical and practical importance in the design and development of communication networks. This thesis investigates efficient and cost-effective QoS mechanisms for the performance analysis and improvement of InfiniBand networks in cluster-based computing systems. First, a rate-based, source-response, link-by-link admission and congestion control function with an improved Explicit Congestion Notification (ECN) packet-marking scheme is developed. This function uses rate control to reduce congestion of multiple-class traffic. Second, a credit-based flow control scheme is presented to reduce the mean queue length and response time of the system and to improve its throughput. To evaluate the performance of this scheme, a new queueing network model is developed. Theoretical analysis and simulation experiments show that these two schemes are effective and suitable for InfiniBand networks.
Finally, to obtain a thorough understanding of the performance attributes of InfiniBand networks, two efficient threshold-function flow control mechanisms are proposed to enhance QoS: Entry Threshold, which sets a threshold for each entry in the arbitration table, and Arrival Job Threshold, which sets a threshold based on the number of jobs in each Virtual Lane. The principle of Maximum Entropy is adopted to analyse these two new mechanisms, with the Generalized Exponential (GE)-type distribution used to model the inter-arrival times and service times of the input traffic. Extensive simulation experiments are conducted to validate the accuracy of the analytical models. 004
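The basic credit mechanism behind a credit-based flow control scheme can be sketched on a single link: the sender holds one credit per free slot in the receiver's buffer and transmits only while it holds a credit, so the buffer cannot overflow. This is a minimal sketch under assumptions not taken from the thesis (one packet per tick, a fixed receiver drain rate, instantaneous credit return); the function name `simulate_credit_link` is hypothetical, and the sketch does not reproduce the thesis's queueing network model.

```python
from collections import deque


def simulate_credit_link(n_packets, buffer_slots, drain_every, max_ticks=10_000):
    """Simulate one link under credit-based flow control.

    The sender starts with one credit per receiver buffer slot, spends a
    credit for each packet sent and regains one whenever the receiver frees
    a slot. Returns (ticks_needed, max_buffer_occupancy).
    """
    credits = buffer_slots
    rx_buffer = deque()
    sent = delivered = 0
    max_occupancy = 0
    for tick in range(1, max_ticks + 1):
        # Sender: transmit at most one packet per tick, only if it holds a credit.
        if sent < n_packets and credits > 0:
            credits -= 1
            rx_buffer.append(sent)
            sent += 1
        max_occupancy = max(max_occupancy, len(rx_buffer))
        # Receiver: consume one packet every `drain_every` ticks, return a credit.
        if tick % drain_every == 0 and rx_buffer:
            rx_buffer.popleft()
            delivered += 1
            credits += 1
        if delivered == n_packets:
            return tick, max_occupancy
    raise RuntimeError("simulation did not finish within max_ticks")
```

With 4 buffer slots and a receiver that frees one slot every 3 ticks, 20 packets take 60 ticks and the receiver buffer peaks at exactly 4 packets: the credit limit, rather than packet dropping, keeps occupancy bounded.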
14	Investigação de técnicas fotônicas de chaveamento aplicadas em arquiteturas paralelas. / Research about photonic techniques in parallel architectures. Martins, João Eduardo Machado Perea 20 March 1998 This work presents a study of optical interconnection networks applied to parallel computer architectures, in which several network models are proposed, simulated and analyzed. This research is important because interconnection networks directly influence the cost and performance of parallel computer architectures. The first optical interconnection network model proposed is called SCF (Sistema Circular com Filas, "Circular System with Queues"). It is a collision-free system with a dedicated channel for communication control, and each node has its own dedicated channel for data reception. The system achieves high throughput and high utilization with small queues. A dedicated simulator was developed for SCF and was easily adapted to simulate the other network models proposed in this work. Three different models of optical distribution switches for Dataflow parallel architectures were also proposed, simulated and analyzed. The simulation results show that relatively simple optical components can be used to build high-performance systems. Optical interconnection networks ; Parallel computer architecture ; Photonics
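Why a dedicated reception channel per node plus a separate control channel removes collisions can be shown with a slotted scheduling sketch: in each slot a control phase reserves destination channels before any data is sent, so each channel carries at most one packet per slot. This is a minimal sketch under assumptions of this illustration, not the thesis's protocol: time is slotted, each source has one FIFO queue and one transmitter, and the order in which sources reserve rotates every slot. The function name `scf_schedule` is hypothetical.

```python
from collections import deque


def scf_schedule(queues):
    """Slot-by-slot schedule for a collision-free, receiver-channel network.

    queues[s] is the FIFO of destination nodes for packets waiting at source s.
    Each destination owns one reception channel, so a channel carries at most
    one packet per slot. A control phase (standing in for the dedicated control
    channel) reserves channels before data is sent, so data never collides.
    Returns a list of slots, each a dict mapping destination -> source.
    """
    queues = [deque(q) for q in queues]
    n = len(queues)
    schedule = []
    first = 0  # rotate which source reserves first, for fairness
    while any(queues):
        reserved = {}
        for i in range(n):
            s = (first + i) % n
            # A source sends at most its head packet, and only if that
            # destination's channel is still free in this slot.
            if queues[s] and queues[s][0] not in reserved:
                reserved[queues[s].popleft()] = s
        schedule.append(reserved)
        first = (first + 1) % n
    return schedule
```

For `scf_schedule([[2, 1], [2], [0]])`, sources 0 and 1 both want node 2's channel; the control phase gives it to source 0 in the first slot and to source 1 in the second, so the four packets are delivered in two slots without any collision.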
15	EXPLOITING SPARSENESS OF COMMUNICATION PATTERNS FOR THE DESIGN OF NETWORKS IN MASSIVELY PARALLEL SUPERCOMPUTERS Mattox, Timothy Ian 01 January 2006 A limited set of Processing Element (PE) pairs in a parallel computer covers the internal communications of scalable parallel programs. We take advantage of this property using the concept of Sparse Flat Neighborhood Networks (Sparse FNNs). Sparse FNNs are network designs that provide single-switch latency and full wire bandwidth for each specified PE pair, despite using relatively few network interfaces per PE and switches with far fewer ports than there are PEs. This dissertation discusses the design problem, runtime support, and a working prototype (KASY0) for Sparse FNNs. KASY0 not only demonstrated the claimed properties, but also set world records for price/performance and for performance on a specific application. Parallel supercomputers execute many portions of an application simultaneously. For scalable programs, the more PEs the system has, the greater the potential speedup. Portions executing on different PEs may be able to work independently for short periods, but the desired performance may not be achieved because of delays in communication between PEs. The set of PE pairs that communicate often is both predictable and small relative to the number of possible PE pairings. This sparseness property can be exploited in the design and implementation of networks for massively parallel supercomputers. The sparseness of communicating pairs is rooted in the fact that each of the human-designed communication patterns commonly used in parallel programs has the property that the number of communicating pairs grows relatively slowly as the number of PEs increases. Additionally, the number of pairs in the union of all communication patterns used in a suite of parallel programs grows surprisingly slowly because of pair synergy: the same pair often appears in multiple communication patterns.
Detailed analysis of communication patterns clearly shows that the set of PE pairs that actually communicate is very sparse, although the structure of the sparseness can be complex.
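The sparseness and pair-synergy argument can be checked by counting pairs directly. This sketch assumes three common patterns chosen for illustration (a ring shift, 2-D nearest-neighbour exchange on a square grid, and hypercube dimension exchange); they are not the program suite analysed in the dissertation, and the function names are hypothetical.

```python
import math


def ring_pairs(p):
    """Unordered pairs used by a ring shift (i talks to i+1 mod p)."""
    return {frozenset((i, (i + 1) % p)) for i in range(p)}


def mesh_pairs(p):
    """Unordered pairs used by nearest-neighbour exchange on a square 2-D grid."""
    side = math.isqrt(p)
    assert side * side == p, "mesh pattern assumes a square number of PEs"
    pairs = set()
    for r in range(side):
        for c in range(side):
            i = r * side + c
            if c + 1 < side:
                pairs.add(frozenset((i, i + 1)))      # horizontal neighbour
            if r + 1 < side:
                pairs.add(frozenset((i, i + side)))   # vertical neighbour
    return pairs


def hypercube_pairs(p):
    """Unordered pairs used by dimension exchange (i talks to i XOR 2**k)."""
    assert p > 0 and p & (p - 1) == 0, "hypercube pattern assumes a power of two"
    dims = p.bit_length() - 1
    return {frozenset((i, i ^ (1 << k))) for i in range(p) for k in range(dims)}


def communicating_pairs(p):
    """Union of all pairs used by the three example patterns."""
    return ring_pairs(p) | mesh_pairs(p) | hypercube_pairs(p)
```

For 16 PEs the three patterns use 16, 24 and 32 pairs (72 if counted separately), but their union holds only 44 distinct pairs out of 120 possible, because many pairs are shared between patterns.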
16	Approximate analytical models for studying the performance of multistage interconnected switching networks (Προσεγγιστικά αναλυτικά μοντέλα για τη μελέτη της απόδοσης πολυβάθμιων διασυνδεμένων δικτύων μεταγωγής) Stergiou, Eleftherios (Στεργίου, Ελευθέριος) 05 January 2011 This research work concerns the performance evaluation of multistage interconnected switching networks. To assess performance, approximate analytical models are developed and presented. In particular: 1. A novel integrated methodology is presented for assessing the performance of simple, self-routing multistage interconnection networks (e.g. classic banyan networks) built from symmetric switch elements (e.g. 2x2 switch elements). The model is based on the operation and behaviour of an arbitrary buffer (queue) of a single switch element. Based on this analysis, which includes an iterative algorithm that converges within very few iterations, the utilisation of the system's queues is computed, and the remaining performance indicators are then determined. 2. A performance evaluation process is presented for multistage interconnection networks that can serve traffic with two or more priority classes. A new switch element (SE) is proposed with parallel buffers at each input, one for each supported priority class, to serve multi-priority traffic effectively; it is modelled by means of queues. Based on the analysis of this model, and with an iterative algorithm that converges within a few iterations, all performance indicators were calculated precisely. 3. In addition, a novel analytical approach was developed that evaluates the performance of multistage interconnection networks with one or more layers that apply the 'full multicast' packet transmission technique when serving unicast and multicast traffic. A corresponding model for these networks was created.
It was shown that interconnection networks with a limited number of layers support unicast and multicast traffic very effectively. 4. A performance evaluation is provided for multistage interconnection networks with one or more layers that use the 'partial multicast' packet transmission technique for multicast traffic. 5. An analytical approach is also presented that evaluates the performance of self-routing multistage interconnection networks with a limited number of layers that apply two different packet transmission policies simultaneously, one in each segment. By a similar procedure, all performance indicators of these multistage networks are determined. 6. To assist designers, a compound performance factor (CPF) is defined that expresses the overall performance of a multistage packet-switching device, taking into account all the individual performance indicators according to a specific set of criteria. Notably, all of the analytical methods provide detailed results for every intermediate stage. All results obtained from the analytical methods are confirmed by simulations developed for this purpose. The analytical results are also compared with results from previous work, and the comparison highlights the greater accuracy and speed of the methods presented here over older research techniques. A review of the relevant literature shows that few analytical studies address the performance evaluation of modern switched networks, such as multi-layer networks; this work addresses that gap. These analytical approaches are expected to be useful tools for designers and manufacturers of network systems in their efforts to provide better quality of service (QoS). Analytical study ; 004.36 ; Analytical method ; Multistage interconnection networks
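The abstract does not give the buffered switch-element model or its iterative algorithm, so this sketch swaps in two classic textbook pieces to illustrate the setting: destination-tag self-routing through an Omega (shuffle-exchange) network of 2x2 switches, and Patel's stage-by-stage recurrence for the throughput of an unbuffered banyan network under uniform traffic, p_{k+1} = 1 - (1 - p_k/2)^2. The function names `omega_route` and `banyan_throughput` are hypothetical, and neither function is the thesis's model.

```python
def omega_route(src, dst, n_stages):
    """Destination-tag self-routing through an Omega network of 2x2 switches.

    Returns the line position after each stage; the last entry equals dst.
    """
    size = 1 << n_stages
    pos = src
    path = []
    for stage in range(n_stages):
        # Perfect shuffle: rotate the n-bit line position left by one bit.
        pos = ((pos << 1) | (pos >> (n_stages - 1))) & (size - 1)
        # The 2x2 switch picks its upper (0) or lower (1) output from the
        # destination bit for this stage, most significant bit first.
        bit = (dst >> (n_stages - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


def banyan_throughput(load, n_stages):
    """Patel's recurrence for an unbuffered banyan of 2x2 switches.

    `load` is the probability that an input carries a packet in a cycle.
    Each packet picks either switch output with probability 1/2, and
    conflicting packets are dropped, so an output link is busy with
    probability 1 - (1 - p/2)**2. Returns that probability after each stage.
    """
    p = load
    per_stage = []
    for _ in range(n_stages):
        p = 1 - (1 - p / 2) ** 2
        per_stage.append(p)
    return per_stage
```

For an 8x8 network (3 stages) at full load, the per-stage link utilisation falls from 0.75 to about 0.609 and then about 0.517, which shows why buffered switch elements such as those modelled in the thesis are attractive.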
18	Exchanged Crossed Cube: A Novel Interconnection Network for Parallel Computation Li, K., Mu, Y., Li, K., Min, Geyong January 2013 The topology of an interconnection network plays a key role in the performance of parallel computing systems. A new interconnection network called the exchanged crossed cube (ECQ) is proposed and analyzed in this paper. We prove that ECQ has better properties than other variations of the basic hypercube in terms of smaller diameter, fewer links, and lower cost factor, which indicates reduced communication overhead, lower hardware cost, and a better balance between performance and cost. Furthermore, it retains several attractive properties, including a recursive structure, high partitionability, and strong connectivity. Optimal routing and broadcasting algorithms are also proposed for this new network topology. Interconnection networks ; Hypercube ; Exchanged crossed cube ; Interprocessor communication ; Parallel computation ; Topological properties ; Architecture
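The metrics used in the comparison (number of links, degree, diameter and cost factor) can be computed for any topology given its adjacency rule. The abstract does not give the ECQ construction, so this sketch computes them for the basic hypercube baseline; the helper names `topology_metrics` and `hypercube` are hypothetical, and the cost factor is taken here as maximum degree times diameter, a common definition in the hypercube-variant literature.

```python
from collections import deque


def topology_metrics(n_nodes, neighbours):
    """Nodes, links, maximum degree, diameter and cost factor of a graph.

    `neighbours(v)` returns the nodes adjacent to v (undirected graph).
    The diameter is found by breadth-first search from every node.
    """
    degrees = [len(neighbours(v)) for v in range(n_nodes)]
    links = sum(degrees) // 2
    diameter = 0
    for s in range(n_nodes):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in neighbours(u):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != n_nodes:
            raise ValueError("graph is disconnected")
        diameter = max(diameter, max(dist.values()))
    max_degree = max(degrees)
    return {"nodes": n_nodes, "links": links, "max_degree": max_degree,
            "diameter": diameter, "cost": max_degree * diameter}


def hypercube(n):
    """Adjacency rule of the n-dimensional hypercube: flip one address bit."""
    return lambda v: [v ^ (1 << k) for k in range(n)]
```

`topology_metrics(16, hypercube(4))` gives 16 nodes, 32 links, degree 4, diameter 4 and cost 16; an ECQ adjacency function written from the paper's definition could be passed in the same way for comparison.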
19	Performance modelling and evaluation of virtual channels in multicomputer networks with bursty traffic Min, Geyong, Ould-Khaoua, M. January 2004 Multicomputers ; Interconnection networks ; Pipelined circuit switching ; Bursty traffic ; Message latency ; Performance modelling and analysis
20	A performance model for wormhole-switched interconnection networks under self-similar traffic. Min, Geyong, Ould-Khaoua, M. January 2004 Many recent studies have convincingly demonstrated that network traffic exhibits a noticeable self-similar nature which has a considerable impact on queuing performance. However, the networks used in current multicomputers have been primarily designed and analyzed under the assumption of the traditional Poisson arrival process, which is inherently unable to capture traffic self-similarity. Consequently, it is crucial to reexamine the performance properties of multicomputer networks in the context of more realistic traffic models before practical implementations show their potential faults. In an effort toward this end, this paper proposes the first analytical model for wormhole-switched k-ary n-cubes in the presence of self-similar traffic. Simulation experiments demonstrate that the proposed model exhibits a good degree of accuracy for various system sizes and under different operating conditions. The analytical model is then used to investigate the implications of traffic self-similarity on network performance. This study reveals that the network suffers considerable performance degradation when subjected to self-similar traffic, stressing the great need for improving network performance to ensure efficient support for this type of traffic. Multicomputers ; Interconnection networks ; Traffic self-similarity ; Adaptive routing ; Virtual channels ; Performance modeling
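Analytical models of wormhole-switched k-ary n-cubes typically build on the zero-contention case, where a message's latency is roughly the number of hops its header travels plus the number of flits that follow it. This sketch computes only that baseline under uniform traffic; it does not model self-similar traffic, contention or virtual channels. The formula `hops + message length` is a first-order textbook approximation, and the function names are hypothetical.

```python
from itertools import product


def ring_distance(a, b, k):
    """Minimal hop count between positions a and b on a bidirectional ring of k nodes."""
    d = abs(a - b)
    return min(d, k - d)


def average_distance(k, n):
    """Mean minimal hop count between distinct nodes of a k-ary n-cube torus.

    Uniform traffic: every other node is an equally likely destination.
    The torus is vertex-symmetric, so one source node suffices.
    """
    nodes = list(product(range(k), repeat=n))
    src = nodes[0]
    total = sum(
        sum(ring_distance(a, b, k) for a, b in zip(src, dst)) for dst in nodes[1:]
    )
    return total / (len(nodes) - 1)


def zero_load_latency(k, n, msg_flits, hop_cycles=1):
    """First-order no-contention wormhole latency in cycles.

    The header pipelines through the average number of hops, then the
    message's flits stream out behind it at one flit per cycle.
    """
    return hop_cycles * average_distance(k, n) + msg_flits
```

For even k the brute-force mean matches the closed form (n*k/4) * k**n / (k**n - 1), for example 32/15 for a 4-ary 2-cube, so a 16-flit message in that network has a zero-load latency of about 18.1 cycles under these assumptions.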
