• About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Parallel processing in power systems computation on a distributed memory message passing multicomputer

Hong, Chao, 洪潮 January 2000 (has links)
published_or_final_version / Electrical and Electronic Engineering / Doctoral / Doctor of Philosophy
22

Simulation modelling of distributed-shared memory multiprocessors

Marurngsith, Worawan January 2006 (has links)
Distributed shared memory (DSM) systems have been recognised as a compelling platform for parallel computing because of their programming advantages and scalability. DSM systems allow applications to access data in a logically shared address space by abstracting away the physical location of memory. Because the location of data is transparent, the overhead caused by accessing remote memories is difficult to analyse. This memory locality problem has been identified as crucial to DSM performance. Many researchers have investigated it using simulation as an experimental tool, contributing to the progressive evolution of DSM systems. Nevertheless, both the diversity of architectural configurations and the rapid advance of DSM implementations constrain simulation model design in two respects: the simulation framework limits model extensibility, and verification cannot be applied during a simulation run, which delays the verification process. This thesis studies simulation modelling techniques for memory locality analysis of various DSM systems implemented on top of a cluster of symmetric multiprocessors. The thesis presents a simulation technique to promote model extensibility and proposes a technique for verification applicability, called Specification-based Parameter Model Interaction (SPMI). The proposed techniques have been implemented in a new interpretation-driven simulator called DSiMCLUSTER, built on top of a discrete event simulation (DES) engine known as HASE. Experiments have been conducted to determine which factors most influence the degree of locality and whether the stability of performance can be maximised. DSiMCLUSTER has been validated against a SunFire 15K server, with cache-miss results differing by ±6% on average and by less than 15% in the worst case.
These results confirm that the techniques used in developing DSiMCLUSTER can help achieve both (a) a highly extensible simulation framework that keeps up with ongoing innovation in DSM architecture, and (b) verification applicability, resulting in an efficient framework for memory analysis experiments on DSM architectures.
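The validation figures above (about ±6% on average, under 15% in the worst case) can be read as per-benchmark relative differences between simulated and measured cache misses. The abstract does not give the exact formula, so the following is a minimal Python sketch under that assumption; the function name `miss_agreement` and the sample counts are hypothetical, not data from the thesis.

```python
from statistics import mean


def miss_agreement(simulated: list[float], measured: list[float]) -> tuple[float, float]:
    """Return (average, worst-case) absolute relative difference in percent.

    Assumption: each pair is the cache-miss count of one benchmark run on the
    simulator and on the real machine, with the measured value as reference.
    """
    if len(simulated) != len(measured) or not measured:
        raise ValueError("need two equally long, non-empty lists")
    diffs = [abs(s - m) / m * 100.0 for s, m in zip(simulated, measured)]
    return mean(diffs), max(diffs)


# Hypothetical counts, for illustration only
avg, worst = miss_agreement([1050, 980, 1120], [1000, 1000, 1000])
```

Reporting both the mean and the maximum, as the abstract does, guards against a good average hiding a single badly modelled benchmark.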
23

Architecture Support and Scalability Analysis of Memory Consistency Models in Network-on-Chip based Systems

Naeem, Abdul January 2013 (has links)
Shared memory systems should support parallelization at the computation (multi-core), communication (Network-on-Chip, NoC) and memory architecture levels to exploit the potential performance benefits. Such parallel systems, which support the shared memory abstraction in both general-purpose and application-specific domains, face the critical issue of memory consistency. The memory consistency issue arises from unconstrained memory operations, which lead to unexpected behavior of shared memory systems. Memory consistency models enforce ordering constraints on memory operations so that shared memory systems behave as expected. The intuitive Sequential Consistency (SC) model enforces strict ordering constraints on memory operations and does not take advantage of hardware and software system optimizations. In contrast, relaxed memory consistency models relax the ordering constraints on memory operations and exploit these optimizations to enhance system performance at reasonable cost. The purpose of this thesis is twofold. First, novel architectural support is provided for different memory consistency models, namely SC, Total Store Ordering (TSO), Partial Store Ordering (PSO), Weak Consistency (WC), Release Consistency (RC) and Protected Release Consistency (PRC), in NoC-based multi-core (McNoC) systems. The PRC model is proposed as an extension of the RC model that provides additional reordering and relaxation of memory operations. Second, the scalability of these memory consistency models is analyzed in McNoC systems. The architectural support for the different memory consistency models is provided in the McNoC platforms. Each configurable McNoC platform uses a packet-switched 2-D mesh NoC with a deflection routing policy, distributed shared memory (DSM), distributed locks and a customized processor interface.
The memory consistency models/protocols are implemented in the customized processor interfaces, which are developed to integrate the processors with the rest of the system. The realization schemes for the memory consistency models are based on novel approaches using a transaction counter and an address stack. The transaction counter is used in each node of the network to keep track of the outstanding memory operations issued by a processor in the system. The address stack is used in each node of the network to keep track of the addresses of those outstanding memory operations. These hardware structures are used in the processor interface to enforce the global orders required by the different memory consistency models. The realization scheme of the PRC model additionally uses an acquire counter to further classify data operations as unprotected or protected. The scalability of these memory consistency models is analyzed using different workloads, which are developed and mapped onto networks of various sizes. The scalability study is conducted in McNoC systems with 1 to 64 cores, running various applications with different problem sizes and traffic patterns. Performance metrics such as execution time, performance, speedup, overhead and efficiency are evaluated as a function of network size. The experiments use both synthetic and application workloads. The experimental results under different application workloads show that the average execution time under the relaxed memory consistency models decreases relative to the SC model. The specific numbers are highly sensitive to the application and depend on how well it matches the architecture.
This study shows that the performance improvement of the relaxed memory consistency models over the SC model depends on the computation-to-communication ratio, traffic patterns, data-to-synchronization ratio and problem size. In the experiments, the performance improvement of the PRC and RC models over the SC model tends to exceed 50% as the system is scaled up further. / QC 20130204
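The role of the transaction counter and address stack can be illustrated with a toy issue check for a single processor interface. This is a minimal Python sketch, not the thesis' hardware design: the class and method names are hypothetical, only SC, TSO and PSO are modelled, and the ordering rules are the textbook ones (TSO relaxes store-to-load order; PSO also relaxes store-to-store order), with same-address accesses kept in program order. WC, RC and PRC, which additionally rely on synchronization operations and, for PRC, an acquire counter, are left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    LOAD = "load"
    STORE = "store"


@dataclass(frozen=True)
class Op:
    kind: Kind
    addr: int


@dataclass
class ProcessorInterface:
    """Toy issue logic for one node; model is "SC", "TSO" or "PSO" (hypothetical)."""
    model: str
    transaction_counter: int = 0  # number of outstanding memory operations
    # Outstanding operations; storing the kind next to the address is an assumption
    address_stack: list = field(default_factory=list)

    def can_issue(self, op: Op) -> bool:
        if self.model == "SC":
            # SC: wait until every earlier operation has completed
            return self.transaction_counter == 0
        if any(o.addr == op.addr for o in self.address_stack):
            return False  # same-address accesses stay in program order
        loads_pending = any(o.kind is Kind.LOAD for o in self.address_stack)
        if op.kind is Kind.LOAD:
            # TSO/PSO: a load may bypass earlier stores, but not earlier loads
            return not loads_pending
        if self.model == "TSO":
            # TSO keeps load->store and store->store order
            return self.transaction_counter == 0
        if self.model == "PSO":
            # PSO also lets a store overtake earlier stores to other addresses
            return not loads_pending
        raise ValueError(f"unsupported model: {self.model}")

    def issue(self, op: Op) -> None:
        if not self.can_issue(op):
            raise RuntimeError("ordering constraint would be violated")
        self.transaction_counter += 1
        self.address_stack.append(op)

    def complete(self, op: Op) -> None:
        # Called when the memory system acknowledges the operation
        self.address_stack.remove(op)
        self.transaction_counter -= 1
```

Under SC the counter alone is enough (issue only when it is zero); the relaxed models also need the per-address information to decide whether a later access may safely overtake an outstanding one.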
24

Efficient openMP over sequentially consistent distributed shared memory systems

Costa Prats, Juan José 20 July 2011 (has links)
Clusters are now one of the most widely used platforms in High Performance Computing, and most programmers use the Message Passing Interface (MPI) library to program their applications on these distributed platforms and obtain maximum performance, although this is a complex task. On the other hand, OpenMP has become the de facto standard for programming applications on shared memory platforms because it is easy to use and achieves good performance without much effort. Could both worlds be joined? Could programmers use the simplicity of OpenMP on distributed platforms? Many researchers think so. One of the ideas developed to this end is distributed shared memory (DSM), a software layer on top of a distributed platform that gives applications an abstract shared memory view. Although this seems a good solution, it also has drawbacks. Memory coherence between the nodes of the platform is difficult to maintain (complex management, scalability issues, high overhead and others), and the latency of remote-memory accesses, which traverse the interconnection network, can be orders of magnitude greater than on a shared bus. This research therefore improves the performance of OpenMP applications executed on distributed memory platforms using a DSM with sequential consistency, thoroughly evaluating the results with the NAS Parallel Benchmarks. The vast majority of DSM designs use a relaxed consistency model because it avoids some major problems in the area. In contrast, we use a sequential consistency model because we believe that exposing these problems, which are otherwise hidden, may lead to solutions that can then be applied to both models. The main idea behind this work is that the two runtimes, the OpenMP runtime and the DSM layer, should cooperate to achieve good performance; otherwise they interfere with each other, degrading the final performance of applications.
We develop three contributions to improve the performance of these applications: (a) a technique to avoid false sharing at runtime, (b) a technique that mimics MPI behaviour by forwarding produced data to its consumers, and (c) a mechanism to avoid network congestion caused by DSM coherence messages. The NAS Parallel Benchmarks are used to test the contributions. The results of this work show that the impact of false sharing varies from application to application. They also show that it is important to move the data flow off the critical path, and that techniques that forward data as early as possible, similar to MPI, benefit the final application performance. Additionally, this data movement is usually concentrated at single points and, owing to the limited network bandwidth, affects application performance. It is therefore necessary to provide mechanisms that spread this data transfer over the computation time, using the otherwise idle network. Finally, the results show that the proposed contributions improve the performance of OpenMP applications in this kind of environment.
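The abstract does not describe how false sharing is detected at runtime. To illustrate only the distinction it relies on, here is a minimal Python sketch assuming a page-based DSM with 4096-byte pages and a hypothetical trace of (node, address, length) writes: a page written by several nodes is falsely shared if their byte ranges are disjoint, and truly shared if they overlap. The function name and trace format are hypothetical.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed coherence granularity (one page)


def classify_pages(writes):
    """writes: iterable of (node, address, length) records from one interval.

    Returns {page: "true" | "false"} for pages written by more than one node.
    "false": the nodes wrote disjoint bytes of the page (false sharing).
    "true": at least one byte was written by two different nodes.
    """
    # page -> node -> set of byte offsets written within that page
    bytes_by_page = defaultdict(lambda: defaultdict(set))
    for node, addr, length in writes:
        for a in range(addr, addr + length):
            bytes_by_page[a // PAGE_SIZE][node].add(a % PAGE_SIZE)

    result = {}
    for page, per_node in bytes_by_page.items():
        if len(per_node) < 2:
            continue  # a single writer means no sharing at all
        seen, overlap = set(), False
        for offsets in per_node.values():
            if seen & offsets:
                overlap = True
                break
            seen |= offsets
        result[page] = "true" if overlap else "false"
    return result
```

Tracking individual bytes keeps the sketch short; a real runtime would more likely work with coarser ranges or page diffs.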
25

CDPthread: A POSIX-Thread Based Distributed Computing Environment

Tseng, Guo-Fu 28 July 2009 (has links)
Because of the limited computing power of a single machine, and for reasons of cost, distributed designs are becoming increasingly popular. Distributed Shared Memory (DSM) systems are one of the hottest topics in this area. Most work is dedicated to designing a library or even a new language in order to gain higher performance on DSM systems. As a consequence, programmers must learn a new library or language, and they must also handle synchronization for the distributed environment themselves. In this paper, we propose a design that is compatible with the POSIX Threads environment. The distributed nature of the system described herein is completely transparent to programmers.
26

Implementation of portable distributed shared memory

Καραντάσης, Κωνσταντίνος 01 August 2007 (has links)
The ever-increasing development and deployment of clusters and computational grids clearly form a distributed environment capable of computation at Internet scale. Under these circumstances, parallel processing is expected to exploit the proximity of computational resources made possible by advances in high-speed networks. At the same time, parallel application programmers often face a dilemma, having to choose between shared memory and distributed memory programming models. While distributed memory programming models are the most typical choice for clusters and grids, the programmer encounters well-known obstacles in expressing the application's parallelism through message passing.
To relieve programmers of the need to express parallelism explicitly through message passing, much research has been done to implement middleware that provides the shared memory programming abstraction while achieving performance comparable to message passing models. Nevertheless, one of the most fundamental characteristics of modern distributed computing environments, which hinders porting existing DSM technology from clusters to grids, is the broad heterogeneity of the available computing resources. As part of the effort to provide a simple, robust and efficient programming environment, initially for clusters and with the intention of supporting seamless parallel programming on grids, this thesis presents Pleiad. Pleiad is our research prototype providing the shared memory programming abstraction at the software level (Software Distributed Shared Memory - SDSM). Given the heterogeneity of contemporary parallel systems, the goal of this thesis is to provide a simple, portable and interoperable DSM system, which led us to choose Java as our development platform. Pleiad can also exploit the trend in modern multiprocessors brought about by multicore CPUs, enabling multithreaded applications to execute on top of the distributed hardware architecture. Moreover, Pleiad is implemented at the user level, which is the most appropriate choice for the highly diverse environment of the virtual organizations that form parts of a grid. The first results of the experimental comparison of Pleiad with similar systems are encouraging.
In any case, the first prototype of Pleiad presented in this thesis provides the essential infrastructure for further addressing open issues in our research on distributed shared memory abstraction.
27

Parallel processing in power systems computation on a distributed memory message passing multicomputer

Hong, Chao January 2000 (has links)
Thesis (Ph. D.)--University of Hong Kong, 2000. / Includes bibliographical references (leaves 160-169).
28

Techniques for collective physical memory ubiquity within networked clusters of virtual machines

Hines, Michael R. January 2009 (has links)
Thesis (Ph. D.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Computer Science, 2009. / Includes bibliographical references.
29

SNIC-DSM: SmartNIC based DSM Infrastructure for Heterogeneous-ISA Machines

Ramesh, Hemanth 14 August 2023 (has links)
Heterogeneous computing is increasingly used in today's datacenters to meet the growing computational demands of applications. Heterogeneous hardware typically includes CPUs, GPUs, ASICs, and FPGAs, among others. An important emerging trend is instruction-set architecture (ISA) heterogeneity: high-end x86 servers with attached SmartNICs and SmartSSDs that incorporate general-purpose CPUs, typically of the RISC ISA family (e.g., ARM, RISC-V). To alleviate resource congestion on server computing nodes, application workloads can be scaled out across server x86 CPUs and SmartNIC ARM CPUs using the distributed shared memory (DSM) abstraction. We present SNIC-DSM, a SmartNIC-based DSM infrastructure for heterogeneous-ISA machines. SNIC-DSM implements a low-latency messaging layer, which enables inter-node communication across multi-ISA CPUs, and a DSM protocol processor that provides memory coherency among these nodes, both implemented in the SmartNIC's FPGA logic. SNIC-DSM is reconfigurable and allows different memory consistency protocols to be implemented. Our experimental studies using compute-intensive benchmarks reveal that SNIC-DSM outperforms the state-of-the-art DSM, Popcorn Linux's software DSM, when server resource congestion is high. / Master of Science / The availability of heterogeneous computing architectures has led to the development of distributed shared memory systems, which allow compute-intensive applications to run in a distributed manner on different types of computing devices such as graphics processors, reconfigurable logic devices, and custom integrated circuits. Adopting such a heterogeneous computing strategy improves both performance and power efficiency. Generally, these DSM systems use a software-based approach, which offers great flexibility but suffers from software overheads. Hardware-based approaches overcome these overheads but generally lack flexibility.
This thesis presents SNIC-DSM, a reconfigurable implementation of the DSM framework. SNIC-DSM provides a platform for the host and smart networking devices such as SmartNICs to communicate with each other, and enables distributed application execution by providing memory coherency. Our experimental evaluation using High-Performance Computing benchmarks reveals that SNIC-DSM improves performance compared with software-based DSM.
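The abstract says SNIC-DSM's protocol processor keeps nodes coherent and can host different protocols, but does not name one. As a generic illustration of what such a processor tracks, here is a minimal Python sketch of a directory-based MSI (Modified/Shared/Invalid) invalidation protocol at page granularity. The class names and message tuples are hypothetical, and SNIC-DSM implements its logic in FPGA hardware rather than in software.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    state: str = "I"  # "I" (invalid), "S" (shared) or "M" (modified)
    sharers: set = field(default_factory=set)
    owner: object = None  # node holding the only writable copy in state "M"


class Directory:
    """Toy MSI directory: tracks which nodes hold a copy of each page."""

    def __init__(self):
        self.entries = {}

    def _entry(self, page):
        return self.entries.setdefault(page, DirectoryEntry())

    def read(self, node, page):
        """Handle a read fault; return the coherence messages it triggers."""
        e = self._entry(page)
        msgs = []
        if e.state == "M" and e.owner != node:
            # Owner writes the page back and is downgraded to a sharer
            msgs.append(("writeback", e.owner, page))
            e.sharers, e.owner, e.state = {e.owner}, None, "S"
        if e.state == "M":
            return msgs  # the requester already owns a readable copy
        e.sharers.add(node)
        e.state = "S"
        return msgs

    def write(self, node, page):
        """Handle a write fault; invalidate every other copy first."""
        e = self._entry(page)
        msgs = []
        if e.state == "M" and e.owner != node:
            msgs.append(("writeback-invalidate", e.owner, page))
        elif e.state == "S":
            msgs += [("invalidate", n, page) for n in sorted(e.sharers) if n != node]
        e.state, e.owner, e.sharers = "M", node, set()
        return msgs
```

The single-writer/multiple-reader invariant is what the messages enforce: a page is either readable by any number of nodes or writable by exactly one.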
30

DSM64: A Distributed Shared Memory System in User-Space

Holsapple, Stephen Alan 01 May 2012 (has links) (PDF)
This paper presents DSM64: a lazy release consistent software distributed shared memory (SDSM) system built entirely in user space. DSM64 can execute threaded applications implemented with pthreads on a cluster of networked machines without any modifications to the target application. DSM64 features a centralized memory manager [1] built atop Hoard [2, 3], a fast, scalable, and memory-efficient allocator for shared-memory multiprocessors. I present an SDSM system written in C++ for Linux operating systems and discuss a straightforward approach to implementing SDSM systems in a Linux environment using system-provided tools and concepts available entirely in user space. I show that the SDSM system presented in this paper can resolve page faults over a local area network in as little as 2 milliseconds. In my analysis, I present the following. I compare the performance characteristics of a matrix multiplication benchmark using various memory coherency models. I demonstrate that the matrix multiplication benchmark using an LRC model runs orders of magnitude faster than the same application using a stricter coherency model. I show the effect of the coherency model on memory access patterns and memory contention. I compare the effects of different locking strategies on execution speed and memory access patterns. Lastly, I compare DSM64 with a non-networked version that uses a system-provided allocator.
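The abstract describes DSM64 as lazy release consistent but does not say how modified pages are propagated. A classic technique in page-based LRC systems such as TreadMarks is twinning and diffing: before the first write to a page, the node saves a copy (the twin); at release, it compares the page with the twin to produce a compact diff that another node applies when it acquires the lock. The following minimal Python sketch shows only that mechanism and is not necessarily how DSM64 works; the function names are hypothetical.

```python
def make_twin(page: bytearray) -> bytes:
    """Snapshot a page before its first write in an interval (the 'twin')."""
    return bytes(page)


def make_diff(page: bytearray, twin: bytes) -> list[tuple[int, bytes]]:
    """Encode modified runs as (offset, new_bytes) by comparing page and twin."""
    diff, i, n = [], 0, len(page)
    while i < n:
        if page[i] != twin[i]:
            start = i
            while i < n and page[i] != twin[i]:
                i += 1
            diff.append((start, bytes(page[start:i])))
        else:
            i += 1
    return diff


def apply_diff(page: bytearray, diff) -> None:
    """Apply a diff to another node's copy, e.g. when it acquires the lock."""
    for offset, data in diff:
        page[offset:offset + len(data)] = data
```

Because only changed byte runs are shipped, several nodes can write disjoint parts of the same page during one interval and their diffs can be merged.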
