  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Optimization and implementation of memory consistency protocols for SDSM systems

Χριστοπούλου, Άρτεμις – Μαρίνα 07 August 2007 (has links)
The first official classification of parallel architectures, that is, computer systems with multiple processing elements, was formulated by M. J. Flynn in 1972. Two further noteworthy classification attempts followed. The first, by A. Bode and W. Händler in 1983, takes the number of control units and processing elements in a parallel system as its basic criterion. The second, by Volkert and D. Kranzlmüller in 1999, categorizes parallel systems by the way main memory is connected to the processing elements. This thesis follows the latter approach, dividing parallel systems into distributed-memory and shared-memory architectures.
In distributed-memory systems each processor has its own private (local) memory, to which it retains exclusive access, while all data transfer between processors is carried out through message passing. Such systems scale more easily in computing power, but application development for them is more complex, since the programmer must explicitly partition the input data and arrange its transmission to the appropriate processing element. "Software Distributed Shared Memory" (SDSM) systems were designed and implemented to hide this complexity. The corresponding software packages allow applications to be developed and executed using the shared-memory model on distributed-memory architectures: whenever a processing element requests data from the "shared memory", the SDSM software determines whether the data is available in local memory and, if necessary, retrieves it without programmer intervention. In addition, whenever a processor modifies "shared memory" data, the SDSM software updates the other processing elements. The algorithms for retrieving shared-memory data and for propagating changes to the remaining processors constitute the memory consistency protocol. The efficiency of this protocol, measured against suitable metrics, is the dominant factor determining an SDSM system's performance. The goals of this thesis are: 1. A comparative study and evaluation of the memory consistency protocols in use today by the various SDSM systems. 2. The design and implementation of improved memory consistency protocols with higher performance. 3. The verification of the achieved optimization through measurements carried out on real systems with a number of applications.
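The abstract does not spell out the protocols themselves, but the behaviour it describes, fetching remote data on a read miss and notifying other nodes on a write, is what a write-invalidate consistency protocol does. The following toy sketch illustrates that mechanism; all class and method names are hypothetical, and the `nodes` dictionary stands in for the message-passing layer a real SDSM would use.

```python
# Toy sketch of a write-invalidate memory consistency protocol, in the
# spirit of the SDSM behaviour described above. Purely illustrative.

class Directory:
    """Tracks, per shared page, which nodes hold a copy and who owns it."""
    def __init__(self):
        self.copies = {}   # page -> set of node ids holding a copy
        self.owner = {}    # page -> node id with the latest version

class Node:
    def __init__(self, nid, directory, nodes):
        self.nid = nid
        self.directory = directory
        self.nodes = nodes          # nid -> Node; stands in for messaging
        self.memory = {}            # local copies: page -> value

    def read(self, page):
        # Assumes the page was created by a prior write somewhere.
        d = self.directory
        if page not in self.memory:             # read miss: fetch from owner
            self.memory[page] = self.nodes[d.owner[page]].memory[page]
            d.copies[page].add(self.nid)
        return self.memory[page]

    def write(self, page, value):
        d = self.directory
        for nid in d.copies.get(page, set()) - {self.nid}:
            self.nodes[nid].memory.pop(page, None)   # invalidate remote copies
        d.copies[page] = {self.nid}
        d.owner[page] = self.nid
        self.memory[page] = value
```

A short run shows the invalidation at work: after node 1 writes a page that node 0 has read, node 0's stale copy disappears and its next read fetches the new value from node 1.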
12

Integrating dynamic memory placement with adaptive load-balancing for parallel codes on NUMA multiprocessors

Slavin, Paul January 2008 (has links)
Parallel scientific programs executing in a NUMA environment are confronted with the problem of how to place their data in the system's physically separate memories so as to minimise the latency of the accesses made to this data by the program's threads. Motivated by the performance penalty of remote memory accesses, this thesis describes a technique by which the partition of a parallel program's workload created by a load-balancing routine may be used to identify the affinities of the program's threads for regions of the program's address space.
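The thesis's actual technique is not given in the abstract; as a minimal sketch of the idea, a load-balancer's partition of work among threads can be turned into per-region access counts, and each region placed on the NUMA node of the thread that touches it most. The function names, the `{thread: addresses}` partition shape and the `thread_node` mapping are all assumptions for illustration.

```python
from collections import Counter, defaultdict

def region_affinities(partition, page_size=4096):
    """Given a load-balancer's partition {thread: [addresses it will touch]},
    count how often each thread touches each page-sized region."""
    counts = defaultdict(Counter)       # region -> Counter(thread -> touches)
    for thread, addresses in partition.items():
        for addr in addresses:
            counts[addr // page_size][thread] += 1
    return counts

def placement(counts, thread_node):
    """Place each region on the NUMA node of its most frequent accessor."""
    return {region: thread_node[c.most_common(1)[0][0]]
            for region, c in counts.items()}
```

On a real system the resulting placement would be applied with OS facilities such as page migration; here the sketch stops at the placement decision itself.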
13

Networks-on-chip for multiprocessors

Lu, Ye January 2012 (has links)
It is anticipated that, with further transistor dimension scaling as well as packaging innovation, the transistor budget will keep increasing over the next decade. However, as the benefits of transistor scaling decrease, little performance improvement or reduction in switching energy can be achieved by scaling alone. Bounded by slow memory and increasing power, microprocessor performance has reached the point of diminishing returns. The main focus of this thesis is to develop an effective on-chip communication architecture and processor architecture for next-generation multiprocessors on a single hard-wired silicon chip, or on a programmable chip such as an FPGA, allowing the physical properties and performance of the architecture to scale with the ever-increasing transistor budget offered by new technology nodes. A Networks-on-Chip (NoC) centric system design is presented. This exploits the main attraction of NoCs, their scalability, in contrast to traditional multiprocessor interconnects, such as shared buses and crossbars, which do not scale in performance or cost. In addition, a novel multithreading processor architecture has been developed, with fast single-cycle thread scheduling capable of allocating computation resources to the threads doing computational work rather than the threads waiting for data communication.
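The scheduling policy described, giving cycles to ready threads and skipping those stalled on communication, can be sketched in software as a round-robin scan over thread states. This is an illustrative model only: in hardware the selection would be a single-cycle priority encoder, and the `ready`/`waiting` state names are assumptions, not the thesis's terminology.

```python
def next_ready(threads, last):
    """Pick the next runnable thread after `last`, round-robin, skipping
    any thread blocked waiting for data. `threads` maps thread id to
    'ready' or 'waiting'. Returns None if every thread is blocked."""
    ids = sorted(threads)
    n = len(ids)
    start = (ids.index(last) + 1) % n
    for i in range(n):
        tid = ids[(start + i) % n]
        if threads[tid] == 'ready':
            return tid          # first ready thread at or after `start`
    return None                 # all threads stalled on communication
```

Round-robin over ready threads keeps the pipeline busy whenever any thread has work, which is the property the abstract attributes to the proposed scheduler.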
14

Investigating the Scalability of Tiled Chip Multiprocessors using Multiple Networks

Preethi, Sam January 2009 (has links)
The era of a billion and more transistors on a single silicon chip has already begun, and this has changed the direction of future computing towards building chip multiprocessor (CMP) systems. Nevertheless, maintaining cache coherency as well as providing scalability on CMPs is still in the initial stages of development. This thesis therefore investigates the scalability of cache-coherent CMP systems.
15

Software-orientated system design for field programmable gate arrays

Self, R. P. January 2004 (has links)
No description available.
16

Fast local hardware processing for analysis of large databases

Cabal-Yepez, E. January 2007 (has links)
No description available.
17

Parallelism and the software-hardware interface in embedded systems

Chouliaras, V. A. January 2005 (has links)
This thesis by publications addresses issues in the architecture and microarchitecture of next-generation, high-performance streaming Systems-on-Chip by quantifying the most important forms of parallelism in current and emerging embedded system workloads. The work consists of three major research tracks, relating to data-level parallelism, thread-level parallelism and the software-hardware interface, which together reflect the research interests of the author as they have formed over the last nine years. Published works confirm that parallelism at the data level is widely accepted as the most important performance lever for the efficient execution of embedded media and telecom applications, and it has been exploited via a number of approaches, the most efficient being vector/SIMD architectures. A further, complementary and substantial form of parallelism exists at the thread level, but this has not been researched to the same extent in the context of embedded workloads. For the efficient execution of such applications, exploitation of both forms of parallelism is of paramount importance. This calls for a new architectural approach to the software-hardware interface, as its rigidity, manifested in all desktop-based and the majority of embedded CPUs, directly affects the performance of vectorized, threaded codes. The author advocates a holistic, mature approach in which parallelism is extracted via automatic means while, at the same time, the traditionally rigid hardware-software interface is optimized to match the temporal and spatial behaviour of the embedded workload. This ultimate goal calls for the precise study of these forms of parallelism for a number of applications executing on theoretical models, such as instruction set simulators and parallel RAM machines, as well as the development of highly parametric microarchitectural frameworks to encapsulate that functionality.
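The distinction between the two forms of parallelism the abstract names can be made concrete with a SAXPY-style kernel: data-level parallelism applies one operation across a whole vector (mapping to SIMD lanes in hardware), while thread-level parallelism splits the same work into independent chunks for separate threads. The sketch below is illustrative only, and note that CPython threads do not actually run numeric code in parallel because of the GIL; the point here is the decomposition, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def saxpy_dlp(a, xs, ys):
    """Data-level parallelism: one operation over a whole vector.
    On hardware this maps to SIMD lanes; a comprehension stands in."""
    return [a * x + y for x, y in zip(xs, ys)]

def saxpy_tlp(a, xs, ys, nthreads=2):
    """Thread-level parallelism: the same work split into independent
    chunks, each executed by a separate thread."""
    chunk = (len(xs) + nthreads - 1) // nthreads
    def work(start):
        return saxpy_dlp(a, xs[start:start + chunk], ys[start:start + chunk])
    with ThreadPoolExecutor(nthreads) as pool:
        parts = pool.map(work, range(0, len(xs), chunk))
    return [v for part in parts for v in part]
```

Both decompositions compute the same result, which is exactly why the abstract treats them as complementary levers to be exploited together.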
18

An FPGA coprocessor for large matrix decompositions

Ahmedsaid, A. January 2006 (has links)
No description available.
19

Analysis of liveness through proof

Wei, Kun January 2006 (has links)
Over the past decade, formal methods have been remarkably successful in their application to the analysis of concurrent systems. The vast bulk of work to date has been concerned only with safety properties; liveness properties, however, have not yet been mastered to the same degree. Broadly speaking, approaches to the analysis of concurrent systems fall into two camps: model checking and theorem proving. Although both are rather successful, each has its own deficiencies: model checking is superior at checking a finite system automatically, whereas theorem proving can reason about systems with massive or infinite state spaces. In this thesis we present an embedding of the stable failures model of CSP in the theorem prover PVS. Our work, extending Dutertre and Schneider's traces embedding in PVS, provides a platform for the formal verification in theorem provers not only of safety specifications but also of liveness specifications of concurrent systems. Such a platform is particularly good at analysing infinite-state systems with an arbitrary number of components. We demonstrate the power of the CSP embedding by using it to construct formal proofs that the asymmetric dining philosophers problem with an arbitrary number of philosophers is deterministic and deadlock-free, and that an industrial-scale example, a 'virtual network' with any number of nodes, is deadlock-free. We also use the embedding to prove the correctness of the fairness property of the Zhou-Gollmann protocol. Finally, we discuss a potential way to widen the applicability of formal methods by developing a tool that automatically transforms PVS scripts into FDR scripts, in order to unite the automatic nature of model checking with the generality of theorem proving.
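The thesis proves deadlock-freedom for an arbitrary number of philosophers in PVS; for intuition about what that property means, a fixed small instance can be checked exhaustively, in the model-checking style the abstract contrasts with theorem proving. In the asymmetric variant sketched here, philosopher 0 picks up the right fork first while all others pick up the left first, breaking the circular wait. The state encoding and function names are this sketch's own, not the thesis's CSP formulation.

```python
def deadlock_free(n):
    """Exhaustively check the asymmetric dining philosophers (n >= 2) for
    deadlock. Philosopher i's forks are i (left) and (i+1) % n (right);
    philosopher 0 acquires right-first, the rest left-first. Returns True
    iff every reachable state has at least one enabled move."""
    first = lambda i: i if i != 0 else (i + 1) % n
    second = lambda i: (i + 1) % n if i != 0 else i
    init = (tuple([None] * n), tuple([0] * n))   # (fork holders, forks held)
    seen, stack = {init}, [init]
    while stack:
        forks, held = stack.pop()
        moves = []
        for i in range(n):
            if held[i] == 0 and forks[first(i)] is None:
                f, h = list(forks), list(held)
                f[first(i)], h[i] = i, 1         # pick up first fork
                moves.append((tuple(f), tuple(h)))
            elif held[i] == 1 and forks[second(i)] is None:
                f, h = list(forks), list(held)
                f[second(i)], h[i] = i, 2        # pick up second fork
                moves.append((tuple(f), tuple(h)))
            elif held[i] == 2:                   # eat, then drop both forks
                f, h = list(forks), list(held)
                f[first(i)] = f[second(i)] = None
                h[i] = 0
                moves.append((tuple(f), tuple(h)))
        if not moves:
            return False                         # deadlocked state found
        for s in moves:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return True
```

This brute-force search only works for a fixed, small n, which is precisely the limitation the thesis's theorem-proving approach removes by handling an arbitrary number of philosophers in one proof.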
20

Type systems for coordination languages

Cai, Haoxuan January 2012 (has links)
S-Net is a declarative coordination language rooted in stream processing, with a runtime that automatically distributes the computational units among available resources. It was conceived in response to the shift in processor development from raising clock speeds to embedding more cores. For performance reasons, the S-Net compiler is responsible for generating additional information, through type inference, which the runtime uses to make data-delivery decisions. This requires the compiler to be supported by a sound type system that can ensure the program behaviour meets the expectations of the language designers and the programmers. However, owing to S-Net's design principle of ease of use, the S-Net type system was believed to be simple and was only informally documented. As we empirically tested the type inference implementation, we gradually revealed the hidden complexity of the calculus behind the apparently easy-to-use language, which was clearly beyond the capability of the informal type system. We then attempted several formulations of the type system, each addressing more of the issues we had found, but a complete solution was still missing. S-Net now urgently needs a formal type system with proofs of soundness. We have identified a major obstacle to designing a correct type system: the interdependency between types and semantics. In this thesis, we present a new design of the S-Net semantics and type system with no type-semantics interdependency, in terms of a new language, BL-Net, a reduced S-Net which preserves only the type-related behaviour, which has an operational semantics reflecting that of S-Net, and a type system with soundness and completeness proofs. Our contributions also include a bridging solution to fit the new type system into the existing compiler structure.
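The data-delivery decisions mentioned above rest on structural matching of record types against component input types. The toy sketch below shows one plausible reading of that idea, matching a record against a variant when it carries at least the variant's fields and preferring the most specific match; it is a heavy simplification for illustration, and the function names and the tie-breaking rule are this sketch's assumptions, not S-Net's actual type rules.

```python
def matches(record_fields, variant_fields):
    """Structural match: a record matches an input variant if it carries
    at least the variant's fields (record subtyping by width)."""
    return set(variant_fields) <= set(record_fields)

def route(record_fields, box_signatures):
    """Pick a destination variant for a record, preferring the most
    specific (largest) matching variant. Returns None on no match.
    A hypothetical stand-in for a runtime data-delivery decision."""
    candidates = [v for v in box_signatures if matches(record_fields, v)]
    if not candidates:
        return None
    return max(candidates, key=len)
```

Even this toy version hints at the subtlety the abstract describes: once routing depends on inferred types, the language's semantics and its type system constrain each other, which is the interdependency the thesis sets out to remove.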
