Global ETD Search

11	ReserveTM: Optimizing for Eager Software Transactional Memory Jain, Gaurav January 2013 (has links) Software Transactional Memory (STM) helps programmers write correct concurrent code by allowing them to identify atomic sections rather than focusing on the mechanics of concurrency control. Given code with atomic sections, the compiler and STM runtime can work together to ensure proper controlled access to shared memory. STM runtimes use either lazy or eager version management. Lazy versioning buffers transaction updates, whereas eager versioning applies updates in-place. The current set of primitives suit lazy versioning since memory needs to be accessed through the runtime. We present a new set of runtime primitives that better suit eager versioned STM. We propose a novel extension to the compiler/runtime interface, consisting of memory reservations and memory releases. These extensions enable optimizations specific to eager versioned runtimes. A memory reservation allows a transaction to perform instrumentation-free access on a memory address. A release allows a read-only address to be modified by another transaction. Together, these reduce the instrumentation overhead required to support STM and improve concurrency between readers and writers. We have implemented these primitives and evaluated its performance on the STAMP benchmarks. Our results show strong performance and scalability improvements to eager versioned algorithms. Software Transactional Memory Memory reservations memory releases privatization eager versioning TM API design Electrical and Computer Engineering
12	ReserveTM: Optimizing for Eager Software Transactional Memory Jain, Gaurav January 2013 (has links) Software Transactional Memory (STM) helps programmers write correct concurrent code by allowing them to identify atomic sections rather than focusing on the mechanics of concurrency control. Given code with atomic sections, the compiler and STM runtime can work together to ensure proper controlled access to shared memory. STM runtimes use either lazy or eager version management. Lazy versioning buffers transaction updates, whereas eager versioning applies updates in-place. The current set of primitives suit lazy versioning since memory needs to be accessed through the runtime. We present a new set of runtime primitives that better suit eager versioned STM. We propose a novel extension to the compiler/runtime interface, consisting of memory reservations and memory releases. These extensions enable optimizations specific to eager versioned runtimes. A memory reservation allows a transaction to perform instrumentation-free access on a memory address. A release allows a read-only address to be modified by another transaction. Together, these reduce the instrumentation overhead required to support STM and improve concurrency between readers and writers. We have implemented these primitives and evaluated its performance on the STAMP benchmarks. Our results show strong performance and scalability improvements to eager versioned algorithms. Software Transactional Memory Memory reservations memory releases privatization eager versioning TM API design Electrical and Computer Engineering
13	Optimisation de la performance des applications de mémoire transactionnelle sur des plates-formes multicoeurs : une approche basée sur l'apprentissage automatique / Improving the Performance of Transactional Memory Applications on Multicores : A Machine Learning-based Approach Castro, Márcio 03 December 2012 (has links) Le concept de processeur multicœurs constitue le facteur dominant pour offrir des hautes performances aux applications parallèles. Afin de développer des applications parallèles capable de tirer profit de ces plate-formes, les développeurs doivent prendre en compte plusieurs aspects, allant de l'architecture aux caractéristiques propres à l'application. Dans ce contexte, la Mémoire Transactionnelle (Transactional Memory – TM) apparaît comme une alternative intéressante à la synchronisation basée sur les verrous pour ces plates-formes. Elle permet aux programmeurs d'écrire du code parallèle encapsulé dans des transactions, offrant des garanties comme l'atomicité et l'isolement. Lors de l'exécution, les opérations sont exécutées spéculativement et les conflits sont résolus par ré-exécution des transactions en conflit. Bien que le modèle de TM ait pour but de simplifier la programmation concurrente, les meilleures performances ne pourront être obtenues que si l'exécutif est capable de s'adapter aux caractéristiques des applications et de la plate-forme. Les contributions de cette thèse concernent l'analyse et l'amélioration des performances des applications basées sur la Mémoire Transactionnelle Logicielle (Software Transactional Memory – STM) pour des plates-formes multicœurs. Dans un premier temps, nous montrons que le modèle de TM et ses performances sont difficiles à analyser. Pour s'attaquer à ce problème, nous proposons un mécanisme de traçage générique et portable qui permet de récupérer des événements spécifiques à la TM afin de mieux analyser les performances des applications. Par exemple, les données tracées peuvent être utilisées pour détecter si l'application présente des points de contention ou si cette contention est répartie sur toute l'exécution. Notre approche peut être utilisée sur différentes applications et systèmes STM sans modifier leurs codes sources. Ensuite, nous abordons l'amélioration des performances des applications sur des plate-formes multicœurs. Nous soulignons que le placement des threads (thread mapping) est très important et peut améliorer considérablement les performances globales obtenues. Pour faire face à la grande diversité des applications, des systèmes STM et des plates-formes, nous proposons une approche basée sur l'Apprentissage Automatique (Machine Learning) pour prédire automatiquement les stratégies de placement de threads appropriées pour les applications de TM. Au cours d'une phase d'apprentissage préliminaire, nous construisons les profiles des applications s'exécutant sur différents systèmes STM pour obtenir un prédicteur. Nous utilisons ensuite ce prédicteur pour placer les threads de façon statique ou dynamique dans un système STM récent. Finalement, nous effectuons une évaluation expérimentale et nous montrons que l'approche statique est suffisamment précise et améliore les performances d'un ensemble d'applications d'un maximum de 18%. En ce qui concerne l'approche dynamique, nous montrons que l'on peut détecter des changements de phase d'exécution des applications composées des diverses charges de travail, en prévoyant une stratégie de placement appropriée pour chaque phase. Sur ces applications, nous avons obtenu des améliorations de performances d'un maximum de 31% par rapport à la meilleure stratégie statique. / Multicore processors are now a mainstream approach to deliver higher performance to parallel applications. In order to develop efficient parallel applications for those platforms, developers must take care of several aspects, ranging from the architectural to the application level. In this context, Transactional Memory (TM) appears as a programmer friendly alternative to traditional lock-based concurrency for those platforms. It allows programmers to write parallel code as transactions, which are guaranteed to execute atomically and in isolation regardless of eventual data races. At runtime, transactions are executed speculatively and conflicts are solved by re-executing conflicting transactions. Although TM intends to simplify concurrent programming, the best performance can only be obtained if the underlying runtime system matches the application and platform characteristics. The contributions of this thesis concern the analysis and improvement of the performance of TM applications based on Software Transactional Memory (STM) on multicore platforms. Firstly, we show that the TM model makes the performance analysis of TM applications a daunting task. To tackle this problem, we propose a generic and portable tracing mechanism that gathers specific TM events, allowing us to better understand the performances obtained. The traced data can be used, for instance, to discover if the TM application presents points of contention or if the contention is spread out over the whole execution. Our tracing mechanism can be used with different TM applications and STM systems without any changes in their original source codes. Secondly, we address the performance improvement of TM applications on multicores. We point out that thread mapping is very important for TM applications and it can considerably improve the global performances achieved. To deal with the large diversity of TM applications, STM systems and multicore platforms, we propose an approach based on Machine Learning to automatically predict suitable thread mapping strategies for TM applications. During a prior learning phase, we profile several TM applications running on different STM systems to construct a predictor. We then use the predictor to perform static or dynamic thread mapping in a state-of-the-art STM system, making it transparent to the users. Finally, we perform an experimental evaluation and we show that the static approach is fairly accurate and can improve the performance of a set of TM applications by up to 18%. Concerning the dynamic approach, we show that it can detect different phase changes during the execution of TM applications composed of diverse workloads, predicting thread mappings adapted for each phase. On those applications, we achieve performance improvements of up to 31% in comparison to the best static strategy. Mémoire Transactionnelle Logicielle Apprentissage Automatique Traçage Placement de Threads Software Transactional Memory Machine Learning Tracing Thread Mapping
14	Performance Tradeoffs in Software Transactional Memory Abbas, Gulfam, Asif, Naveed January 2010 (has links) Transactional memory (TM), a new programming paradigm, is one of the latest approaches to write programs for next generation multicore and multiprocessor systems. TM is an alternative to lock-based programming. It is a promising solution to a hefty and mounting problem that programmers are facing in developing programs for Chip Multi-Processor (CMP) architectures by simplifying synchronization to shared data structures in a way that is scalable and compos-able. Software Transactional Memory (STM) a full software approach of TM systems can be defined as non-blocking synchronization mechanism where sequential objects are automatically converted into concurrent objects. In this thesis, we present performance comparison of four different STM implementations – RSTM of V. J. Marathe, et al., TL2 of D. Dice, et al., TinySTM of P. Felber, et al. and SwissTM of A. Dragojevic, et al. It helps us in deep understanding of potential tradeoffs involved. It further helps us in assessing, what are the design choices and configuration parameters that may provide better ways to build better and efficient STMs. In particular, suitability of an STM is analyzed against another STM. A literature study is carried out to sort out STM implementations for experimentation. An experiment is performed to measure performance tradeoffs between these STM implementations. The empirical evaluations done as part of this thesis conclude that SwissTM has significantly higher throughput than state-of-the-art STM implementations, namely RSTM, TL2, and TinySTM, as it outperforms consistently well while measuring execution time and aborts per commit parameters on STAMP benchmarks. The results taken in transaction retry rate measurements show that the performance of TL2 is better than RSTM, TinySTM and SwissTM. Multiprocessor Concurrent Programming Synchronization Software Transactional Memory Performance Computer Sciences Datavetenskap (datalogi) Software Engineering Programvaruteknik
15	Software lock elision for x86 machine code Roy, Amitabha January 2011 (has links) More than a decade after becoming a topic of intense research there is no transactional memory hardware nor any examples of software transactional memory use outside the research community. Using software transactional memory in large pieces of software needs copious source code annotations and often means that standard compilers and debuggers can no longer be used. At the same time, overheads associated with software transactional memory fail to motivate programmers to expend the needed effort to use software transactional memory. The only way around the overheads in the case of general unmanaged code is the anticipated availability of hardware support. On the other hand, architects are unwilling to devote power and area budgets in mainstream microprocessors to hardware transactional memory, pointing to transactional memory being a 'niche' programming construct. A deadlock has thus ensued that is blocking transactional memory use and experimentation in the mainstream. This dissertation covers the design and construction of a software transactional memory runtime system called SLE_x86 that can potentially break this deadlock by decoupling transactional memory from programs using it. Unlike most other STM designs, the core design principle is transparency rather than performance. SLE_x86 operates at the level of x86 machine code, thereby becoming immediately applicable to binaries for the popular x86 architecture. The only requirement is that the binary synchronise using known locking constructs or calls such as those in Pthreads or OpenMPlibraries. SLE_x86 provides speculative lock elision (SLE) entirely in software, executing critical sections in the binary using transactional memory. Optionally, the critical sections can also be executed without using transactions by acquiring the protecting lock. The dissertation makes a careful analysis of the impact on performance due to the demands of the x86 memory consistency model and the need to transparently instrument x86 machine code. It shows that both of these problems can be overcome to reach a reasonable level of performance, where transparent software transactional memory can perform better than a lock. SLE_x86 can ensure that programs are ready for transactional memory in any form, without being explicitly written for it. 005.3
16	Avaliação de desempenho do sistema de memória transacional de Clojure como biblioteca de sincronização na linguagem Java / Performance evaluation of Clojure transactional memory system as a synchronization library in Java language Pablo César Calcina Ccori 14 June 2011 (has links) Neste trabalho apresenta-se uma avaliação do desempenho da implementação de memória transacional da linguagem Clojure, utilizada como biblioteca de sincronização para uso em conjunto com outras aplicações dentro da máquina virtual de Java. É implementada uma camada de interface entre as estruturas de dados de Clojure e o benchmark STMBench7 e são discutidos alguns aspectos que geram sobrecarga no desempenho. / In this work a performance evaluation of Clojure transactional memory implementation is presented, using it as a synchronization library to work together with other applications on Java virtual machine. It is implemented an interface layer between Clojure data structures and STMBench7 benchmark, and issues about overhead in performance are discussed. clojure memória transactional em software stm clojure software transactional memory stm
17	Collaborative Scheduling and Synchronization of Distributable Real-Time Threads Fahmy, Sherif Fadel 17 June 2010 (has links) In this dissertation, we consider the problem of scheduling and synchronization of distributable real-time threads --- Real-Time CORBA's first-class abstraction for programming real-time, multi-node sequential behaviors. Distributable real-time threads can be scheduled, broadly, using two paradigms: node independent scheduling, in which nodes independently construct thread schedules, based on node-level decomposition of distributable thread (or DT) scheduling parameters, and collaborative scheduling, in which nodes collaborate to construct system-wide thread schedules, which may or may not involve scheduling parameter decomposition. While significant literature exists on node independent scheduling, little is known about collaborative scheduling and its concomitant tradeoffs. We design three collaborative scheduling algorithms, called ACUA, QBUA, and DQBUA. ACUA uses theory of consensus and QBUA uses theory of quorums for distributable thread schedule construction. DQBUA extends QBUA with lock-based, local and distributed concurrency control. The algorithms consider a model where distributable threads arrive arbitrarily, have time/utility function time constraints, access resources in an arbitrary way (e.g., arbitrary lock acquire/release order, arbitrary nestings), and are subject to arbitrary node crash failures and message losses. We analytically establish several properties of the algorithms including probabilistic end-to-end termination time satisfactions, timeliness optimality during underloads, bounded exception handling time, and correctness of the algorithms in partially synchronous systems. We implement distributable real-time threads in the Linux kernel as a first-class programming and scheduling abstraction. The resulting kernel, called ChronOS, provides application interfaces for creating and manipulating distributable threads, as well as kernel interfaces and mechanisms for scheduling them (using both independent and collaborative approaches). ChronOS also has failure detector mechanisms for detecting and recovering from distributable thread failures. We implement the proposed scheduling algorithms and their competitors in ChronOS and compare their behavior. Our studies reveal that the collaborative scheduling algorithms are superior to independent scheduling algorithms for certain thread sets, in particular, when thread sections have significantly varying execution time. This variability, especially if the variability is not consistent among the threads, may cause each node to make conflicting decisions in the absence of global information. We observe that collaborative schedulers outperform independent schedulers (e.g., EDF augmented with PIP) in terms of accrued utility by as much as 75%. We identify distributed dependencies as one of the major sources of overhead in collaborative scheduling. In particular, the cost of distributed lock-based concurrency control (e.g., lock management, distributed deadlock detection/resolution) can significantly reduce the problem space for which collaborative scheduling is beneficial. To mitigate this, we consider the use of software transactional memory (or STM), an optimistic, non-blocking synchronization alternative to lock-based concurrency control which has been extensively studied in non real-time contexts. We consider distributable real-time threads with STM concurrency control, and develop techniques for analyzing and bounding their end-to-end response times on distributed single-processor and distributed multiprocessor systems. We also develop contention management techniques, a key component of STM, which are driven by threads' real-time scheduling parameters, and establish their tradeoffs against non-real-time contention managers. / Ph. D. Time/Utility Functions Utility Accrual Scheduling Distributed Real-Time Scheduling Software Transactional Memory
18	Scheduling Memory Transactions in Distributed Systems Kim, Junwhan 15 October 2013 (has links) Distributed transactional memory (DTM) is an emerging, alternative concurrency control model that promises to alleviate the difficulties of lock-based distributed synchronization. In DTM, transactional conflicts are traditionally resolved by a contention manager. A complementary approach for handling conflicts is through a transactional scheduler, which orders transactional requests to avoid or minimize conflicts. We present a suite of transactional schedulers: Bi-interval, Commutative Requests First (CRF), Reactive Transactional Scheduler (RTS), Dependency-Aware Transactional Scheduler} (DATS), Scheduling-based Parallel Nesting} (SPN), Cluster-based Transactional Scheduler} (CTS), and Locality-aware Transactional Scheduler} (LTS). The schedulers consider Herlihy and Sun's dataflow execution model, where transactions are immobile and objects are migrated to invoking transactions, relying on directory-based cache-coherence protocols to locate and move objects. Within this execution model, the proposed schedulers target different DTM models. Bi-interval considers the single object copy DTM model, and categorizes concurrent requests into read and write intervals to maximize the concurrency of read transactions. This allows an object to be simultaneously sent to read transactions, improving transactional makespan. We show that Bi-interval improves the makespan competitive ratio of DTM without such a scheduler to O(log(N)) for the worst-case and (log(N - k) for the average-case, for N nodes and k read transactions. Our implementation reveals that Bi-interval enhances transactional throughput over the no-scheduler case by as much as 1.71x, on average. CRF considers multi-versioned DTM. Traditional multi-versioned TM models use multiple object versions to guarantee commits of read transactions, but limit concurrency of write transactions. CRF relies on the notion of commutative transactions, i.e., those that ensure consistency of the shared data-set even when they are validated and committed concurrently. CRF detects conflicts between commutative and non-commutative write transactions and then schedules them according to the execution state, enhancing the concurrency of write transactions. Our implementation shows that transactional throughput is improved by up to 5x over a state-of-the-art competitor (DecentSTM). RTS and DATS consider transactional nesting in DTM, and focus on the closed and open nesting models, respectively. RTS determines whether a conflicting outer transaction must be aborted or enqueued according to the level of contention. If a transaction is enqueued, its closed-nested transactions do not have to retrieve objects again, resulting in reduced communication delays. DATS's goal is to boost the throughput of open-nested transactions by reducing the overhead of running expensive compensating actions and acquiring/releasing abstract locks when the outer transaction aborts. The contribution of DATS is twofold. First, it allows commutable outer transactions to be validated concurrently and allows non-commutable outer transactions -- depending on their inner transactions -- to be committed before others without dependencies. Implementations reveal effectiveness: RTS and DATS improve throughput (over the no-scheduler case), by as much as 1.88x and 2.2x, respectively. SPN considers parallel nested transactions in DTM. The idea of parallel nesting is to execute the inner transactions that access different objects concurrently, and execute the inner transactions that access the same objects serially, increasing performance. However, the parallel nesting model may be ineffective if all inner transactions access the same object due to the additional overheads needed to identify both types of inner transactions. SPN avoids this overhead and allows inner transactions to request objects and to execute them in parallel. Implementations reveal that SPN outperforms non-parallel nesting (i.e., closed nesting) by up to 3.5x and 4.5x on a micro-benchmark (bank) and the TPC-C transactional benchmark, respectively. CTS considers the replicated DTM model: object replicas are distributed across clusters of nodes, where clusters are determined based on inter-node distance, to maximize locality and fault-tolerance, and to minimize memory usage and communication overhead. CTS enqueues transactions that are aborted due to early validation over clusters and assigns their backoff times, reducing communication overhead. Implementation reveals that CTS improves throughput over competitor replicated DTM solutions including GenRSTM and DecentSTM by as much as 1.64x, on average. LTS considers the genuine partial replicated DTM model. In this model, LTS exploits locality by: 1) employing a transaction scheduler, which enables/disables object ownership changes depending on workload fluctuations, and 2) splitting hot-spot objects into multiple replicas for reducing contention. Our implementation reveals that LTS outperforms state-of-the-art competitors (Score and CTS) by up to 2.6x on micro-benchmarks (Linked List and Skip List) and by up to 2.2x on TPC-C. / Ph. D. Software Transactional Memory Distributed Systems Transactional Scheduling Partial Replication (Parallel) Nested Transactions
19	Supporting Software Transactional Memory in Distributed Systems: Protocols for Cache-Coherence, Conflict Resolution and Replication Zhang, Bo 05 December 2011 (has links) Lock-based synchronization on multiprocessors is inherently non-scalable, non-composable, and error-prone. These problems are exacerbated in distributed systems due to an additional layer of complexity: multinode concurrency. Transactional memory (TM) is an emerging, alternative synchronization abstraction that promises to alleviate these difficulties. With the TM model, code that accesses shared memory objects are organized as transactions, which speculatively execute, while logging changes. If transactional conflicts are detected, one of the conflicting transaction is aborted and re-executed, while the other is allowed to commit, yielding the illusion of atomicity. TM for multiprocessors has been proposed in software (STM), in hardware (HTM), and in a combination (HyTM). This dissertation focuses on supporting the TM abstraction in distributed systems, i.e., distributed STM (or D-STM). We focus on three problem spaces: cache-coherence (CC), conflict resolution, and replication. We evaluate the performance of D-STM by measuring the competitive ratio of its makespan --- i.e., the ratio of its makespan (the last completion time for a given set of transactions) to the makespan of an optimal off-line clairvoyant scheduler. We show that the performance of D-STM for metric-space networks is O(N^2) for N transactions requesting an object under the Greedy contention manager and an arbitrary CC protocol. To improve the performance, we propose a class of location-aware CC protocols, called LAC protocols. We show that the combination of the Greedy manager and a LAC protocol yields an O(NlogN s) competitive ratio for s shared objects. We then formalize two classes of CC protocols: distributed queuing cache-coherence (DQCC) protocols and distributed priority queuing cache-coherence (DPQCC) protocols, both of which can be implemented using distributed queuing protocols. We show that a DQCC protocol is O(NlogD)-competitive and a DPQCC protocol is O(log D_delta)-competitive for N dynamically generated transactions requesting an object, where D_delta is the normalized diameter of the underlying distributed queuing protocol. Additionally, we propose a novel CC protocol, called Relay, which reduces the total number of aborts to O(N) for N conflicting transactions requesting an object, yielding a significantly improvement over past CC protocols which has O(N^2) total number of aborts. We also analyze Relay's dynamic competitive ratio in terms of the communication cost (for dynamically generated transactions), and show that Relay's dynamic competitive ratio is O(log D_0), where D_0 is the normalized diameter of the underlying network spanning tree. To reduce unnecessary aborts and increase concurrency for D-STM based on globally-consistent contention management policies, we propose the distributed dependency-aware (DDA) conflict resolution model, which adopts different conflict resolution strategies based on transaction types. In the DDA model, read-only transactions never abort by keeping a set of versions for each object. Each transaction only keeps precedence relations based on its local knowledge of precedence relations. We show that the DDA model ensures that 1) read-only transactions never abort, 2) every transaction eventually commits, 3) supports invisible reads, and 4) efficiently garbage collects useless object versions. To establish competitive ratio bounds for contention managers in D-STM, we model the distributed transactional contention management problem as the traveling salesman problem (TSP). We prove that for D-STM, any online, work conserving, deterministic contention manager provides an Omega(max[s,s^2/D]) competitive ratio in a network with normalized diameter D and s shared objects. Compared with the Omega(s) competitive ratio for multiprocessor STM, the performance guarantee for D-STM degrades by a factor proportional to s/D. We present a randomized algorithm, called Randomized, with a competitive ratio O(sClog n log ^{2} n) for s objects shared by n transactions, with a maximum conflicting degree C. To break this lower bound, we present a randomized algorithm Cutting, which needs partial information of transactions and an approximate TSP algorithm A with approximation ratio phi_A. We show that the average case competitive ratio of Cutting is O(s phi_A log^{2}m log^{2}n), which is close to O(s). Single copy (SC) D-STM keeps only one writable copy of each object, and thus cannot tolerate node failures. We propose a quorum-based replication (QR) D-STM model, which provides provable fault-tolerance without incurring high communication overhead, when compared with the SC model. The QR model stores object replicas in a tree quorum system, where two quorums intersect if one of them is a write quorum, and ensures the consistency among replicas at commit-time. The communication cost of an operation in the QR model is proportional to the communication cost from the requesting node to its closest read or write quorum. In the presence of node failures, the QR model exhibits high availability and degrades gracefully when the number of failed nodes increases, with reasonable higher communication cost. We develop a protoytpe implementation of the dissertation's proposed solutions, including DQCC and DPQCC protocols, Relay protocol, and the DDA model, in the HyFlow Java D-STM framework. We experimentally evaluated these solutions with respective competitor solutions on a set of microbenchmarks (e.g., data structures including distributed linked list, binary search tree and red-black tree) and macrobenchmarks (e.g., distributed versions of the applications in the STAMP STM benchmark suite for multiprocessors). Our experimental studies revealed that: 1) based on the same distributed queuing protocol (i.e., Ballistic CC protocol), DPQCC yields better transactional throughput than DQCC, by a factor of 50% - 100%, on a range of transactional workloads; 2) Relay outperforms competitor protocols (including Arrow, Ballistic and Home) by more than 200% when the network size and contention increase, as it efficiently reduces the average aborts per transaction (less than 0.5); 3) the DDA model outperforms existing contention management policies (including Greedy, Karma and Kindergarten managers) by upto 30%-40% in high contention environments; For read/write-balanced workloads, the DDA model outperforms these contention management policies by 30%-60% on average; for read-dominated workloads, the model outperforms by over 200%. / Ph. D. Cache-Coherence Distributed Queuing Contention Management Software Transactional Memory Replication Quorum System
20	HyFlow: A High Performance Distributed Software Transactional Memory Framework Saad Ibrahim, Mohamed Mohamed 14 June 2011 (has links) We present HyFlow - a distributed software transactional memory (D-STM) framework for distributed concurrency control. Lock-based concurrency control suffers from drawbacks including deadlocks, livelocks, and scalability and composability challenges. These problems are exacerbated in distributed systems due to their distributed versions which are more complex to cope with (e.g., distributed deadlocks). STM and D-STM are promising alternatives to lock-based and distributed lock-based concurrency control for centralized and distributed systems, respectively, that overcome these difficulties. HyFlow is a Java framework for DSTM, with pluggable support for directory lookup protocols, transactional synchronization and recovery mechanisms, contention management policies, cache coherence protocols, and network communication protocols. HyFlow exports a simple distributed programming model that excludes locks: using (Java 5) annotations, atomic sections are defiend as transactions, in which reads and writes to shared, local and remote objects appear to take effect instantaneously. No changes are needed to the underlying virtual machine or compiler. We describe HyFlow's architecture and implementation, and report on experimental studies comparing HyFlow against competing models including Java remote method invocation (RMI) with mutual exclusion and read/write locks, distributed shared memory (DSM), and directory-based D-STM. / Master of Science Directory Protocols Cache Coherence Contention Management Control-Flow Dataflow Software Transactional Memory Distributed Systems

Search results