Global ETD Search

11	Studying and analysing transactional memory using interval temporal logic and AnaTempura El-kustaban, Amin Mohammed Ahmed January 2012 (has links) Transactional memory (TM) is a promising lock-free synchronisation technique which offers a high-level abstract parallel programming model for future chip multiprocessor (CMP) systems. Moreover, it adapts the well-established popular paradigm of transactions and thus provides a general and flexible way to allow programs to read and modify disparate memory locations atomically as a single operation. In this thesis, we propose a general framework for validating a TM design, starting from a formal specification into a hardware implementation, with its underpinning theory and refinement. A methodology in this work starts with a high-level and executable specification model for an abstract TM with verification for various correctness conditions of concurrent transactions. This model is constructed within a flexible transition framework that allows verifying correctness of a TM system with animation. Then, we present a formal executable specification for a chip-dual single-cycle MIPS processor with a cache coherence protocol and integrate the provable TM system. Finally, we transform the dual processors with the TM from a high-level description into a Hardware Description Language (VHDL), using some proposed refinement and restriction rules. Interval Temporal Logic (ITL) and its programming language subset AnaTempura are used to build, execute and test the model, since they together provide a powerful framework supporting logical reasoning about time intervals as well as programming and simulation. 005.1
12	Refactoring for Software Transactional Memory Baum, Mark Vincent 27 February 2012 (has links) Software transactional memory (STM) is an optimistic concurrent lock free mechanism that has the potential of positively transforming how concurrent programming is performed. STM, despite its many desirable attributes, is not yet a ubiquitous programming language feature in the commercial software domain. There are many implementation challenges with retrofitting STM into pre-existing language frameworks. Furthermore, existing software systems will also need to be refactored in order to take advantage of STM’s unique benefits. As with other time consuming and error prone refactoring processes, refactoring for STM is best done with automated tool support; it is the aim of this paper to propose such a tool. / text Refactoring Automated tool supported refactoring Software Transactional Memory
13	Hardware transactional memory : a systems perspective Rossbach, Christopher John 22 March 2011 (has links) The increasing ubiquity of chip multiprocessor machines has made the need for accessible approaches to parallel programming all the more urgent. The current state of the art, based on threads and locks, requires the programmer to use mutual exclusion to protect shared resources, enforce invariants, and maintain consistency constraints. Despite decades of research effort, this approach remains fraught with difficulty. Lock-based programming is complex and error-prone, largely due to well-known problems such as deadlock, priority inversion, and poor composability. Tradeoffs between performance and complexity for locks remain unattractive. Coarse-grain locking is simple but introduces artificial sharing, needless serialization, and yields poor performance. Fine-grain locking can address these issues, but at a significant cost in complexity and maintainability. Transactional memory has emerged as a technology with the potential to address this need for better parallel programming tools. Transactions provide the abstraction of isolated, atomic execution of critical sections. The programmer specifies regions of code which access shared data, and the system is responsible for executing that code in a way that is isolated and atomic. The programmer need not reason about locks and threads. Transactional memory removes many of the pitfalls of locking: transactions are livelock- and deadlock-free and may be composed freely. Hardware transactional memory, which is the focus of this thesis, provides an efficient implementation of the TM abstraction. This thesis explores several key aspects of supporting hardware transactional memory (HTM): operating systems support and integration, architectural, design, and implementation considerations, and programmer-transparent techniques to improve HTM performance in the presence of contention. Using and supporting HTM in an OS requires innovation in both the OS and the architecture, but enables practical approaches and solutions to some long-standing OS problems. Innovations in transactional cache coherence protocols enable HTM support in the presence of multi-level cache hierarchies, rich HTM semantics such as suspend/resume and multiple transactions per thread context, and can provide the building blocks for support of flexible contention management policies without the need to trap to software handlers. We demonstrate a programmer-transparent hardware technique for using dependences between transactions to commit conflicting transactions, and suggest techniques to allow conflicting transactions to avoid performance-sapping restarts without using heuristics such as backoff. Both mechanisms yield better performance for workloads that have significant write-sharing. Finally, in the context of the MetaTM HTM model, this thesis contributes a high-fidelity cross-design comparison of representative proposals from the literature: the result is a comprehensive exploration of the HTM design space that compares the behavior of models of MetaTM (70, 75), LogTM (58, 94), and Sun's Rock (22). / text Hardware transactional memory Operating systems MetaTM TxLinux Dependence aware transactional memory TagTM XMESI protocol Cache
14	Performance Optimizations for Software Transactional Memory January 2011 (has links) The transition from single-core processors to multi-core processors demands a change from sequential programming to concurrent programming for mainstream programmers. However, concurrent programming has long been widely recognized as being notoriously difficult. A major reason for its difficulty is that existing concurrent programming constructs provide low-level programming abstractions. Using these constructs forces programmers to consider many low level details. Locks, the dominant programming construct for mutual exclusion, suffer several well known problems, such as deadlock, priority inversion, and convoying, and are directly related to the difficulty of concurrent programming. The alternative to locks, i.e. non-blocking programming, not only is extremely error-prone, but also does not produce consistently good performance. Better programming constructs are critical to reduce the complexity of concurrent programming, increase productivity, and expose the computing power in multi-core processors. Transactional memory has emerged recently as one promising programming construct for supporting atomic operations on shared data. By eliminating the need to consider a huge number of possible interactions among concurrent transactions, Transactional memory greatly reduces the complexity of concurrent programming and vastly improves programming productivity. Software transactional memory systems implement a transactional memory abstraction in software. Unfortunately, existing designs of Software Transactional Memory systems incur significant performance overhead that could potentially prevent it from being widely used. Reducing STM's overhead will be critical for mainstream programmers to improve productivity while not suffering performance degradation. My thesis is that the performance of STM can be significantly improved by intelligently designing validation and commit protocols, by designing the time base, and by incorporating application-specific knowledge. I present four novel techniques for improving performance of STM systems to support my thesis. First, I propose a time-based STM system based on a runtime tuning strategy that is able to deliver performance equal to or better than existing strategies. Second, I present several novel commit phase designs and evaluate their performance. Then I propose a new STM programming interface extension that enables transaction optimizations using fast shared memory reads while maintaining transaction composability. Next, I present a distributed time base design that outperforms existing time base designs for certain types of STM applications. Finally, I propose a novel programming construct to support multi-place isolation. Experimental results show the techniques presented here can significantly improve the STM performance. We expect these techniques to help STM be accepted by more programmers. Applied sciences Transactional memory Parallel computing Concurrent programming Computer science
15	Orchestration and atomicity Kitchin, David Wilson 11 September 2013 (has links) This dissertation presents the concurrent programming language Ora, an extension of the Orc orchestration language with the capability to execute transactions. A new formal definition of transactions is given, in terms of two complementary properties: atomicity and coatomicity. These properties are described in terms of a partial order of events, rather than as properties of a totally ordered program trace. Atomicity and coatomicity are ensured in Ora programs by a novel algorithm for multiversion concurrency control. / text Orchestration Concurrency Transactions Transactional memory Atomicity Coatomicity Orc Ora MVCC
16	Compiler Transformations For Improving The Performance Of Software Transactional Memory Mannarswamy, Sandya S 11 1900 (has links) (PDF) Expressing synchronization using traditional lock based primitives has been found to be both error-prone and restrictive. Hence there has been considerable research work to develop scalable and programmer-friendly alternatives to lock-based synchronization. Atomic sections have been proposed as a programming idiom for expressing synchronization at a higher-level of abstraction than locks. One way of supporting atomic sections in software is to rely on an underlying Software Transactional Memory (STM) implementation. While STM offers the promise of being a programming paradigm which is less error-prone and more programmer friendly compared to traditional lock-based synchronization, it also needs to be competitive in performance in order for it to be adopted in mainstream software. Unfortunately STMs do not meet the performance goals and are known to incur excessive performance overheads. Prior work by other researchers and our performance analysis of STM applications show that conflicts and the resulting aborts are a major performance bottleneck for STM applications. Second we find that, supporting fine-grained optimistic concurrency can have significant impact on the cache behavior of applications running on STM and hence can adversely affect STM performance. Our systematic quantitative analysis of the cache behavior of STM applications as well as prior work on qualitative analysis of STM overheads show that cache overheads constitute a major performance bottleneck for STM applications. Hence in this thesis, we focus on addressing these two major STM performance bottlenecks. Current STM implementations are typically application unaware in that they do not analyze the application and use that knowledge to improve the application performance on STM. Closer integration of transactions with the programming languages opens up the possibility of using the compiler to analyze STM applications and using that knowledge to perform application code transformations to improve the application performance on STM automatically and in a manner transparent to the programmer. This motivated us to address the two major STM performance bottlenecks namely poor cache performance and performance penalty due to aborts, by compiler transformations. In order to pinpoint the cache bottlenecks of STM, we perform a detailed experimental evaluation of the cache behavior of STM applications and quantify the impact of the different STM factors on the cache misses experienced by the applications. We propose a set of compiler transformations targeted to address the cache performance bottlenecks identified by our analysis. Next we turn our attention to compiler analysis and transformations targeted at reducing the performance overheads due to transactional aborts, effectively utilizing the compiler’s knowledge of the data access patterns of the application. Since not all applications are designed with optimistic concurrency in mind, real world applications typically contain certain atomic sections which are not amenable to STM’s optimistic concurrency control and hence suffer from excessive transactional abort overheads. We propose two compiler techniques for handling such atomic sections. Another major cause of transactional conflicts leading to unnecessary aborts is the uniform granularity access tracking scheme employed by STM implementations. Using a single uniform access tracking granularity leads to poor lock assignment by STM. We propose techniques which use compiler’s knowledge of an application to improve the application unaware lock assignment made by the STM. Last as transactional abort overheads impact STM performance adversely, we propose a compiler-based approach to reduce the transactional abort overheads by reconciling certain kinds of transactions instead of aborting them and then performing a complete re-execution. We show that our combined set of compiler transformations are effective in improving the performance of a set of STAMP benchmarks by reducing the execution time by 7.48% to 54.82%, aborts by 8.98% to 56.61% and the average D-cache miss latency by up to 33.51%. Compiler Transformation Software Transactional Memory (STM) Computer Science
17	Real-Time Software Transactional Memory: Contention Managers, Time Bounds, and Implementations El-Shambakey, Mohammed Talat 02 October 2013 (has links) Lock-based concurrency control suffers from programmability, scalability, and composability challenges. These challenges are exacerbated in emerging multicore architectures, on which improved software performance must be achieved by exposing greater concurrency. Transactional memory (TM) is an emerging alternative synchronization model for shared memory objects that promises to alleviate these difficulties. In this dissertation, we consider software transactional memory (STM) for concurrency control in multicore real-time software, and present a suite of real-time STM contention managers for resolving transactional conflicts. The contention managers are called ECM, RCM, LCM, PNF, and FBLT. RCM and ECM resolve conflicts using fixed and dynamic priorities of real-time tasks, respectively, and are naturally intended to be used with the fixed priority (e.g., G-RMA) and dynamic priority (e.g., G-EDF) multicore real-time schedulers, respectively. LCM resolves conflicts based on task priorities as well as atomic section lengths, and can be used with G-EDF or G-RMA schedulers. Transactions under ECM, RCM, and LCM may retry due to conflicts with higher priority tasks even when there are no shared objects, i.e., transitive retry. PNF avoids transitive retry and optimizes processor usage by lowering the priority of retrying transactions, thereby enabling other non-conflicting transactions to proceed. PNF, however, requires a priori knowledge of all requested objects for each atomic section, which is inconsistent with the semantics of dynamic STM. Moreover, its centralized design increases overhead. FBLT avoids transitive retry, do not require a priori knowledge of requested objects, and has a decentralized design. We establish upper bounds on transactional retry costs and task response times under the contention managers through schedulability analysis. Since ECM and RCM preserve the semantics of the underlying real-time scheduler, their maximum transactional retry cost is double the maximum atomic section length. This is improved in the design of LCM, which achieves shorter retry costs and tighter upper bounds. As PNF avoids transitive retry and improves processor usage, it yields shorter retry costs and tighter upper bounds than ECM, RCM, and LCM. FBLT\'s upper bounds are similarly tight because it combines the advantages of PNF and LCM. We formally compare the proposed contention managers with each other, with lock-free synchronization, and with multiprocessor real-time locking protocols. Our analysis reveals that, for most cases, ECM, RCM, and LCM achieve higher schedulability than lock-free synchronization only when the atomic section length does not exceed half of lock-free synchronization\'s retry loop length. With equal periods and greater access times for shared objects, atomic section length under ECM, RCM, and LCM can be much larger than the retry loop length while still achieving better schedulability. With proper values for LCM\'s design parameters, atomic section length can be larger than the retry loop length for better schedulability. Under PNF, atomic section length can exceed lock-free\'s retry loop length and still achieve better schedulability in certain cases. FBLT achieves equal or better schedulability than lock-free with appropriate values for design parameters. The schedulability advantage of the contention managers over multiprocessor real-time locking protocols such as Global OMLP and RNLP depends upon the value of $s_{max}/L_{max}$, the ratio of the maximum transaction length to the maximum critical section length. FBLT\'s schedulability is equal or better than Global OMLP and RNLP if $s_/L_ le 2$. Checkpointing enables partial roll-back of transactions by recording transaction execution states (i.e., checkpoints) during execution, allowing roll-back to a previous checkpoint instead of transaction start, improving task response time. We extend FBLT with checkpointing and develop CP-FBLT, and identify the conditions under which CP-FBLT achieves equal or better schedulability than FBLT. We implement the contention managers in the Rochester STM framework and conduct experimental studies using a multicore real-time Linux kernel. Our studies reveal that among the contention managers, CP-FBLT has the best average-case performance. CP-FBLT\'s higher performance is due to the fact that PNF\'s and LCM\'s advantages are combined into the design of FBLT, which is the base of CP-FBLT. Moreover, checkpointing improves task response time. The contention managers were also found to have equal or better average-case performance than lock-free synchronization: more jobs meet their deadlines using CP-FBLT, FBLT, and PNF than lock-free synchronization by 34.6%, 28.5%, and 32.4% (on average), respectively. The superiority of the contention managers is directly due to their better conflict resolution policies. Locking protocols such as OMLP and RNLP were found to perform better: more jobs meet their deadlines under OMLP and RNLP than any contention manager by 12.4% and 13.7% (on average), respectively. However, the proposed contention managers have numerous qualitative advantages over locking protocols. Locks do not compose, whereas STM transactions do. To allow multiple objects to be accessed in a critical section, OMLP assigns objects to non-conflicting groups, where each group is protected by a distinct lock. RNLP assumes that objects are accessed in a specific order to prevent deadlocks. In contrast, STM allows multiple objects to be accessed in a transaction in any order, while guaranteeing deadlock-freedom, which significantly increases programmability. Moreover, STM offers platform independence: the proposed contention managers can be entirely implemented in the user-space as a library. In contrast, real-time locking protocols such as OMLP and RNLP must be supported by the underlying platform (i.e., operating system or virtual machine). / Ph. D. Software Transactional Memory Embedded Systems Contention Managers real-time systems
18	HyflowCPP: A Distributed Software Transactional Memory Framework for C++ Mishra, Sudhanshu 13 February 2013 (has links) The distributed transactional memory (DTM) abstraction aims to simplify the development of distributed concurrent programs. It frees programmers from the complicated and error-prone task of explicit concurrency control based on locks (e.g., deadlocks, livelocks, non-scalability, non-composability), which are aggravated in a distributed environment due to the complexity of multi-node concurrency. At its core, DTM's atomic section-based synchronization abstraction enables the execution of a sequence of multi-node object operations with the classical serializability property, which significantly increases the programmability of distributed systems. In this thesis, we present the first ever DTM framework for distributed concurrency control in C++, called HyflowCPP. HyflowCPP provides distributed atomic sections, and pluggable support for concurrency control algorithms, directory lookup protocols, contention management policies, and network communication protocols. The framework uses the Transaction Forwarding Algorithm (or TFA) for concurrency control. While there exists implementations of TFA and other DTM concurrency control algorithms in Scala and Java, and concomitant DTM frameworks (e.g., HyflowJava, HyflowScala, D2STM, GenRSTM), HyflowCPP provides a uniquely distinguishing TFA/DTM implementation for C++. Also, HyflowCPP supports strong atomicity, transactional nesting models including closed and open nesting (supported using modifications to TFA), and checkpointing. We evaluated HyflowCPP through an experimental study that measured transactional throughput for a set of micro- and macro-benchmarks, and comparing with competitor DTM frameworks. Our results revealed that HyflowCPP achieves up to 600% performance improvement over competitor Java DTM frameworks including D2STM, GenRSTM, HyflowScala and HyflowJava, which can be attributed to the competitors' JVM overhead and rudimentary networking support. Additionally, our experimental studies revealed that checkpointing achieves up to 100% performance improvement over flat nesting and 50% over closed nesting. Open nesting model achieves up to 140% performance improvement over flat nesting and 90% over closed nesting. / Master of Science Distributed Transactional Memory Transactional Framework C++ Distributed Concurrency
19	On Partial Aborts and Reducing Validation Costs in Fault-tolerant Distributed Transactional Memory Dhoke, Aditya Anil 30 September 2013 (has links) Distributed Transactional Memory (DTM) is an emerging synchronization abstraction thatpromises to alleviate the scalability, programmability, and composability challenges of lock-based distributed synchronization. With DTM, programmers organize code that read/writeshared memory objects, both local and remote, as memory transactions. An underlying DTMframework guarantees atomicity, isolation, and consistency properties for those transactionsthrough speculative concurrency control. In DTM, restarting an aborted transaction from the beginning can degrade performance astransactional conflicts may have occurred in the later part of the transaction, wasting work.The partial abort method alleviates this difficulty by enabling a transaction to be rolled backto the point where objects in the transaction's read-set and write-set are still consistent.In this thesis, we present protocols for supporting partial aborts in QR-DTM, a fault-tolerant DTM that uses quorum protocols for distributed concurrency control in the presence offailures. We focus on two partial abort models: closed nesting, which allows transactions to be nested and inner transactions to be rolled back without rolling back outer transactions;and checkpointing, which allows the transaction state to be saved in checkpoints throughout execution and transactions to be rolled back to those checkpoints. We present protocols that support these partial abort models in QR-DTM, called QR-CN and QR-CHK. We implemented these protocols and conducted experimental studies using macro-benchmarks(e.g., distributed versions of STAMP benchmark), and micro-benchmarks (e.g., distributed data structures). Our studies reveal that QR-CN improves throughput by as much as 101% over flat nesting in specific cases, with an average improvement of 53%. We also develop QR-ACN, a framework that automatically decomposes programmer-writtentransactions into closed nested transactions, at run-time, to improve performance. The composition of a closed nested transaction depends on the contention of objects, which can change at run-time depending upon the workload at hand. Our implementation and experimental studies reveal that QR-ACN consistently outperforms flat nesting by an averageof 51% on benchmarks including TPC-C. False conflicts occur when high-level operations, even though semantically independent, traverse the same set of objects during transaction execution. Such conflicts can lead torepeated aborts, increasing transaction execution time and degrading performance, which can be significant in DTM, since transaction execution also includes network communication. ii We consider the approach of reducing validation cost for resolving false conflicts. We presentthree protocols for reducing validation cost in DTM. Our first protocol, QR-ON, incorporatesthe open nesting model into QR-DTM. Open nesting allows inner-nested transactions tocommit independently of their parent transaction, releasing objects in the transaction read-set and write-set early, minimizing aborts due to false conflicts. We then present QR-OON,in which open-nested transactions commit asynchronously so that subsequent transactionscan proceed without waiting for the commit of previous transactions. Finally, we presentan early release methodology, QR-ER, in which objects that do not affect the final state ofthe shared data are dropped from the transaction's read-set, which avoids false conflicts andreduces communication costs. Our implementation and experimental studies revealed that QR-OON outperforms QR-ON by up to 43%, and that QR-ER outperforms QR-ON and QR-OON by up to 10% on micro- and macro-benchmarks. / Master of Science Software Transactional Memory Partial Abort Validation Closed Nesting Checkpointing
20	ByteSTM: Java Software Transactional Memory at the Virtual Machine Level Mahmoud Mohamedin, Mohamed Ahmed 21 March 2012 (has links) As chip vendors are increasingly manufacturing a new generation of multi-processor chips called multicores, improving software performance requires exposing greater concurrency in software. Since code that must be run sequentially is often due to the need for synchronization, the synchronization abstraction has a significant effect on program performance. Lock-based synchronization — the most widely used synchronization method — suffers from programability, scalability, and composability challenges. Transactional memory (TM) is an emerging synchronization abstraction that promises to alleviate the difficulties with lock-based synchronization. With TM, code that read/write shared memory objects is organized as transactions, which speculatively execute. When two transactions conflict (e.g., read/write, write/write), one of them is aborted, while the other commits, yielding (the illusion of) atomicity. Aborted transactions are re-started, after rolling-back changes made to objects. In addition to a simple programming model, TM provides performance comparable to lock-based synchronization. Software transactional memory (STM) implements TM entirely in software, without any special hardware support, and is usually implemented as a library, or supported by a compiler or by a virtual machine. In this thesis, we present ByteSTM, a virtual machine-level Java STM implementation. ByteSTM implements two STM algorithms, TL2 and RingSTM, and transparently supports implicit transactions. Program bytecode is automatically modified to support transactions: memory load/store bytecode instructions automatically switch to transactional mode when a transaction starts, and switch back to normal mode when the transaction successfully commits. Being implemented at the VM-level, it accesses memory directly and uses absolute memory addresses to uniformly handle memory. Moreover, it avoids Java garbage collection (which has a negative impact on STM performance), by manually allocating and recycling memory for transactional metadata. ByteSTM uses field-based granularity, and uses the thread header to store transactional metadata, instead of the slower Java ThreadLocal abstraction. We conducted experimental studies comparing ByteSTM with other state-of-the-art Java STMs including Deuce, ObjectFabric, Multiverse, DSTM2, and JVSTM on a set of micro- benchmarks and macro-benchmarks. Our results reveal that, ByteSTM's transactional throughput improvement over competitors ranges from 20% to 75% on micro-benchmarks and from 36% to 100% on macro-benchmarks. / Master of Science Synchronization Virtual Machine Java Multiprocessor Concurrency Software Transactional Memory

Search results