31

Relaxing Concurrency Control in Transactional Memory

Aydonat, Utku 05 January 2012 (has links)
Transactional memory (TM) systems have gained considerable popularity in the last decade, driven by the increased demand for tools that ease parallel programming. TM eliminates the need for user locks that protect accesses to shared data. It offers performance close to that of fine-grain locking with the programming simplicity of coarse-grain locking. Today's TM systems implement the two-phase-locking (2PL) algorithm, which aborts transactions every time a conflict occurs. 2PL is a simple algorithm that provides fast transactional operations. However, it limits concurrency in applications with high contention because it increases the rate of aborts. We propose the use of a more relaxed concurrency control algorithm to provide better concurrency. This algorithm is based on the conflict-serializability (CS) model. Unlike 2PL, it allows some transactions to commit successfully even when they make conflicting accesses. We implement this algorithm both in a software TM system and in a simulator of a hardware TM system. Our evaluation using TM benchmarks shows that the algorithm improves the performance of applications with long transactions and high abort rates. Performance is improved by up to 299% in the software TM, and by up to 66% in the hardware simulator. We argue that these improvements come with little additional complexity and require no changes to the transactional programming model. This makes our implementation feasible.
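
As a rough illustration of the difference between the two policies, the following C++ sketch (illustrative names and bookkeeping, not the thesis's code) contrasts them: under 2PL a conflict immediately aborts a rival transaction, while under conflict serializability the runtime merely records a precedence edge and aborts only when that edge would close a cycle in the conflict graph.

    // Minimal single-threaded sketch of the bookkeeping; a real STM would
    // maintain the conflict graph concurrently and garbage-collect edges.
    #include <atomic>
    #include <map>
    #include <set>

    struct Tx {
        int id;
        std::atomic<bool> aborted{false};
    };

    // Global conflict graph: succ[a] holds transactions serialized after a.
    static std::map<int, std::set<int>> succ;

    // Depth-first reachability over the recorded precedence edges.
    static bool reachable(int from, int to, std::set<int>& seen) {
        if (from == to) return true;
        if (!seen.insert(from).second) return false;
        for (int next : succ[from])
            if (reachable(next, to, seen)) return true;
        return false;
    }

    // 2PL-style policy: any conflict immediately aborts one transaction.
    void on_conflict_2pl(Tx& self, Tx& other) {
        other.aborted.store(true);
    }

    // CS policy: tolerate the conflict unless ordering other before self
    // would create a cycle, i.e. unless serializability would be lost.
    bool on_conflict_cs(Tx& self, Tx& other) {
        std::set<int> seen;
        if (reachable(self.id, other.id, seen)) { // other->self closes a cycle
            self.aborted.store(true);
            return false;
        }
        succ[other.id].insert(self.id);           // record other -> self
        return true;
    }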
32

Software Transactional Memory Building Blocks

Riegel, Torvald 15 August 2013 (has links) (PDF)
Exploiting thread-level parallelism has become a part of mainstream programming in recent years. Many approaches to parallelization require threads executing in parallel to also synchronize occasionally (i.e., coordinate concurrent accesses to shared state). Transactional Memory (TM) is a programming abstraction that provides the concept of database transactions in the context of programming languages such as C/C++. This allows programmers to declare only which pieces of a program synchronize, without requiring them to actually implement synchronization and tune its performance, which in turn makes TM typically easier to use than other abstractions such as locks. I have investigated and implemented the building blocks that are required for a high-performance, practical, and realistic TM. They comprise several novel algorithms and optimizations for TM implementations, both for current hardware and for future hardware extensions for TM, and are being used in or have influenced commercial TM implementations such as the TM support in GCC.
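
The programming model can be made concrete with a small example using the TM support in GCC that the abstract mentions (compiled with -fgnu-tm); the atomic block only declares what must synchronize and leaves the implementation and tuning to the TM runtime:

    // A minimal sketch, assuming GCC with -fgnu-tm; the globals are
    // illustrative. The block executes atomically with respect to all
    // other transactions, with no user-visible locks.
    #include <cstddef>

    static long counter;
    static long histogram[16];

    void record(std::size_t bucket) {
        __transaction_atomic {
            ++counter;                 // instrumented by the TM runtime
            if (bucket < 16)
                ++histogram[bucket];
        }
    }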
33

Architectural support for high-performing hardware transactional memory systems

Lupon Navazo, Marc 23 December 2011 (has links)
Parallel programming presents an efficient solution to exploit future multicore processors. Unfortunately, traditional programming models depend on the programmer's skill at synchronizing concurrent threads, which makes the development of parallel software a hard and error-prone task. In addition, current synchronization techniques serialize the execution of those critical sections that conflict in shared memory and thus limit the scalability of multithreaded applications. Transactional Memory (TM) has emerged as a promising programming model that resolves the trade-off between high performance and ease of use. In TM, the system is in charge of scheduling transactions (atomic blocks of instructions) and guaranteeing that they are executed in isolation, which simplifies writing parallel code and, at the same time, enables high concurrency when atomic regions access different data. Among all forms of TM environments, Hardware TM (HTM) systems are the only ones that offer fast execution, at the cost of adding dedicated logic to the processor. Existing HTM systems suffer considerable delays when they execute complex transactional workloads, especially when they deal with large and contending transactions, because they lack adaptability. Furthermore, most HTM implementations are ad hoc and require cumbersome hardware structures to be effective, which complicates the feasibility of the design. This thesis makes several contributions to the design and analysis of low-cost HTM systems that yield good performance for any kind of TM program. Our first contribution, FASTM, introduces a novel mechanism to elegantly manage speculative (and already validated) versions of transactional data by slightly modifying the on-chip memory engine. This approach permits fast recovery when a transaction that fits in private caches is discarded. At the same time, it keeps non-speculative values in software, which allows in-place memory updates. Thus, FASTM neither suffers from capacity issues nor slows down when it has to undo transactional modifications. Our second contribution includes two different HTM systems that integrate deferred resolution of conflicts in a conventional multicore processor, which reduces the complexity of the system with respect to previous proposals. The first one, FUSETM, combines different-mode transactions under a unified infrastructure to gracefully handle resource overflow. As a result, FUSETM brings fast transactional computation without requiring additional hardware or extra communication at the end of speculative execution. The second one, SPECTM, introduces a two-level data versioning mechanism to resolve conflicts in a speculative fashion even in the case of overflow. Our third and last contribution presents a pair of truly flexible HTM systems that can dynamically adapt their underlying mechanisms according to the characteristics of the program. DYNTM records statistics of previously executed transactions to select the best-suited strategy each time a new instance of a transaction starts. SWAPTM takes a different approach: it tracks information about the current transactional instance to change its priority level at runtime. Both alternatives achieve great performance over existing proposals that employ fixed transactional policies, especially in applications with phase changes.
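
The adaptation idea behind DYNTM can be sketched as follows; the statistics, thresholds, and mode names here are assumptions for illustration, not the thesis's actual design:

    // Hedged sketch: per-static-transaction abort statistics steer each
    // new dynamic instance toward the strategy that suits it best.
    #include <atomic>
    #include <cstdint>

    enum class Mode { Eager, Lazy };

    struct TxStats {
        std::atomic<uint32_t> starts{0};
        std::atomic<uint32_t> aborts{0};
    };

    static TxStats stats[64];            // one slot per static transaction site

    Mode choose_mode(int site) {         // called when a new instance starts
        TxStats& s = stats[site];
        uint32_t n = s.starts.fetch_add(1) + 1;
        // A high historical abort rate suggests contention: resolve
        // conflicts lazily at commit instead of aborting on first touch.
        return (n > 8 && s.aborts.load() * 4 > n) ? Mode::Lazy : Mode::Eager;
    }

    void on_abort(int site) { stats[site].aborts.fetch_add(1); }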
34

ReserveTM: Optimizing for Eager Software Transactional Memory

Jain, Gaurav January 2013 (has links)
Software Transactional Memory (STM) helps programmers write correct concurrent code by allowing them to identify atomic sections rather than focusing on the mechanics of concurrency control. Given code with atomic sections, the compiler and STM runtime can work together to ensure properly controlled access to shared memory. STM runtimes use either lazy or eager version management. Lazy versioning buffers transaction updates, whereas eager versioning applies updates in place. The current set of primitives suits lazy versioning, since memory needs to be accessed through the runtime. We present a new set of runtime primitives that better suits eager-versioned STM. We propose a novel extension to the compiler/runtime interface, consisting of memory reservations and memory releases. These extensions enable optimizations specific to eager-versioned runtimes. A memory reservation allows a transaction to perform instrumentation-free accesses to a memory address. A release allows a read-only address to be modified by another transaction. Together, these reduce the instrumentation overhead required to support STM and improve concurrency between readers and writers. We have implemented these primitives and evaluated their performance on the STAMP benchmarks. Our results show strong performance and scalability improvements for eager-versioned algorithms.
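
A hedged sketch of how the proposed primitives might look to transactional code; the names stm_reserve and stm_release and their exact semantics are assumptions for illustration, not the thesis's interface:

    // Once an address is reserved, the transaction may access it with
    // plain loads and stores (instrumentation-free), which suits eager
    // versioning's in-place updates.
    struct Account { long balance; };

    void stm_reserve(void*) { /* stub: runtime would acquire ownership */ }
    void stm_release(void*) { /* stub: runtime would drop read rights  */ }

    void transfer(Account* from, Account* to, long amount) {
        stm_reserve(&from->balance);   // reserve both accounts up front
        stm_reserve(&to->balance);
        from->balance -= amount;       // uninstrumented, in-place updates
        to->balance   += amount;
    }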
35

Efficient Conditional Synchronization for Transactional Memory Based System

Naik, Aniket Dilip 10 April 2006 (has links)
Multi-threaded applications are needed to realize the full potential of new chip-multi-threaded machines. Such applications are very difficult to program and orchestrate correctly, and transactional memory has been proposed as a way of alleviating some of the programming difficulties. However, transactional memory can be applied directly only to critical sections, while conditional synchronization remains difficult to implement correctly and efficiently. This dissertation describes EasySync, a simple and inexpensive extension to transactional memory that allows arbitrary conditional synchronization to be expressed in a simple and composable way. Transactional memory eliminates the need to use locks and provides composability for critical sections: atomicity of a transaction is guaranteed regardless of how other code is written. EasySync provides the same benefits for conditional synchronization: it eliminates the need to use condition variables, and it guarantees wakeup of the waiting transaction when the actual condition it is waiting for is satisfied, regardless of whether other code correctly signals that change. EasySync also allows transactional memory systems to efficiently provide lock-free and condition-variable-free conditional critical regions, and even more advanced synchronization primitives, such as guarded execution with arbitrary conditional or guard code. Because EasySync informs the hardware that a thread is waiting, it allows simple and effective optimizations, such as stopping the execution of a thread until there is a change in the condition it is waiting for. Like transactional memory, EasySync is backward compatible with existing code, which we confirm by running unmodified Splash-2 applications linked with an EasySync-based synchronization library. We also rewrite some of the synchronization in three Splash-2 applications to take advantage of better code readability and to replace spin-waiting with more efficient EasySync equivalents. Our experimental evaluation shows that EasySync successfully eliminates processor activity while waiting, reducing the number of executed instructions by 8.6% on average in a 16-processor CMP. We also show that these savings increase with the number of processors, and also for applications written for transactional memory systems. Finally, EasySync imposes virtually no performance overhead, and can in fact improve performance.
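
The guarded-execution pattern can be sketched as follows, borrowing GCC's atomic-block syntax for concreteness; easysync_wait is a hypothetical stand-in for EasySync's hardware-assisted blocking, not its real interface:

    // Sketch of a blocking dequeue: if the guard fails, the thread blocks
    // until a conflicting commit could change the condition, instead of
    // spin-waiting and burning instructions.
    #include <thread>

    static int buffer[64];
    static int head, tail;             // ring buffer indices

    void easysync_wait() {             // stub: the hardware would stall the
        std::this_thread::yield();     // thread until its read set changes
    }

    int dequeue_blocking() {
        for (;;) {
            int v = 0;
            bool ok = false;
            __transaction_atomic {
                if (head != tail) {    // guard condition holds
                    v = buffer[head];
                    head = (head + 1) % 64;
                    ok = true;
                }
            }
            if (ok) return v;
            easysync_wait();           // condition false: block, then retry
        }
    }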
36

On the design of architecture-aware algorithms for emerging applications

Kang, Seunghwa 30 January 2011 (has links)
This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in these problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also identify several limitations of current system software and architectures, and suggest directions for improving them. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation contributes to these efforts by providing benchmarks and suggestions for improving system software and architectures.
37

On algorithm design and programming model for multi-threaded computing

He, Zhengyu 27 March 2012 (has links)
The objective of this work is to investigate algorithm design and programming models for multi-threaded computing. Designing multi-threaded algorithms is very challenging: when multiple threads need to communicate or coordinate with each other, efficient synchronization support is needed. However, synchronization is known to be expensive on the emerging multi-/many-core processors, especially as the number of threads increases. To fully unleash the power of such processors, careful investigation is needed in both algorithm design and programming models for multi-threaded systems. In this dissertation, we first present an asynchronous multi-threaded algorithm for the maximum network flow problem. This algorithm is based on the classical push-relabel algorithm and completely removes the use of locks and barriers from its original parallel version. While this algorithmic method shows effectiveness, it is challenging to generalize its success to other multi-threaded problems. We next focus on improving transactional memory, a promising mechanism for constructing multi-threaded programs. A queuing-theory-based model is developed to analyze the performance of different transactional memory systems. Based on the results of the model, we concentrate on the contention-management mechanism of transactional memory systems. A profiling-based adaptive contention-management scheme is finally proposed to cope with the problem that no static contention-management scheme can maintain good performance on all platforms for all types of workloads. From this research, we show that it is necessary and worthwhile to explore both the algorithm-design aspect and the programming-model aspect of multi-threaded computing.
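
A minimal sketch of the lock-free flavor of an asynchronous push step, assuming (as one plausible design, not necessarily the dissertation's) that each active node is driven by a single owning thread and that excess moves via atomic updates rather than locks or barriers:

    #include <atomic>

    struct Node {
        std::atomic<long> excess{0};
        std::atomic<int>  height{0};
    };

    // Push up to `cap` units of excess from u to a lower neighbor v.
    // Threads working on different nodes proceed fully asynchronously;
    // the atomics keep each excess transfer consistent without locks.
    void push(Node& u, Node& v, long cap) {
        if (u.height.load() != v.height.load() + 1) return; // not admissible
        long delta = u.excess.load();   // safe: only u's owner pushes from u
        if (delta > cap) delta = cap;
        if (delta <= 0) return;
        u.excess.fetch_sub(delta);      // take excess from u ...
        v.excess.fetch_add(delta);      // ... and deposit it at v
    }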
38

Transactional memory concurrency : new models and systems

Ramadan, Hany E. 21 March 2011 (has links)
Transactional memory (TM) aims to bring the benefits of ACID transactions to the volatile world of program synchronization. Architectural trends are making software transactions more appealing, as more programmers struggle with the problems of locks as they exploit multi-core processors. This thesis applies TM, which until recently has been restricted to small benchmarks, to a large, real-life system: the Linux operating system kernel. I describe TxLinux, a version of Linux that is the first OS to use transactional memory for synchronization. TxLinux runs on MetaTM, a simulator co-designed with TxLinux, which models an x86-based Hardware Transactional Memory (HTM) system. The TxLinux/MetaTM effort yields a characterization of real-life OS transactions, exposes previously unconsidered complications (including interaction with interrupts and stack memory) and allows sensitivity studies of various TM microarchitectural parameters. It also provides a flexible platform for future OS, TM and architecture research. Next, I examine ways to increase concurrency by investigating the factors that inhibit concurrency in existing TM models and systems. These include avoidable implementation limitations, overly restrictive serialization models, and inexpressive APIs. After examining the nature of each limitation, I propose a solution for each one. I postulate that the conventional wisdom, that every transaction is "for itself" and primarily relates to other transactions by conflicting with them, is a pervasive misperception. This thesis aims to demonstrate that there are other ways of thinking about the relation of one transaction to another. I present three different transaction models to show how (i) co-existence, (ii) cooperation, and (iii) coordination can each solve important problems facing TM programmers today. Co-existence of multiple transactions on the same processor is enabled using the suspended transactions model. This model, used by TxLinux, can reduce aborts and removes transaction length limitations imposed by interrupts. Cooperation of transactions that access the same data, using the dependence-aware transactions model, can transparently turn transaction aborts into commits. Drawing on serializability theory and notions of spheres of control (which predate ACID transactions), this model is able to accept more execution schedules than any existing TM design. Lastly, the coordination of multiple transactions in the coordinated sibling transactions model provides programmers a simple and unified way of expressing intra-transaction parallelism. This helps move transactions beyond being a drop-in replacement for locks (SLE-style) to instead helping programmers find more parallel work within their programs (in both speculative and non-speculative forms). All three models aim at increasing concurrency, while shifting complexity away from the programmer and into the TM system. I evaluate all three models, using either the MetaTM HTM or one of the several software TM (STM) systems this thesis also develops.
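
A speculative sketch of how coordinated sibling transactions might be expressed; spawn_sibling and join_siblings are hypothetical names, and the TM bookkeeping that would make the siblings commit as one unit is reduced to stubs:

    #include <functional>
    #include <thread>
    #include <vector>

    void spawn_sibling(std::vector<std::thread>& sibs,
                       std::function<void()> work) {
        // Stub: a real runtime would register the sibling with the parent
        // transaction so their read/write sets commit atomically together.
        sibs.emplace_back(std::move(work));
    }

    void join_siblings(std::vector<std::thread>& sibs) {
        for (auto& t : sibs) t.join(); // stub for the coordinated commit point
    }

    // Intra-transaction parallelism: two siblings share the parent's work.
    void parallel_update(int* table, int n) {
        std::vector<std::thread> sibs;
        spawn_sibling(sibs, [=] { for (int i = 0; i < n / 2; ++i) ++table[i]; });
        spawn_sibling(sibs, [=] { for (int i = n / 2; i < n; ++i) ++table[i]; });
        join_siblings(sibs);           // all-or-nothing with the parent
    }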
39

Compiler Support for Fine-grain Software-only Checkpointing

Zhao, Cheng Yan 13 August 2013 (has links)
Checkpointing support allows program execution to roll back to an earlier program point, discarding any modifications made since that point. Existing software-based checkpointing methods are mainly libraries that snapshot all of working memory, and hence have prohibitive overhead for many potential applications. In this thesis we present a light-weight, fine-grain checkpointing framework implemented entirely in software through compiler transformations and optimizations. A programmer can specify arbitrary checkpoint regions via a simple API, and the compiler automatically transforms the code to implement the checkpoint at the granularity of individual stores, optimizing to remove redundancy. We explore three application areas for this support. First, we investigate its application to debugging, in particular by providing the ability to rewind to an arbitrarily placed point in a buggy program's execution. A study using BugBench applications shows that our compiler-based approach incurs more than 100X less overhead than full-process checkpointing. Second, we demonstrate that compiler-based checkpointing support can be leveraged to free the programmer from manually implementing and maintaining software rollback mechanisms when coding a backtracking algorithm, with a runtime overhead of only 15% compared to the manual implementation. Finally, we utilize the efficient software-only checkpointing to support overlapping execution with delinquent loads, demonstrating both control-speculation and data-speculation transformations. We further propose a theoretical speculative timing model and confirm its prediction effectiveness with real-machine workloads.
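
The store-granularity transformation can be sketched as follows; the API names are illustrative rather than the thesis's, but the shape follows the description: log the old value before each store, and roll back by replaying the log in reverse.

    #include <cstdint>
    #include <vector>

    struct UndoEntry { intptr_t* addr; intptr_t old; };
    static std::vector<UndoEntry> undo_log;

    inline void ckpt_log(intptr_t* addr) {   // emitted before each store
        undo_log.push_back({addr, *addr});
    }

    void ckpt_begin() { undo_log.clear(); }

    void ckpt_rollback() {                   // restore in reverse order
        for (auto it = undo_log.rbegin(); it != undo_log.rend(); ++it)
            *it->addr = it->old;
        undo_log.clear();
    }

    // What the compiler conceptually produces for a region containing
    // `x = 42;` (redundant logs of the same address are optimized away).
    void example(intptr_t& x) {
        ckpt_begin();
        ckpt_log(&x);      // snapshot the old value
        x = 42;
        ckpt_rollback();   // undo: x regains its previous value
    }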
