1

Synchronization costs in parallel programs and concurrent data structures

Aksenov, Vitalii 26 September 2018
To use the computational power of modern computing machines, we have to deal with concurrent programs. Writing efficient concurrent programs is notoriously difficult, primarily due to the need to harness synchronization costs. In this thesis, we focus on synchronization costs in parallel programs and concurrent data structures. First, we present a novel granularity control technique for parallel programs designed for the dynamic multithreading environment. Then, in the context of concurrent data structures, we consider the notion of concurrency-optimality and propose the first implementation of a concurrency-optimal binary search tree that, intuitively, accepts a concurrent schedule if and only if the schedule is correct. Also, we propose parallel combining, a technique that enables efficient implementations of concurrent data structures from their parallel batched counterparts. We validate the proposed techniques via experimental evaluations showing superior or comparable performance with respect to state-of-the-art algorithms. From a more formal perspective, we consider the phenomenon of helping in concurrent data structures. Intuitively, helping is observed when the order of some operation in a linearization is fixed by a step of another process. We show that no wait-free linearizable implementation of a stack using read, write, compare&swap and fetch&add primitives can be help-free, correcting a mistake in an earlier proof by Censor-Hillel et al. Finally, we propose a simple way to analytically predict the throughput of data structures based on coarse-grained locking.
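To make the coarse-grained locking setting of that last result concrete, here is a minimal Java sketch (not taken from the thesis) of a set guarded by a single global lock; the class and method names are illustrative. The throughput of such a structure is essentially bounded by how long each operation holds the one lock, which is what makes it amenable to analytical prediction.

    import java.util.TreeSet;
    import java.util.concurrent.locks.ReentrantLock;

    // Minimal coarse-grained locked set: every operation takes one global lock,
    // so throughput is governed by critical-section length and lock handoff cost.
    public class CoarseGrainedSet<T extends Comparable<T>> {
        private final TreeSet<T> set = new TreeSet<>();
        private final ReentrantLock lock = new ReentrantLock();

        public boolean add(T item) {
            lock.lock();
            try {
                return set.add(item);
            } finally {
                lock.unlock();
            }
        }

        public boolean contains(T item) {
            lock.lock();
            try {
                return set.contains(item);
            } finally {
                lock.unlock();
            }
        }
    }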
2

Advances in High Performance Computing Through Concurrent Data Structures and Predictive Scheduling

Lamar, Kenneth M 01 January 2024
Modern High Performance Computing (HPC) systems are made up of thousands of server-grade compute nodes linked through a high-speed network interconnect. Each node has tens or even hundreds of CPU cores, with counts continuing to grow on newer HPC clusters. This results in a need to make use of millions of cores per cluster. Fully leveraging these resources is difficult, and there is an active need to design software that scales and fully utilizes the hardware. In this dissertation, we address this gap with a dual approach, considering both intra-node (single node) and inter-node (across nodes) concerns. To aid intra-node performance, we propose two novel concurrent data structures: a transactional vector and a persistent hash map. These designs have broad applicability in any multi-core environment but are particularly useful in HPC, which commonly features many cores per node. For inter-node performance, we propose a metrics-driven approach to improve scheduling quality, using predicted run times to backfill jobs more accurately and aggressively. This is augmented with application input parameters to further improve these run time predictions. Improved scheduling reduces the number of idle nodes in an HPC cluster, maximizing job throughput. We find that our data structures outperform the prior state of the art while offering additional features. Our backfill technique likewise outperforms previous approaches in simulations, and our run time predictions were significantly more accurate than conventional approaches. Code for these works is freely available, and we have plans to deploy these techniques more broadly on real HPC systems in the future.
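As a rough illustration of the backfill idea described above (not the dissertation's scheduler), the following Java sketch shows the core admission check of an EASY-style backfill policy: a queued job may jump ahead only if, under its predicted run time, it fits on the currently free nodes and finishes before the reservation held for the job at the head of the queue. All class and field names here are hypothetical.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Simplified backfill admission check driven by predicted run times.
    public class BackfillSketch {
        static final class Job {
            final int nodesNeeded;
            final double predictedRuntime; // supplied by a run time prediction model
            Job(int nodesNeeded, double predictedRuntime) {
                this.nodesNeeded = nodesNeeded;
                this.predictedRuntime = predictedRuntime;
            }
        }

        // A candidate may be backfilled only if it fits on the free nodes now and
        // its predicted finish does not delay the head job's reserved start time.
        static boolean canBackfill(Job candidate, int freeNodes,
                                   double now, double headReservationStart) {
            boolean fitsNow = candidate.nodesNeeded <= freeNodes;
            boolean finishesBeforeReservation =
                    now + candidate.predictedRuntime <= headReservationStart;
            return fitsNow && finishesBeforeReservation;
        }

        public static void main(String[] args) {
            Deque<Job> queue = new ArrayDeque<>();
            queue.add(new Job(4, 120.0)); // small job with a short predicted run time
            System.out.println(canBackfill(queue.peek(), 8, 0.0, 300.0)); // prints true
        }
    }

The more accurate the run time prediction, the tighter this check can be made without risking a delayed reservation, which is why improved predictions translate directly into more aggressive backfilling.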
3

A JVM-Managed Concurrent Unrolled List-Based Set Using Lazy Synchronization

Farhadi, Adam January 2021
The multicore revolution of the early 21st century has introduced a multitude of multiprocessor synchronization techniques for designing concurrent data structures. This thesis explores the concept of “unrolling”, or storing multiple data items per node, in order to increase the concurrent throughput of linked lists, or more specifically list-based sets of linked nodes. Our contribution is a concurrent unrolled list-based set implemented in the Java programming language which uses lock-based lazy synchronization with memory management handled by the Java Virtual Machine (JVM). Our unrolled list implementation provides decreased traversal overhead compared to state-of-the-art concurrent linked lists under limited contention, along with improved spatial locality due to unrolled node data items being stored in sequential memory locations. Our results show that, in comparison to the state-of-the-art lock-based and lock-free concurrent list-based set implementations in Java, our concurrent unrolled list-based set provides between 1.7x and 12.2x increased concurrent throughput under medium contention, and 25.7x to 196.5x increased concurrent throughput under low contention, depending on the ratio of write to read operations. Furthermore, we show that unrolling a concurrent linked list can provide at least 60% of the performance of the Java concurrency package’s native lock-free skiplist, namely ConcurrentSkipListSet.
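For readers unfamiliar with unrolling, the sketch below shows what an unrolled node for a lock-based lazy list might look like in Java. It is a simplified illustration under the assumption of fixed-size integer keys, not the thesis implementation; the field names and the unrolling factor are illustrative.

    import java.util.concurrent.locks.ReentrantLock;

    // Illustrative unrolled node for a lock-based lazy list: each node stores up
    // to K keys in an array, which shortens traversals (fewer pointer hops) and
    // improves spatial locality compared to one key per node.
    final class UnrolledNode {
        static final int K = 32;            // keys per node (unrolling factor)
        final int[] keys = new int[K];      // keys held by this node
        int count;                          // number of valid entries in `keys`
        volatile boolean marked;            // logical deletion flag (lazy synchronization)
        volatile UnrolledNode next;         // successor node in the list
        final ReentrantLock lock = new ReentrantLock();

        // Linear scan within the node; in a lazy list a thread locks only the
        // node(s) it intends to modify and re-validates them after locking.
        boolean containsKey(int key) {
            for (int i = 0; i < count; i++) {
                if (keys[i] == key) return true;
            }
            return false;
        }
    }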
4

Designing, Modeling, and Optimizing Transactional Data Structures

Hassan, Ahmed Mohamed Elsayed 25 September 2015
Transactional memory (TM) has emerged as a promising synchronization abstraction for multi-core architectures. Unlike traditional lock-based approaches, TM shifts the burden of implementing thread synchronization from the programmer to an underlying framework using hardware (HTM) and/or software (STM) components. Although TM can be leveraged to implement transactional data structures (i.e., those where multiple operations are allowed to execute atomically, all-or-nothing, according to the transaction paradigm), its intensive speculation may result in significantly lower performance than optimized concurrent data structures. This poor performance motivates the need to find other, more effective alternatives for designing transactional data structures without losing the simple programming abstraction proposed by TM. To do so, we identified three major challenges that need to be addressed to design efficient transactional data structures. The first challenge is composability, namely allowing an atomic execution of two or more data structure operations in the same way as TM provides, but without its high overheads. The second challenge is integration, which enables the execution of data structure operations within generic transactions that may contain other memory-based operations. The last challenge is modeling, which encompasses the necessity of defining a unified formal methodology to reason about the correctness of transactional data structures. In this dissertation, we propose different approaches to address the above challenges. First, we address the composability challenge by introducing an optimistic methodology to efficiently convert concurrent data structures into transactional ones. Second, we address the integration challenge by injecting the semantic operations of those transactional data structures into TM frameworks, and by presenting two novel STM algorithms in order to enhance the overall performance of those frameworks. Finally, we address the modeling challenge by presenting two models for concurrent and transactional data structure designs.
• Our first main contribution in this dissertation is Optimistic transactional boosting (OTB), a methodology to design transactional versions of highly concurrent optimistic (i.e., lazy) data structures. An earlier (pessimistic) boosting proposal added a layer of abstract locks on top of existing concurrent data structures. Instead, we propose an optimistic boosting methodology, which allows greater data structure-specific optimizations, easier integration with TM frameworks, and lower restrictions on the operations than the original (more pessimistic) boosting methodology. Based on the proposed OTB methodology, we implement the transactional version of two list-based data structures (i.e., set and priority queue). Then, we present TxCF-Tree, a balanced tree whose design is optimized to support transactional accesses. The core optimizations of TxCF-Tree's operations are: providing a traversal phase that does not use any lock and/or speculation and deferring lock acquisition or physical modification to the transaction's commit phase; isolating the structural operations (such as re-balancing) in an interference-less housekeeping thread; and minimizing the interference between structural operations and the critical path of semantic operations (i.e., additions and removals on the tree).
• Our second main contribution is to integrate OTB with both STM and HTM algorithms.
For STM, we extend the design of both DEUCE, a Java STM framework, and RSTM, a C++ STM framework, to support the integration with OTB. Using our extension, programmers can include both OTB data structure operations and traditional memory reads/writes in the same transaction. Results show that OTB performance is closer to the optimal lazy (non-transactional) data structures than the original boosting algorithm. On the HTM side, we introduce a methodology to inject semantic operations into the well-known hybrid transactional memory algorithms (e.g., HTM-GL, HyNOrec, and NOrecRH). In addition, we enhance the proposed semantically-enabled HTM algorithms with a lightweight adaptation mechanism that allows bypassing the HTM paths if the overhead of the semantic operations causes repeated HTM aborts. Experiments on micro- and macro-benchmarks confirm that our proposals outperform the other TM solutions in almost all the tested workloads.
• Our third main contribution is to enhance the performance of TM frameworks in general by introducing two novel STM algorithms. Remote Transaction Commit (RTC) is a mechanism for executing commit phases of STM transactions on dedicated server cores. RTC shows significant improvements compared to its corresponding validation-based STM algorithm (up to 4x better) as it decreases the overhead of spin locking during commit, in terms of cache misses, blocking of lock holders, and CAS operations. Remote Invalidation (RInval) applies the same idea of RTC to invalidation-based STM algorithms. Furthermore, it allows more concurrency by executing commit and invalidation routines concurrently on different servers. RInval performs up to 10x better than its corresponding invalidation-based STM algorithm (InvalSTM), and up to 2x better than its corresponding validation-based algorithm (NOrec).
• Our fourth and final main contribution is to provide a theoretical model for concurrent and transactional data structures. We exploit the similarities of the OTB-based data structures and provide a unified model to reason about the correctness of those designs. Specifically, we extend a recent approach that models data structures with concurrent readers and a single writer (called SWMR), and we propose two novel models that additionally allow multiple writers and transactional execution. Those models are more practical because they cover a wider set of data structures than the original SWMR model. / Ph. D.
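To illustrate the optimistic-boosting flow at a very high level, the following Java sketch (not the dissertation's code, and using a single lock in place of per-node locks purely for brevity) shows the commit-time pattern OTB relies on: operations are only recorded while the transaction runs, and locks are acquired, speculative observations re-validated, and the underlying structure modified only at commit. All names are hypothetical.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.locks.ReentrantLock;

    // Sketch of an optimistically "boosted" set: semantic operations are deferred
    // and applied atomically at commit time after validation.
    public class BoostedSetSketch {
        private final java.util.TreeSet<Integer> set = new java.util.TreeSet<>();
        private final ReentrantLock lock = new ReentrantLock(); // single lock for brevity

        public Tx begin() { return new Tx(); }

        public final class Tx {
            private final List<Integer> pendingAdds = new ArrayList<>();
            private final List<Integer> mustBeAbsent = new ArrayList<>();

            // Speculative add: record the intent and the observation it depends on.
            public void add(int key) {
                mustBeAbsent.add(key);
                pendingAdds.add(key);
            }

            // Commit: lock, re-validate the speculative observations, then apply.
            public boolean commit() {
                lock.lock();
                try {
                    for (int key : mustBeAbsent) {
                        if (set.contains(key)) return false; // validation failed: abort
                    }
                    for (int key : pendingAdds) {
                        set.add(key);
                    }
                    return true;
                } finally {
                    lock.unlock();
                }
            }
        }
    }

Deferring physical modification to commit is what lets several semantic operations compose into one atomic step without speculating on every memory access, which is the overhead the abstract contrasts against plain TM.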
