Global ETD Search

61	Collaborative Scheduling and Synchronization of Distributable Real-Time Threads Fahmy, Sherif Fadel 17 June 2010 (has links) In this dissertation, we consider the problem of scheduling and synchronization of distributable real-time threads --- Real-Time CORBA's first-class abstraction for programming real-time, multi-node sequential behaviors. Distributable real-time threads can be scheduled, broadly, using two paradigms: node independent scheduling, in which nodes independently construct thread schedules, based on node-level decomposition of distributable thread (or DT) scheduling parameters, and collaborative scheduling, in which nodes collaborate to construct system-wide thread schedules, which may or may not involve scheduling parameter decomposition. While significant literature exists on node independent scheduling, little is known about collaborative scheduling and its concomitant tradeoffs. We design three collaborative scheduling algorithms, called ACUA, QBUA, and DQBUA. ACUA uses theory of consensus and QBUA uses theory of quorums for distributable thread schedule construction. DQBUA extends QBUA with lock-based, local and distributed concurrency control. The algorithms consider a model where distributable threads arrive arbitrarily, have time/utility function time constraints, access resources in an arbitrary way (e.g., arbitrary lock acquire/release order, arbitrary nestings), and are subject to arbitrary node crash failures and message losses. We analytically establish several properties of the algorithms including probabilistic end-to-end termination time satisfactions, timeliness optimality during underloads, bounded exception handling time, and correctness of the algorithms in partially synchronous systems. We implement distributable real-time threads in the Linux kernel as a first-class programming and scheduling abstraction. The resulting kernel, called ChronOS, provides application interfaces for creating and manipulating distributable threads, as well as kernel interfaces and mechanisms for scheduling them (using both independent and collaborative approaches). ChronOS also has failure detector mechanisms for detecting and recovering from distributable thread failures. We implement the proposed scheduling algorithms and their competitors in ChronOS and compare their behavior. Our studies reveal that the collaborative scheduling algorithms are superior to independent scheduling algorithms for certain thread sets, in particular, when thread sections have significantly varying execution time. This variability, especially if the variability is not consistent among the threads, may cause each node to make conflicting decisions in the absence of global information. We observe that collaborative schedulers outperform independent schedulers (e.g., EDF augmented with PIP) in terms of accrued utility by as much as 75%. We identify distributed dependencies as one of the major sources of overhead in collaborative scheduling. In particular, the cost of distributed lock-based concurrency control (e.g., lock management, distributed deadlock detection/resolution) can significantly reduce the problem space for which collaborative scheduling is beneficial. To mitigate this, we consider the use of software transactional memory (or STM), an optimistic, non-blocking synchronization alternative to lock-based concurrency control which has been extensively studied in non real-time contexts. We consider distributable real-time threads with STM concurrency control, and develop techniques for analyzing and bounding their end-to-end response times on distributed single-processor and distributed multiprocessor systems. We also develop contention management techniques, a key component of STM, which are driven by threads' real-time scheduling parameters, and establish their tradeoffs against non-real-time contention managers. / Ph. D. Time/Utility Functions Utility Accrual Scheduling Distributed Real-Time Scheduling Software Transactional Memory
62	Scheduling Memory Transactions in Distributed Systems Kim, Junwhan 15 October 2013 (has links) Distributed transactional memory (DTM) is an emerging, alternative concurrency control model that promises to alleviate the difficulties of lock-based distributed synchronization. In DTM, transactional conflicts are traditionally resolved by a contention manager. A complementary approach for handling conflicts is through a transactional scheduler, which orders transactional requests to avoid or minimize conflicts. We present a suite of transactional schedulers: Bi-interval, Commutative Requests First (CRF), Reactive Transactional Scheduler (RTS), Dependency-Aware Transactional Scheduler} (DATS), Scheduling-based Parallel Nesting} (SPN), Cluster-based Transactional Scheduler} (CTS), and Locality-aware Transactional Scheduler} (LTS). The schedulers consider Herlihy and Sun's dataflow execution model, where transactions are immobile and objects are migrated to invoking transactions, relying on directory-based cache-coherence protocols to locate and move objects. Within this execution model, the proposed schedulers target different DTM models. Bi-interval considers the single object copy DTM model, and categorizes concurrent requests into read and write intervals to maximize the concurrency of read transactions. This allows an object to be simultaneously sent to read transactions, improving transactional makespan. We show that Bi-interval improves the makespan competitive ratio of DTM without such a scheduler to O(log(N)) for the worst-case and (log(N - k) for the average-case, for N nodes and k read transactions. Our implementation reveals that Bi-interval enhances transactional throughput over the no-scheduler case by as much as 1.71x, on average. CRF considers multi-versioned DTM. Traditional multi-versioned TM models use multiple object versions to guarantee commits of read transactions, but limit concurrency of write transactions. CRF relies on the notion of commutative transactions, i.e., those that ensure consistency of the shared data-set even when they are validated and committed concurrently. CRF detects conflicts between commutative and non-commutative write transactions and then schedules them according to the execution state, enhancing the concurrency of write transactions. Our implementation shows that transactional throughput is improved by up to 5x over a state-of-the-art competitor (DecentSTM). RTS and DATS consider transactional nesting in DTM, and focus on the closed and open nesting models, respectively. RTS determines whether a conflicting outer transaction must be aborted or enqueued according to the level of contention. If a transaction is enqueued, its closed-nested transactions do not have to retrieve objects again, resulting in reduced communication delays. DATS's goal is to boost the throughput of open-nested transactions by reducing the overhead of running expensive compensating actions and acquiring/releasing abstract locks when the outer transaction aborts. The contribution of DATS is twofold. First, it allows commutable outer transactions to be validated concurrently and allows non-commutable outer transactions -- depending on their inner transactions -- to be committed before others without dependencies. Implementations reveal effectiveness: RTS and DATS improve throughput (over the no-scheduler case), by as much as 1.88x and 2.2x, respectively. SPN considers parallel nested transactions in DTM. The idea of parallel nesting is to execute the inner transactions that access different objects concurrently, and execute the inner transactions that access the same objects serially, increasing performance. However, the parallel nesting model may be ineffective if all inner transactions access the same object due to the additional overheads needed to identify both types of inner transactions. SPN avoids this overhead and allows inner transactions to request objects and to execute them in parallel. Implementations reveal that SPN outperforms non-parallel nesting (i.e., closed nesting) by up to 3.5x and 4.5x on a micro-benchmark (bank) and the TPC-C transactional benchmark, respectively. CTS considers the replicated DTM model: object replicas are distributed across clusters of nodes, where clusters are determined based on inter-node distance, to maximize locality and fault-tolerance, and to minimize memory usage and communication overhead. CTS enqueues transactions that are aborted due to early validation over clusters and assigns their backoff times, reducing communication overhead. Implementation reveals that CTS improves throughput over competitor replicated DTM solutions including GenRSTM and DecentSTM by as much as 1.64x, on average. LTS considers the genuine partial replicated DTM model. In this model, LTS exploits locality by: 1) employing a transaction scheduler, which enables/disables object ownership changes depending on workload fluctuations, and 2) splitting hot-spot objects into multiple replicas for reducing contention. Our implementation reveals that LTS outperforms state-of-the-art competitors (Score and CTS) by up to 2.6x on micro-benchmarks (Linked List and Skip List) and by up to 2.2x on TPC-C. / Ph. D. Software Transactional Memory Distributed Systems Transactional Scheduling Partial Replication (Parallel) Nested Transactions
63	Supporting Software Transactional Memory in Distributed Systems: Protocols for Cache-Coherence, Conflict Resolution and Replication Zhang, Bo 05 December 2011 (has links) Lock-based synchronization on multiprocessors is inherently non-scalable, non-composable, and error-prone. These problems are exacerbated in distributed systems due to an additional layer of complexity: multinode concurrency. Transactional memory (TM) is an emerging, alternative synchronization abstraction that promises to alleviate these difficulties. With the TM model, code that accesses shared memory objects are organized as transactions, which speculatively execute, while logging changes. If transactional conflicts are detected, one of the conflicting transaction is aborted and re-executed, while the other is allowed to commit, yielding the illusion of atomicity. TM for multiprocessors has been proposed in software (STM), in hardware (HTM), and in a combination (HyTM). This dissertation focuses on supporting the TM abstraction in distributed systems, i.e., distributed STM (or D-STM). We focus on three problem spaces: cache-coherence (CC), conflict resolution, and replication. We evaluate the performance of D-STM by measuring the competitive ratio of its makespan --- i.e., the ratio of its makespan (the last completion time for a given set of transactions) to the makespan of an optimal off-line clairvoyant scheduler. We show that the performance of D-STM for metric-space networks is O(N^2) for N transactions requesting an object under the Greedy contention manager and an arbitrary CC protocol. To improve the performance, we propose a class of location-aware CC protocols, called LAC protocols. We show that the combination of the Greedy manager and a LAC protocol yields an O(NlogN s) competitive ratio for s shared objects. We then formalize two classes of CC protocols: distributed queuing cache-coherence (DQCC) protocols and distributed priority queuing cache-coherence (DPQCC) protocols, both of which can be implemented using distributed queuing protocols. We show that a DQCC protocol is O(NlogD)-competitive and a DPQCC protocol is O(log D_delta)-competitive for N dynamically generated transactions requesting an object, where D_delta is the normalized diameter of the underlying distributed queuing protocol. Additionally, we propose a novel CC protocol, called Relay, which reduces the total number of aborts to O(N) for N conflicting transactions requesting an object, yielding a significantly improvement over past CC protocols which has O(N^2) total number of aborts. We also analyze Relay's dynamic competitive ratio in terms of the communication cost (for dynamically generated transactions), and show that Relay's dynamic competitive ratio is O(log D_0), where D_0 is the normalized diameter of the underlying network spanning tree. To reduce unnecessary aborts and increase concurrency for D-STM based on globally-consistent contention management policies, we propose the distributed dependency-aware (DDA) conflict resolution model, which adopts different conflict resolution strategies based on transaction types. In the DDA model, read-only transactions never abort by keeping a set of versions for each object. Each transaction only keeps precedence relations based on its local knowledge of precedence relations. We show that the DDA model ensures that 1) read-only transactions never abort, 2) every transaction eventually commits, 3) supports invisible reads, and 4) efficiently garbage collects useless object versions. To establish competitive ratio bounds for contention managers in D-STM, we model the distributed transactional contention management problem as the traveling salesman problem (TSP). We prove that for D-STM, any online, work conserving, deterministic contention manager provides an Omega(max[s,s^2/D]) competitive ratio in a network with normalized diameter D and s shared objects. Compared with the Omega(s) competitive ratio for multiprocessor STM, the performance guarantee for D-STM degrades by a factor proportional to s/D. We present a randomized algorithm, called Randomized, with a competitive ratio O(sClog n log ^{2} n) for s objects shared by n transactions, with a maximum conflicting degree C. To break this lower bound, we present a randomized algorithm Cutting, which needs partial information of transactions and an approximate TSP algorithm A with approximation ratio phi_A. We show that the average case competitive ratio of Cutting is O(s phi_A log^{2}m log^{2}n), which is close to O(s). Single copy (SC) D-STM keeps only one writable copy of each object, and thus cannot tolerate node failures. We propose a quorum-based replication (QR) D-STM model, which provides provable fault-tolerance without incurring high communication overhead, when compared with the SC model. The QR model stores object replicas in a tree quorum system, where two quorums intersect if one of them is a write quorum, and ensures the consistency among replicas at commit-time. The communication cost of an operation in the QR model is proportional to the communication cost from the requesting node to its closest read or write quorum. In the presence of node failures, the QR model exhibits high availability and degrades gracefully when the number of failed nodes increases, with reasonable higher communication cost. We develop a protoytpe implementation of the dissertation's proposed solutions, including DQCC and DPQCC protocols, Relay protocol, and the DDA model, in the HyFlow Java D-STM framework. We experimentally evaluated these solutions with respective competitor solutions on a set of microbenchmarks (e.g., data structures including distributed linked list, binary search tree and red-black tree) and macrobenchmarks (e.g., distributed versions of the applications in the STAMP STM benchmark suite for multiprocessors). Our experimental studies revealed that: 1) based on the same distributed queuing protocol (i.e., Ballistic CC protocol), DPQCC yields better transactional throughput than DQCC, by a factor of 50% - 100%, on a range of transactional workloads; 2) Relay outperforms competitor protocols (including Arrow, Ballistic and Home) by more than 200% when the network size and contention increase, as it efficiently reduces the average aborts per transaction (less than 0.5); 3) the DDA model outperforms existing contention management policies (including Greedy, Karma and Kindergarten managers) by upto 30%-40% in high contention environments; For read/write-balanced workloads, the DDA model outperforms these contention management policies by 30%-60% on average; for read-dominated workloads, the model outperforms by over 200%. / Ph. D. Cache-Coherence Distributed Queuing Contention Management Software Transactional Memory Replication Quorum System
64	Improving Performance of Highly-Programmable Concurrent Applications by Leveraging Parallel Nesting and Weaker Isolation Levels Niles, Duane Francis Jr. 15 July 2015 (has links) The recent development of multi-core computer architectures has largely affected the creation of everyday applications, requiring the adoption of concurrent programming to significantly utilize the divided processing power of computers. Applications must be split into sections able to execute in parallel, without any of these sections conflicting with one another, thereby necessitating some form of synchronization to be declared. The most commonly used methodology is lock-based synchronization; although, to improve performance the most, developers must typically form complex, low-level implementations for large applications, which can easily create potential errors or hindrances. An abstraction from database systems, known as transactions, is a rising concurrency control design aimed to circumvent the challenges with programmability, composability, and scalability in lock-based synchronization. Transactions execute their operations speculatively and are capable of being restarted (or rolled back) when there exist conflicts between concurrent actions. As such issues can occur later in the lifespans of transactions, entire rollbacks are not that effective for performance. One particular method, known as nesting, was created to counter that drawback. Nesting is the act of enclosing transactions within other transactions, essentially dividing the work into pieces called sub-transactions. These sub-transactions can roll back without affecting the entire main transaction, although general nesting models only allow one sub-transaction to perform work at a time. The first main contribution in this thesis is SPCN, an algorithm that parallelizes nested transactions while automatically processing any potential conflicts that may arise, eliminating the burden of additional processing from the application developers. Two versions of SPCN exist: Strict, which enforces the sub-transactions' work to be made visible in a serialized order; and Relaxed, which allows sub-transactions to distribute their information immediately as they finish (therefore invalidation may occur after-the-fact and must be handled). Despite the additional logic required by SPCN, it outperforms traditional closed nesting by 1.78x at the lowest and 3.78x at the highest in the experiments run. Another method to alter transactional execution and boost performance is to relax the rules of visibility for parallel operations (known as their isolation). Depending on the application, correctness is not broken even if some transactions see external work that may later be undone due to a rollback, or if an object is written while another transaction is using an older instance of its data. With lock-based synchronization, developers would have to explicitly design their application with varying amounts of locks, and different lock organizations or hierarchies, to change the strictness of the execution. With transactional systems, the processing performed by the system itself can be set to utilize different rulings, which can change the performance of an application without requiring it to be largely redesigned. This notion leads to the second contribution in this thesis: AsR, or As-Serializable transactions. Serializability is the general form of isolation or strictness for transactions in many applications. In terms of execution, its definition is equivalent to only one transaction running at a time in a given system. Many transactional systems use their own internal form of locking to create Serializable executions, but it is typically too strict for many applications. AsR transactions allow the internal processing to be relaxed while additional meta-data is used external to the system, without requiring any interaction from the developer or any changes to the given application. AsR transactions offer multiple orders of magnitude more in throughput in highly-contentious scenarios, due to their capability to outlast traditional levels of isolation. / Master of Science Concurrency Transactional Memory Parallel Nesting Distributed Systems Databases Transactions Serializability Weaker Isolation Levels
65	HyFlow: A High Performance Distributed Software Transactional Memory Framework Saad Ibrahim, Mohamed Mohamed 14 June 2011 (has links) We present HyFlow - a distributed software transactional memory (D-STM) framework for distributed concurrency control. Lock-based concurrency control suffers from drawbacks including deadlocks, livelocks, and scalability and composability challenges. These problems are exacerbated in distributed systems due to their distributed versions which are more complex to cope with (e.g., distributed deadlocks). STM and D-STM are promising alternatives to lock-based and distributed lock-based concurrency control for centralized and distributed systems, respectively, that overcome these difficulties. HyFlow is a Java framework for DSTM, with pluggable support for directory lookup protocols, transactional synchronization and recovery mechanisms, contention management policies, cache coherence protocols, and network communication protocols. HyFlow exports a simple distributed programming model that excludes locks: using (Java 5) annotations, atomic sections are defiend as transactions, in which reads and writes to shared, local and remote objects appear to take effect instantaneously. No changes are needed to the underlying virtual machine or compiler. We describe HyFlow's architecture and implementation, and report on experimental studies comparing HyFlow against competing models including Java remote method invocation (RMI) with mutual exclusion and read/write locks, distributed shared memory (DSM), and directory-based D-STM. / Master of Science Directory Protocols Cache Coherence Contention Management Control-Flow Dataflow Software Transactional Memory Distributed Systems
66	Hardware Techniques for High-Performance Transactional Memory in Many-Core Chip Multiprocessors / Técnicas Hardware para Sistemas de Memoria Transaccional de Alto Rendimiento en Procesadores Multinúcleo Titos Gil, José Rubén 08 November 2011 (has links) Esta tesis investiga la implementación hardware eficiente de los sistemas de memoria transaccional (HTM) en un chip multiprocesador escalable (CMP), identificando aspectos que limitan el rendimiento y proponiendo técnicas que solventan dichas patologías. Las contribuciones de la tesis son varios diseños HTM complementarios que alcanzan un rendimiento robusto y evitan comportamientos patológicos, mediante la introducción de flexibilidad y adaptabilidad, sin que dichas técnicas apenas supongan un incremento en la complejidad del sistema global. Esta disertación considera tanto sistemas HTM de política ansiosa como aquellos diseñados bajo el enfoque perezoso, y afrontamos las sobrecargas en el rendimiento que son inherentes a cada política. Quizá la contribución más relevante de esta tesis es ZEBRA, un sistema HTM de política híbrida que adapta su comportamiento en función de las características dinámicas de la carga de trabajo. / This thesis focuses on the hardware mechanisms that provide optimistic concurrency control with guarantees of atomicity and isolation, with the intent of achieving high-performance across a variety of workloads, at a reasonable cost in terms of design complexity. This thesis identifies key inefficiencies that impact the performance of several hardware implementations of TM, and proposes mechanisms to overcome such limitations. In this dissertation we consider both eager and lazy approaches to HTM system design, and address important sources of overhead that are inherent to each policy. This thesis presents a hybrid-policy, adaptable HTM system that combines the advantages of both eager and lazy approaches in a low complexity design. Furthermore, this thesis investigates the overheads of the simpler, fixed-policy HTM designs that leverage a distributed directory-based coherence protocol to detect data races over a scalable interconnect, and develops solutions that address some performance degrading factors. memoria transaccional transactional memory programación paralela parallel programming multiprocesadores chip multiprocessors parallel architectures arquitecturas paralelas Arquitectura de Computadores 62
67	Optimistic semantic synchronization Sreeram, Jaswanth 06 October 2011 (has links) Within the last decade multi-core processors have become increasingly commonplace with the power and performance demands of modern real-world programs acting to accelerate this trend. The rapid advancements in designing and adoption of such architectures mean that there is a serious need for programming models that allow the development of correct parallel programs that execute efficiently on these processors. A principle problem in this regard is that of efficiently synchronizing concurrent accesses to shared memory. Traditional solutions to this problem are either inefficient but provide programmability (coarse-grained locks) or are efficient but are not composable and very hard to program and verify (fine-grained locks). Optimistic Transactional Memory systems provide many of the composability and programmabillity advantages of coarse-grained locks and good theoretical scaling but several studies have found that their performance in practice for many programs remains quite poor primarily because of the high overheads of providing safe optimism. Moreover current transactional memory models remain rigid - they are not suited for expressing some of the complex thread interactions that are prevalent in modern parallel programs. Moreover, the synchronization achieved by these transactional memory systems is at the physical or memory level. This thesis advocates a position that memory synchronization problem for threads should be modeled and solved in terms of synchronization of underlying program values which have semantics associated with them. It presents optimistic synchronization techniques that address the semantic synchronization requirements of a parallel program instead. These techniques include methods to 1) enable optimistic transactions to recover from expensive sharing conflicts without discarding all the work made possible by the optimism 2) enable a hybrid pessimistic-optimistic form of concurrency control that lowers overheads 3) make synchronization value-aware and semantics-aware 4) enable finer grained consistency rules (than allowed by traditional optimistic TM models) therefore avoiding conflicts that do not enforce any semantic property required by the program. In addition to improving the expressibility of specific synchronization idioms all these techniques are also effective in improving parallel performance. This thesis formulates these techniques in terms of their purpose, the extensions to the language, the compiler as well as to the concurrency control runtime necessary to implement them. It also briefly presents an experimental evaluation of each of them on a variety of modern parallel workloads. These experiments show that these techniques significantly improve parallel performance and scalability over programs using state-of-the-art optimistic synchronization methods. Synchronization Transactional memory Parallel computing Computer architecture Multiprocessors Parallel programming (Computer science) Parallel programs (Computer programs)
68	Explorando memoria transacional em software nos contextos de arquiteturas assimetricas, jogos computacionais e consumo de energia / Exploiting software transactional memory in the context of asymmetric architectures Baldassin, Alexandro José 15 August 2018 (has links) Orientador: Paulo Cesar Centoducatte / Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-15T20:59:38Z (GMT). No. of bitstreams: 1 Baldassin_AlexandroJose_D.pdf: 1596141 bytes, checksum: 265aa763c420b69f70d59ff687bd8ad9 (MD5) Previous issue date: 2009 / Resumo: A adoção dos microprocessadores com múltiplos núcleos de execução pela indústria semicondutora tem criado uma crescente necessidade por novas linguagens, metodologias e ferramentas que tornem o desenvolvimento de sistemas concorrentes mais rápido, eficiente e acessível aos programadores de todos os níveis. Uma das principais dificuldades em programação concorrente com memória compartilhada é garantir a correta sincronização do código, evitando assim condições de corrida que podem levar o sistema a um estado inconsistente. A sincronização tem sido tradicionalmente realizada através de métodos baseados em travas, reconhecidos amplamente por serem de difícil uso e pelas anomalias causadas. Um novo mecanismo, conhecido como memória transacional (TM), tem sido alvo de muita pesquisa recentemente e promete simplificar o processo de sincronização, além de possibilitar maior oportunidade para extração de paralelismo e consequente desempenho. O cerne desta tese é formado por três trabalhos desenvolvidos no contexto dos sistemas de memória transacional em software (STM). Primeiramente, apresentamos uma implementação de STM para processadores assimétricos, usando a arquitetura Cell/B.E. como foco. Como principal resultado, constatamos que o uso de sistemas transacionais em arquiteturas assimétricas também é promissor, principalmente pelo fator escalabilidade. No segundo trabalho, adotamos uma abordagem diferente e sugerimos um sistema de STM especialmente voltado para o domínio de jogos computacionais. O principal motivo que nos levou nesta direção é o baixo desempenho das implementações atuais de STM. Um estudo de caso conduzido a partir de um jogo complexo mostra a eficácia do sistema proposto. Finalmente, apresentamos pela primeira vez uma caracterização do consumo de energia de um sistema de STM considerado estado da arte. Além da caracterização, também propomos uma técnica para redução do consumo em casos de alta contenção. Resultados obtidos a partir dessa técnica revelam ganhos de até 87% no consumo de energia / Abstract: The shift towards multicore processors taken by the semiconductor industry has initiated an era in which new languages, methodologies and tools are of paramount importance to the development of efficient concurrent systems that can be built in a timely way by all kinds of programmers. One of the main obstacles faced by programmers when dealing with shared memory programming concerns the use of synchronization mechanisms so as to avoid race conditions that could possibly lead the system to an inconsistent state. Synchronization has been traditionally achieved by means of locks (or variations thereof), widely known by their anomalies and hard-to-get-it-right facets. A new mechanism, known as transactional memory (TM), has recently been the focus of a lot of research and shows potential to simplify code synchronization as well as delivering more parallelism and, therefore, better performance. This thesis presents three works focused on different aspects of software transactional memory (STM) systems. Firstly, we show an STM implementation for asymmetric processors, focusing on the architecture of Cell/B.E. As an important result, we find out that memory transactions are indeed promising for asymmetric architectures, specially due to their scalability. Secondly, we take a different approach to STM implementation by devising a system specially targeted at computer games. The decision was guided by poor performance figures usually seen on current STM implementations. We also conduct a case study using a complex game that effectively shows the system's efficiency. Finally, we present the energy consumption characterization of a state-of-the-art STM for the first time. Based on the observed characterization, we also propose a technique aimed at reducing energy consumption in highly contended scenarios. Our results show that the technique is indeed effective in such cases, improving the energy consumption by up to 87% / Doutorado / Sistemas de Computação / Doutor em Ciência da Computação Memória transacional Programação paralela (Computação) Arquitetura de computador Estimativa de potência Transactional memory Parallel programming (Computer science) Computer architecture Power estimation
69	Exploiting Speculative and Asymmetric Execution on Multicore Architectures Wamhoff, Jons-Tobias 27 March 2015 (has links) (PDF) The design of microprocessors is undergoing radical changes that affect the performance and reliability of hardware and will have a high impact on software development. Future systems will depend on a deep collaboration between software and hardware to cope with the current and predicted system design challenges. Instead of higher frequencies, the number of processor cores per chip is growing. Eventually, processors will be composed of cores that run at different speeds or support specialized features to accelerate critical portions of an application. Performance improvements of software will only result from increasing parallelism and introducing asymmetric processing. At the same time, substantial enhancements in the energy efficiency of hardware are required to make use of the increasing transistor density. Unfortunately, the downscaling of transistor size and power will degrade the reliability of the hardware, which must be compensated by software. In this thesis, we present new algorithms and tools that exploit speculative and asymmetric execution to address the performance and reliability challenges of multicore architectures. Our solutions facilitate both the assimilation of software to the changing hardware properties as well as the adjustment of hardware to the software it executes. We use speculation based on transactional memory to improve the synchronization of multi-threaded applications. We show that shared memory synchronization must not only be scalable to large numbers of cores but also robust such that it can guarantee progress in the presence of hardware faults. Therefore, we streamline transactional memory for a better throughput and add fault tolerance mechanisms with a reduced overhead by speculating optimistically on an error-free execution. If hardware faults are present, they can manifest either in a single event upset or crashes and misbehavior of threads. We address the former by applying transactions to checkpoint and replicate the state such that threads can correct and continue their execution. The latter is tackled by extending the synchronization such that it can tolerate crashes and misbehavior of other threads. We improve the efficiency of transactional memory by enabling a lightweight thread that always wins conflicts and significantly reduces the overheads. Further performance gains are possible by exploiting the asymmetric properties of applications. We introduce an asymmetric instrumentation of transactional code paths to enable applications to adapt to the underlying hardware. With explicit frequency control of individual cores, we show how applications can expose their possibly asymmetric computing demand and dynamically adjust the hardware to make a more efficient usage of the available resources. Mehrkernprozessoren Multithreading Synchronisierung Transaktionaler Speicher Frequenzskalierung multicore multithreading locks speculation transactional memory dynamic voltage and frequency scaling ddc:004 rvk:ST 170
70	Praktické metody automatizované verifikace paralelních programů / Practical Methods of Automated Verification of Concurrent Programs Fiedor, Jan January 2017 (has links) V dnešní době jsou vícevláknové programy běžné a s nimi i chyby v souběžnosti. Během posledních let bylo vytvořeno mnoho technik pro detekci takovýchto chyb, a i přesto mají vývojáři softwaru problém nalézt správné nástroje pro analýzu svých programů. Důvod je jednoduchý, fungující neznamená vždy praktický. Hodně nástrojů implementujících detekční techniky je obtížně použitelných, přizpůsobených pro konkrétní typy programů nebo synchronizace, nebo špatně škálují, aby zvládly analyzovat rozsáhlý software. Pro některé typy chyb v souběžnosti dokonce ani neexistují nástroje pro jejich detekci, i přesto že vývojáři softwaru na tyto chyby často narážejí ve svých programech. Hlavním cílem této práce je navrhnout nové techniky pro detekci chyb ve vícevláknových programech. Tyto techniky by měly být schopny analyzovat rozsáhlé programy, umožnit detekci méně studovaných typů chyb v souběžnosti, a podporovat širokou škálu programů s ohledem na to, jaké programové konstrukce používají.

Search results