Global ETD Search

51	Análise e estudo de desempenho e consumo de energia de memórias transacionais em software / Performance and energy consumption analysis and study on software transactional memories Garcia, Leonardo Augusto Guimarães, 1981- 23 August 2018 Advisor: Rodolfo Jardim de Azevedo / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-23T23:05:30Z (GMT). Previous issue date: 2013 / Abstract: Recent developments in computer architecture, with the massive introduction in recent years of multi-core processors and multiprocessor computers, even in low-powered machines such as embedded systems, have made it essential to develop new parallel programming paradigms and models that are easy for the majority of developers to use and debug. Current parallel programming models are based on primitives whose use is complex, tedious, and highly error-prone, such as locks, semaphores, signals, mutexes, monitors, and threads. In this scenario, Transactional Memory (TM) appears as a promising alternative that aims to be efficient and, at the same time, easy to program. Much research has been done in recent years on Software Transactional Memory (STM), most of it focused on performance, with little attention given to other important metrics such as energy consumption and its relationship to execution time, known as the energy-delay product (EDP).
This work evaluates these metrics in an STM configured with a number of TM and energy management policies, some of these combinations new. It is shown that performance and EDP do not always follow the same trend, so it may be appropriate to choose different management policies depending on the optimization target. Sequential execution should also not be overlooked, as in some scenarios it can be better than any parallel execution. Overall, execution with TM was faster than sequential execution in two thirds of the cases analysed and had a better EDP in one third. This analysis made it possible to derive a minimal set of TM management policies that delivered the best results for the benchmarks studied, and to identify behavioural trends in TM systems for groups of benchmarks as the number of cores and the workload size vary. / Master's / Computer Science / Master in Computer Science Transactional memory; Software - Performance; Energy consumption; Software - Speedup
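The point that performance and EDP can disagree may be easier to see with numbers. The sketch below assumes the usual definition of EDP as energy multiplied by execution time; the run names and measurements are invented for illustration and are not taken from the dissertation.

```python
def edp(energy_joules, time_seconds):
    """Energy-delay product: energy consumed multiplied by execution time."""
    return energy_joules * time_seconds


# Hypothetical measurements, made up for this example only.
runs = {
    "sequential": {"time_s": 10.0, "energy_j": 50.0},
    "stm_4_threads": {"time_s": 8.0, "energy_j": 80.0},
}

# Rank the same runs by two different optimization targets.
fastest = min(runs, key=lambda name: runs[name]["time_s"])
best_edp = min(runs, key=lambda name: edp(runs[name]["energy_j"], runs[name]["time_s"]))

for name, r in runs.items():
    print(name, "time:", r["time_s"], "EDP:", edp(r["energy_j"], r["time_s"]))
print("fastest:", fastest, "| best EDP:", best_edp)
```

With these assumed figures the parallel run finishes sooner, yet the sequential run has the lower EDP, which is the kind of divergence the abstract describes.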
52	Software Transactional Memory Techniques : Principles, Design, and Implementation Trade-offs Nasir, Muhammad January 2009 The advent of multicore processors has called the performance of traditional parallel programming techniques into question. Traditional lock-based parallel programming techniques are error-prone and suffer from problems such as deadlocks, livelocks, and priority inversion. Over the last decade and a half, a considerable amount of research has been carried out on synchronizing parallel applications without locking. One of the most promising techniques to come out of this research is Transactional Memory (TM). A Transactional Memory system commits data in atomic code sequences called transactions. Research has shown that Transactional Memory has the potential to outperform traditional locking mechanisms. Understanding the design and implementation trade-offs of different Software Transactional Memory (STM) implementations requires a comprehensive comparative study. Although some comparative studies have been carried out in the past, they were narrow in scope and covered only a few STM implementations. In this master's thesis, a qualitative literature survey is conducted and the state of the art in Software Transactional Memory is presented, covering prominent approaches to date and discussing their design and implementation trade-offs. Multiprocessor; Concurrency; Synchronization; Transactional Memory; Computer Sciences; Software Engineering
53	Performance Tradeoffs in Software Transactional Memory Abbas, Gulfam, Asif, Naveed January 2010 Transactional memory (TM), a new programming paradigm, is one of the latest approaches to writing programs for next-generation multicore and multiprocessor systems. TM is an alternative to lock-based programming. It is a promising solution to a large and growing problem that programmers face in developing programs for Chip Multi-Processor (CMP) architectures: it simplifies synchronization of access to shared data structures in a way that is scalable and composable. Software Transactional Memory (STM), a fully software approach to TM, can be defined as a non-blocking synchronization mechanism in which sequential objects are automatically converted into concurrent objects. In this thesis, we present a performance comparison of four STM implementations: RSTM of V. J. Marathe et al., TL2 of D. Dice et al., TinySTM of P. Felber et al., and SwissTM of A. Dragojevic et al. The comparison gives a deeper understanding of the potential tradeoffs involved and helps assess which design choices and configuration parameters may lead to better and more efficient STMs. In particular, the suitability of each STM is analyzed against the others. A literature study is carried out to select STM implementations for experimentation, and an experiment is performed to measure performance tradeoffs between them. The empirical evaluation concludes that SwissTM has significantly higher throughput than the state-of-the-art STM implementations RSTM, TL2, and TinySTM, as it consistently performs well on the execution time and aborts-per-commit parameters on the STAMP benchmarks. The transaction retry rate measurements show that TL2 performs better than RSTM, TinySTM, and SwissTM.
Multiprocessor; Concurrent Programming; Synchronization; Software Transactional Memory; Performance; Computer Sciences; Software Engineering
54	Software lock elision for x86 machine code Roy, Amitabha January 2011 More than a decade after becoming a topic of intense research, there is no transactional memory hardware, nor any examples of software transactional memory use outside the research community. Using software transactional memory in large pieces of software needs copious source code annotations and often means that standard compilers and debuggers can no longer be used. At the same time, the overheads associated with software transactional memory fail to motivate programmers to expend the effort needed to use it. The only way around the overheads in the case of general unmanaged code is the anticipated availability of hardware support. On the other hand, architects are unwilling to devote power and area budgets in mainstream microprocessors to hardware transactional memory, pointing to transactional memory being a 'niche' programming construct. A deadlock has thus ensued that is blocking transactional memory use and experimentation in the mainstream. This dissertation covers the design and construction of a software transactional memory runtime system called SLE_x86 that can potentially break this deadlock by decoupling transactional memory from the programs using it. Unlike most other STM designs, the core design principle is transparency rather than performance. SLE_x86 operates at the level of x86 machine code, thereby becoming immediately applicable to binaries for the popular x86 architecture. The only requirement is that the binary synchronise using known locking constructs or calls such as those in the Pthreads or OpenMP libraries. SLE_x86 provides speculative lock elision (SLE) entirely in software, executing critical sections in the binary using transactional memory. Optionally, the critical sections can also be executed without using transactions by acquiring the protecting lock.
The dissertation makes a careful analysis of the impact on performance due to the demands of the x86 memory consistency model and the need to transparently instrument x86 machine code. It shows that both of these problems can be overcome to reach a reasonable level of performance, where transparent software transactional memory can perform better than a lock. SLE_x86 can ensure that programs are ready for transactional memory in any form, without being explicitly written for it. 005.3
55	Finite Element Computations on Multicore and Graphics Processors Ljungkvist, Karl January 2017 In this thesis, techniques for efficient utilization of modern computer hardware for numerical simulation are considered. In particular, we study techniques for improving the performance of computations using the finite element method. One of the main difficulties in finite-element computations is how to perform the assembly of the system matrix efficiently in parallel, due to its complicated memory access pattern. The challenge lies in the fact that many entries of the matrix are updated concurrently by several parallel threads. We consider transactional memory, an exotic hardware feature for concurrent update of shared variables, and conduct benchmarks on a prototype multicore processor supporting it. Our experiments show that transactions can both simplify programming and provide good performance for concurrent updates of floating point data. Secondly, we study a matrix-free approach to finite-element computation which avoids the matrix assembly. In addition to removing the need to store the system matrix, matrix-free methods are attractive due to their low memory footprint, and therefore better match the architecture of modern processors, where memory bandwidth is scarce and compute power is abundant. Motivated by this, we consider matrix-free implementations of high-order finite-element methods for execution on graphics processors, which have seen a revolutionary increase in usage for numerical computations during recent years due to their more efficient architecture. In the implementation, we exploit sum-factorization techniques for efficient evaluation of matrix-vector products, mesh coloring and atomic updates for concurrent updates, and a geometric multigrid algorithm for efficient preconditioning of iterative solvers.
Our performance studies show that on the GPU, a matrix-free approach is the method of choice for elements of order two and higher, both yielding significantly faster execution and allowing the solution of considerably larger problems. Compared to corresponding CPU implementations executed on comparable multicore processors, the GPU implementation is about twice as fast, suggesting that graphics processors are about twice as power efficient as multicores for computations of this kind. Finite Element Methods; GPU; Matrix-Free; Multigrid; Transactional Memory; Computer Science; Computational Mathematics
56	Code profiling and optimization in transactional memory systems / Profiling e otimização de código em sistemas de memória transacional Cordeiro, Silvio Ricardo January 2014 Transactional Memory has shown itself to be a promising paradigm for the implementation of shared-memory concurrent applications that eschew a lock-based model of data synchronization. Rather than conditioning exclusive access on the value of a lock that is shared across concurrent threads, Transactional Memory attempts to execute critical sections optimistically, rolling back the modifications in the event of a data access conflict. However, while the lock-based approach has acquired a significant body of debugging, profiling, and automated optimization tools (as one of the oldest and most researched synchronization techniques), the field of Transactional Memory is still comparatively recent, and programmers are usually left to tune their transactional applications manually and without guidance when facing efficiency problems. We propose a system in which code profiling in a simulated hardware implementation of Transactional Memory is used to characterize a transactional application, which forms the basis for the automated tuning of the underlying speculative system for the efficient execution of that particular application. We also propose a profile-guided approach to the scheduling of threads in a software-based implementation of Transactional Memory, using collected data to predict the likelihood of conflicts and determine which thread to schedule based on this prediction. We present the results achieved under both designs. Parallel processing; High-performance processing; Transactional memory; Profiling; Scheduling; Shared memory; Parallel programming; High-performance computing
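The optimistic execution described in this abstract (run the critical section speculatively, discard the work on conflict, retry) can be sketched in a few lines. This is a minimal illustration of the general idea under simplifying assumptions, not the system proposed in the thesis: `VersionedCell` and `atomically` are hypothetical names, a single shared value stands in for memory, and a short internal lock protects only the validate-and-commit step.

```python
import threading


class VersionedCell:
    """A shared value with a version counter used to detect conflicts (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()  # guards only the brief read/commit steps

    def read(self):
        with self._commit_lock:
            return self.value, self.version

    def try_commit(self, new_value, read_version):
        with self._commit_lock:
            if self.version != read_version:
                return False  # another transaction committed first: conflict
            self.value = new_value
            self.version += 1
            return True


def atomically(cell, update):
    """Run update() on a snapshot; on conflict, discard the result and retry."""
    retries = 0
    while True:
        snapshot, version = cell.read()
        tentative = update(snapshot)  # speculative work, invisible to other threads
        if cell.try_commit(tentative, version):
            return retries
        retries += 1  # roll back by dropping the tentative value


counter = VersionedCell(0)


def worker():
    for _ in range(1000):
        atomically(counter, lambda v: v + 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```

The retry count returned by `atomically` is the sort of signal a profiler could collect to estimate how often transactions conflict, though how the thesis gathers its profile data is not stated here.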
57	Avaliação de desempenho do sistema de memória transacional de Clojure como biblioteca de sincronização na linguagem Java / Performance evaluation of Clojure transactional memory system as a synchronization library in Java language Pablo César Calcina Ccori 14 June 2011 This work presents a performance evaluation of Clojure's transactional memory implementation, used as a synchronization library alongside other applications on the Java virtual machine. An interface layer between Clojure data structures and the STMBench7 benchmark is implemented, and several aspects that add performance overhead are discussed. clojure; software transactional memory; stm
58	Design of a Distributed Transactional Memory for Many-core systems Trigonakis, Vasileios January 2011 The emergence of multi-core and many-core systems has created an increasing need for parallel programming. Transactional Memory (TM) is a promising programming paradigm for creating concurrent applications. To date, the design of Distributed TM (DTM) tailored to non-coherent many-core architectures is largely unexplored. This thesis addresses this topic by analysing, designing, and implementing a DTM system suitable for low-latency message-passing platforms. The resulting system, named SC-TM, the Single-Chip Cloud TM, is a fully decentralized and scalable DTM implemented on Intel's SCC processor, a 48-core 'concept vehicle' created by Intel Labs as a platform for many-core software research. SC-TM is one of the first fully decentralized DTMs that guarantees starvation-freedom and the first to use an actual pluggable Contention Manager (CM) to ensure liveness. Finally, this thesis introduces three completely decentralized CMs: Offset-Greedy, a decentralized version of Greedy; Wholly, which relies on the number of completed transactions; and FairCM, which makes use of the effective transactional time. The evaluation showed that the last of these, FairCM, outperformed the other two. Engineering and Technology
59	Experiments with Hardware-based Transactional Memory in Parallel Simulation Hay, Joshua A. 13 October 2014 No description available. Computer Engineering; transactional memory; TSX; parallel simulation; parallel discrete event simulation; PDES; lock contention
60	Efficient Runtime Support for Reliable and Scalable Parallelism Zhang, Minjia January 2016 No description available. Computer Engineering
