Global ETD Search

1	Data centric and adaptive source changing transactional memory with exit functionality Herath, Herath Mudiyanselage Isuru Prasenajith January 2012 (has links) Multi-core computing is becoming ubiquitous due to the scaling limitations of single-core computing. It is inevitable that parallel programming will become the mainstream for such processors. In this paradigm shift, the concept of abstraction should not be compromised. A programming model serves as an abstraction of how programs are executed. Transactional Memory (TM) is a technique proposed to maintain lock free synchronization. Due to the simplicity of the abstraction provided by it, TM can also be used as a way of distributing parallel work, maintaining coherence and consistency. Motivated by this, at a higher level, the thesis makes three contributions and all are centred around Hardware Transactional Memory (HTM).As the first contribution, a transaction-only architecture is coupled with a ``data centric" approach, to address the scalability issues of the former whilst maintaining its simplicity. This is achieved by grouping together memory locations having similar access patterns and maintaining coherence and consistency according to the group each memory location belongs to. As the second contribution a novel technique is proposed to reduce the number of false transaction aborts which occur in a signature based HTM. The idea is to adaptively switch between cache lines and signatures to detect conflicts. That is, when a transaction fits in the L1 cache, cache line information is used to detect conflicts and signatures are used otherwise. As the third contribution, the thesis makes a case for having an exit functionality in an HTM. The objective of the proposed functionality, TM_EXIT, is to terminate a transaction without restarting or committing. 004
2	Efficient, scalable, and fair read-modify-writes Rajaram, Bharghava January 2015 (has links) Read-Modify-Write (RMW) operations, or atomics, have widespread application in (a) synchronization, where they are used as building blocks of various synchronization constructs like locks, barriers, and lock-free data structures (b) supervised memory systems, where every memory operation is effectively an RMW that reads and modifies metadata associated with memory addresses and (c) profiling, where RMW instructions are used to increment shared counters to convey meaningful statistics about a program. In each of these scenarios, the RMWs pose a bottleneck to performance and scalability. We observed that the cost of RMWs is dependent on two major factors – the memory ordering enforced by the RMW, and contention amongst processors performing RMWs to the same memory address. In the case of both synchronization and supervised memory systems, the RMWs are expensive due to the memory ordering enforced due to the atomic RMW operation. Performance overhead due to contention is more prevalent in parallel programs which frequently make use of RMWs to update concurrent data structures in a non-blocking manner. Such programs also suffer from a degradation in fairness amongst concurrent processors. In this thesis, we study the cost of RMWs in the above applications, and present solutions to obtain better performance and scalability from RMW operations. Firstly, this thesis tackles the large overhead of RMW instructions when used for synchronization in the widely used x86 processor architectures, like in Intel, AMD, and Sun processors. The x86 processor architecture implements a variation of the Total-Store-Order (TSO) memory consistency model. RMW instructions in existing TSO architectures (we call them type-1 RMW) are ordered like memory fences, which makes them expensive. The strong fence-like ordering of type-1 RMWs is unnecessary for the memory ordering required by synchronization. We propose weaker RMW instructions for TSO consistency; we consider two weaker definitions: type-2 and type-3, each causing subtle ordering differences. Type-2 and type-3 RMWs avoid the fence-like ordering of type-1 RMWs, thereby reducing their overhead. Recent work has shown that the new C/C++11 memory consistency model can be realized by generating type-1 RMWs for SC-atomic-writes and/or SC-atomic-reads. We formally prove that this is equally valid for the proposed type-2 RMWs, and partially for type-3 RMWs. We also propose efficient implementations for type-2 (type-3) RMWs. Simulation results show that our implementation reduces the cost of an RMW by up to 58.9% (64.3%), which translates into an overall performance improvement of up to 9.0% (9.2%) for the programs considered. Next, we argue the case for an efficient and correct supervised memory system for the TSO memory consistency model. Supervised memory systems make use of RMW-like supervised memory instructions (SMIs) to atomically update metadata associated with every memory address used by an application program. Such a system is used to help increase reliability, security and accuracy of parallel programs by offering debugging/monitoring features. Most existing supervised memory systems assume a sequentially consistent memory. For weaker consistency models, like TSO, correctness issues (like imprecise exceptions) arise if the ordering requirement of SMIs is neglected. In this thesis, we show that it is sufficient for supervised instructions to only read and process their metadata in order to ensure correctness. We propose SuperCoP, a supervised memory system for relaxed memory models in which SMIs read and process metadata before retirement, while allowing data and metadata writes to retire into the write-buffer. Our experimental results show that SuperCoP performs better than the existing state-of-the-art correct supervision system by 16.8%. Finally, we address the issue of contention and contention-based failure of RMWs in non-blocking synchronization mechanisms. We leverage the fact that most existing lock-free programs make use of compare-and-swap (CAS) loops to access the concurrent data structure. We propose DyFCoM (Dynamic Fairness and Contention Management), a holistic scheme which addresses both throughput and fairness under increased contention. DyFCoM monitors the number of successful and failed RMWs in each thread, and uses this information to implement a dynamic backoff scheme to optimize throughput. We also use this information to throttle faster threads and give slower threads a higher chance of performing their lock-free operations, to increase fairness among threads. Our experimental results show that our contention management scheme alone performs better than the existing state-of-the-art CAS contention management scheme by an average of 7.9%. When fairness management is included, our scheme provides an average of 3.4% performance improvement over the constant backoff scheme, while showing increased fairness values in all cases (up to 43.6%). 004
3	ADAM: A Decentralized Parallel Computer Architecture Featuring Fast Thread and Data Migration and a Uniform Hardware Abstraction Huang, Andrew "bunnie" 01 June 2002 (has links) The furious pace of Moore's Law is driving computer architecture into a realm where the the speed of light is the dominant factor in system latencies. The number of clock cycles to span a chip are increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has previously made migration an unserviceable solution so far. I present an architecture, implementation, and mechanisms that reduces the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies. AI
4	Investigação de técnicas fotônicas de chaveamento aplicadas em arquiteturas paralelas. / Research about photonic techniques in parallel architectures. Martins, João Eduardo Machado Perea 20 March 1998 (has links) Este trabalho apresenta um estudo sobre redes ópticas de interconexão aplicadas em arquiteturas paralelas, onde são propostos, simulados e analisados alguns modelos de redes. Essa é uma importante pesquisa, pois, as redes de interconexão influenciam diretamente o custo e desempenho das arquiteturas paralelas de computadores. O primeiro modelo de rede óptica proposto é chamado de SCF (Sistema Circular com Filas). Esse e um sistema sem colisões, onde há um canal exclusivo para controle de comunicação e cada nó possui um canal exclusivo para recepção de dados. Esse sistema tem um desempenho com alta taxa de vazão, alto nível de utilização e pequenas filas. Para a simulação da rede SCF foi desenvolvido um simulador dedicado, cuja adaptação para a simulação de outros modelos de redes, propostos nesse trabalho, foi facilmente realizada. Neste trabalho também foram propostos, simulados e analisados três modelos diferentes de chaves ópticas de distribuição para arquitetura paralela do tipo Dataflow. Os resultados dessas simulações mostram que componentes ópticos relativamente simples podem ser utilizados no desenvolvimento de sistemas de alto desempenho. / This work presents a study about optical interconnection network applied to parallel computer architectures, where is proposed, simulated and analyzed some models of optical interconnection networks. It is an important research because the interconnection networks influence directly the cost and performance of parallel computer architectures. The first optical interconnection network model proposed in this work is called SCF (Sistema Circular com Filas). It is a system without collisions, where there is a dedicated channel for communication control and each node has a fixed channel for data reception. The system has a performance with high throughput, high utilization leve1 and small queue size. For the SCF simulation was developed a dedicated simulator, whose adjust to simulate others optical interconnection network, proposed in this work, was easily performed. In this work also were proposed, simulated and analyzed three different models of optical distributing network for Dataflow computer architecture, whose results shows that single optical devises can ensure the development of high performance systems. Arquiteturas paralelas de computadores Fotônica Optical interconnection networks Parallel computer architecture Photonic Redes ópticas de interconexão
5	Investigação de técnicas fotônicas de chaveamento aplicadas em arquiteturas paralelas. / Research about photonic techniques in parallel architectures. João Eduardo Machado Perea Martins 20 March 1998 (has links) Este trabalho apresenta um estudo sobre redes ópticas de interconexão aplicadas em arquiteturas paralelas, onde são propostos, simulados e analisados alguns modelos de redes. Essa é uma importante pesquisa, pois, as redes de interconexão influenciam diretamente o custo e desempenho das arquiteturas paralelas de computadores. O primeiro modelo de rede óptica proposto é chamado de SCF (Sistema Circular com Filas). Esse e um sistema sem colisões, onde há um canal exclusivo para controle de comunicação e cada nó possui um canal exclusivo para recepção de dados. Esse sistema tem um desempenho com alta taxa de vazão, alto nível de utilização e pequenas filas. Para a simulação da rede SCF foi desenvolvido um simulador dedicado, cuja adaptação para a simulação de outros modelos de redes, propostos nesse trabalho, foi facilmente realizada. Neste trabalho também foram propostos, simulados e analisados três modelos diferentes de chaves ópticas de distribuição para arquitetura paralela do tipo Dataflow. Os resultados dessas simulações mostram que componentes ópticos relativamente simples podem ser utilizados no desenvolvimento de sistemas de alto desempenho. / This work presents a study about optical interconnection network applied to parallel computer architectures, where is proposed, simulated and analyzed some models of optical interconnection networks. It is an important research because the interconnection networks influence directly the cost and performance of parallel computer architectures. The first optical interconnection network model proposed in this work is called SCF (Sistema Circular com Filas). It is a system without collisions, where there is a dedicated channel for communication control and each node has a fixed channel for data reception. The system has a performance with high throughput, high utilization leve1 and small queue size. For the SCF simulation was developed a dedicated simulator, whose adjust to simulate others optical interconnection network, proposed in this work, was easily performed. In this work also were proposed, simulated and analyzed three different models of optical distributing network for Dataflow computer architecture, whose results shows that single optical devises can ensure the development of high performance systems. Arquiteturas paralelas de computadores Fotônica Redes ópticas de interconexão Optical interconnection networks Parallel computer architecture Photonic

1

Page generated in 0.0944 seconds