Global ETD Search

21	Broadcast Mechanism for improving Conditional Branch Prediction in Speculative Multithreaded Processors Thankappan Achary Retnamma, Renjith 01 January 2010 (has links) ABSTRACT Many aspects of speculative multithreading have been under constant and crucial research in the recent times with the increased importance in exploiting parallelism in single thread applications. One of the important architectural optimizations that is very pertinent in this scenario is branch prediction. Branch Prediction assumes increased importance for multi-threading systems that execute threads speculatively, since wrong predictions can be much costlier here, in terms of threads, than a few instructions that occupy the pipeline in a uni-processor. Conventional branch prediction techniques have provided increasingly better prediction accuracies for uni-core processing. But the branch prediction itself takes on a whole new dimension when applied to multi-core architectures based on Speculative Multithreading. Dependence on global branch history has helped branch predictors to achieve high prediction accuracy in single thread applications. The discontinuity of global history created at the thread boundaries cripple the performance of branch predictors in a multi-threaded environment. Many studies in the past have tried to address the branch history problem to improve the prediction accuracy. Most of these have been found either to be architecture specific or complex in terms of the hardware needed to recreate or approximate the right history to be given to the threads when they start executing out of order. This hardware overhead increases as the number and size of threads increase thereby limiting the scalability of the algorithms proposed so far. The current thesis takes a different direction and proposes a simple and scalable solution to effectively reduce the misprediction rates in Speculative Multithreaded systems. This is accomplished by making use of a synergistic interaction between threads to boost the inherent biased nature of branches and using less complex hardware to reduce aliasing between branches in the threads. The study proposes a new scheme called the Global Broadcast Buffer scheme to effectively reduce branch mispredictions in Speculative Multithreaded architectures. Threads (Computer programs) Pipeline Computers
22	Adaptive transaction scheduling for transactional memory systems Yoo, Richard M. 01 April 2008 (has links) Transactional memory systems are expected to enable parallel programming at lower programming complexity, while delivering improved performance over traditional lock-based systems. Nonetheless, there are certain situations where transactional memory systems could actually perform worse. Transactional memory systems can outperform locks only when the executing workloads contain sufficient parallelism. When the workload lacks inherent parallelism, launching excessive transactions can adversely degrade performance. These situations will actually become dominant in future workloads when large-scale transactions are frequently executed. In this thesis, we propose a new paradigm called adaptive transaction scheduling to address this issue. Based on the parallelism feedback from applications, our adaptive transaction scheduler dynamically dispatches and controls the number of concurrently executing transactions. In our case study, we show that our low-cost mechanism not only guarantees that hardware transactional memory systems perform no worse than a single global lock, but also significantly improves performance for both hardware and software transactional memory systems. Parallelism Performance Transaction effectiveness Contention intensity Transaction systems (Computer systems) Threads (Computer programs) Parallel programming (Computer science) Synchronization
23	Design and evaluation of a technology-scalable architecture for instruction-level parallelism Nagarajan, Ramadass, January 1900 (has links) Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
24	VCluster a portable virtual computing library for cluster computing / Zhang, Hua. January 2008 (has links) Thesis (Ph.D.)--University of Central Florida, 2008. / Advisers: Ratan K. Guha, Joohan Lee. Includes bibliographical references (p. 132-143).
25	Investigating tools and techniques for improving software performance on multiprocessor computer systems Tristram, Waide Barrington January 2012 (has links) The availability of modern commodity multicore processors and multiprocessor computer systems has resulted in the widespread adoption of parallel computers in a variety of environments, ranging from the home to workstation and server environments in particular. Unfortunately, parallel programming is harder and requires more expertise than the traditional sequential programming model. The variety of tools and parallel programming models available to the programmer further complicates the issue. The primary goal of this research was to identify and describe a selection of parallel programming tools and techniques to aid novice parallel programmers in the process of developing efficient parallel C/C++ programs for the Linux platform. This was achieved by highlighting and describing the key concepts and hardware factors that affect parallel programming, providing a brief survey of commonly available software development tools and parallel programming models and libraries, and presenting structured approaches to software performance tuning and parallel programming. Finally, the performance of several parallel programming models and libraries was investigated, along with the programming effort required to implement solutions using the respective models. A quantitative research methodology was applied to the investigation of the performance and programming effort associated with the selected parallel programming models and libraries, which included automatic parallelisation by the compiler, Boost Threads, Cilk Plus, OpenMP, POSIX threads (Pthreads), and Threading Building Blocks (TBB). Additionally, the performance of the GNU C/C++ and Intel C/C++ compilers was examined. The results revealed that the choice of parallel programming model or library is dependent on the type of problem being solved and that there is no overall best choice for all classes of problem. However, the results also indicate that parallel programming models with higher levels of abstraction require less programming effort and provide similar performance compared to explicit threading models. The principle conclusion was that the problem analysis and parallel design are an important factor in the selection of the parallel programming model and tools, but that models with higher levels of abstractions, such as OpenMP and Threading Building Blocks, are favoured. Multiprocessors Multiprogramming (Electronic computers) Parallel programming (Computer science) Linux Abstract data types (Computer science) Threads (Computer programs) Computer programming
26	A Sparse Learning Approach for Linux Kernel Data Race Prediction Ryan, Gabriel January 2023 (has links) Operating system kernels rely on fine-grained concurrency to achieve optimal performance on modern multi-core processors. However, heavy usage of fine-grained concurrency mechanisms make modern operating system kernels prone to data races, which can cause severe and often elusive bugs. In this thesis, I propose a new approach to identifying data races in OS Kernels based on learning a model to predict which memory accesses can be feasibly executed concurrently with one another. To develop an efficient learning method for memory access feasibility, I develop a novel approach based on encoding feasibility as a boolean indicator function of system calls and ordered memory accesses. A memory access feasibility function encoded this way will have a naturally sparse latent representation due to the sparsity of interthread communications and synchronization interactions, and can therefore be accurately approximated based on a small number of observed concurrent execution traces. This thesis introduces two key contributions. First, Probabilistic Lockset Analysis (PLA), is a new analysis that exploits sparsity in input dependencies in conjunction with a conservative lockset analysis to efficiently predict data races in the Linux OS Kernel. Second, approximate happens-before analysis in the fourier domain (HBFourier) generalizes the approach used by PLA to reason about interthread memory communications and synchronization events through sparse fourier learning. In addition to being theoretically grounded, these techniques are highly practical: they find hundreds of races in a recent Linux development kernel, an order of magnitude improvement over prior work, and find races with severe security impacts that have been overlooked by existing kernel testing systems for years. Computer science Operating systems (Computers) Fourier analysis Threads (Computer programs) Machine learning Linux
27	Design and evaluation of a technology-scalable architecture for instruction-level parallelism Nagarajan, Ramadass, 1977- 28 August 2008 (has links) Not available Computer architecture--Design Computer architecture--Evaluation High performance processors--Evaluation Threads (Computer programs)
28	Dynamic Task Prediction for an SpMT Architecture Based on Control Independence Jothi, Komal 01 January 2009 (has links) Exploiting better performance from computer programs translates to finding more instructions to execute in parallel. Since most general purpose programs are written in an imperatively sequential manner, closely lying instructions are always data dependent, making the designer look far ahead into the program for parallelism. This necessitates wider superscalar processors with larger instruction windows. But superscalars suffer from three key limitations, their inability to scale, sequential fetch bottleneck and high branch misprediction penalty. Recent studies indicate that current superscalars have reached the end of the road and designers will have to look for newer ideas to build computer processors. Speculative Multithreading (SpMT) is one of the most recent techniques to exploit parallelism from applications. Most SpMT architectures partition a sequential program into multiple threads (or tasks) that can be concurrently executed on multiple processing units. It is desirable that these tasks are sufficiently distant from each other so as to facilitate parallelism. It is also desirable that these tasks are control independent of each other so that execution of a future task is guaranteed in case of local control flow misspeculations. Some task prediction mechanisms rely on the compiler requiring recompilation of programs. Current dynamic mechanisms either rely on program constructs like loop iterations and function and loop boundaries, resulting in unbalanced loads, or predict tasks which are too short to be of use in an SpMT architecture. This thesis is the first proposal of a predictor that dynamically predicts control independent tasks that are consistently wide apart, and executes them on a novel SpMT architecture. Computer architecture Parallel programming (Computer science) Microprocessors -- Programming Simultaneous multithreading processors Threads (Computer programs) Computer and Systems Architecture Electrical and Computer Engineering

Search results