  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Broadcast Mechanism for improving Conditional Branch Prediction in Speculative Multithreaded Processors

Thankappan Achary Retnamma, Renjith 01 January 2010
Many aspects of speculative multithreading have been under constant and crucial research in recent times, given the increased importance of exploiting parallelism in single-thread applications. One of the important architectural optimizations that is very pertinent in this scenario is branch prediction. Branch prediction assumes increased importance for multithreading systems that execute threads speculatively, since wrong predictions can be much costlier here, in terms of threads, than a few instructions that occupy the pipeline in a uniprocessor. Conventional branch prediction techniques have provided increasingly better prediction accuracies for single-core processing. But branch prediction itself takes on a whole new dimension when applied to multi-core architectures based on Speculative Multithreading. Dependence on global branch history has helped branch predictors to achieve high prediction accuracy in single-thread applications. The discontinuity of global history created at the thread boundaries cripples the performance of branch predictors in a multithreaded environment. Many studies in the past have tried to address the branch history problem to improve the prediction accuracy. Most of these have been found either to be architecture specific or complex in terms of the hardware needed to recreate or approximate the right history to be given to the threads when they start executing out of order. This hardware overhead increases as the number and size of threads increase, thereby limiting the scalability of the algorithms proposed so far. The current thesis takes a different direction and proposes a simple and scalable solution to effectively reduce the misprediction rates in Speculative Multithreaded systems. This is accomplished by making use of a synergistic interaction between threads to boost the inherent biased nature of branches and using less complex hardware to reduce aliasing between branches in the threads.
The study proposes a new scheme called the Global Broadcast Buffer scheme to effectively reduce branch mispredictions in Speculative Multithreaded architectures.
22

Adaptive transaction scheduling for transactional memory systems

Yoo, Richard M. 01 April 2008
Transactional memory systems are expected to enable parallel programming at lower programming complexity, while delivering improved performance over traditional lock-based systems. Nonetheless, there are certain situations where transactional memory systems could actually perform worse. Transactional memory systems can outperform locks only when the executing workloads contain sufficient parallelism. When the workload lacks inherent parallelism, launching excessive transactions can degrade performance. These situations will actually become dominant in future workloads when large-scale transactions are frequently executed. In this thesis, we propose a new paradigm called adaptive transaction scheduling to address this issue. Based on the parallelism feedback from applications, our adaptive transaction scheduler dynamically dispatches and controls the number of concurrently executing transactions. In our case study, we show that our low-cost mechanism not only guarantees that hardware transactional memory systems perform no worse than a single global lock, but also significantly improves performance for both hardware and software transactional memory systems.
23

Design and evaluation of a technology-scalable architecture for instruction-level parallelism

Nagarajan, Ramadass, January 1900
Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
24

VCluster: a portable virtual computing library for cluster computing

Zhang, Hua. January 2008
Thesis (Ph.D.)--University of Central Florida, 2008. / Advisers: Ratan K. Guha, Joohan Lee. Includes bibliographical references (p. 132-143).
25

Investigating tools and techniques for improving software performance on multiprocessor computer systems

Tristram, Waide Barrington January 2012
The availability of modern commodity multicore processors and multiprocessor computer systems has resulted in the widespread adoption of parallel computers in a variety of environments, ranging from the home to workstation and server environments in particular. Unfortunately, parallel programming is harder and requires more expertise than the traditional sequential programming model. The variety of tools and parallel programming models available to the programmer further complicates the issue. The primary goal of this research was to identify and describe a selection of parallel programming tools and techniques to aid novice parallel programmers in the process of developing efficient parallel C/C++ programs for the Linux platform. This was achieved by highlighting and describing the key concepts and hardware factors that affect parallel programming, providing a brief survey of commonly available software development tools and parallel programming models and libraries, and presenting structured approaches to software performance tuning and parallel programming. Finally, the performance of several parallel programming models and libraries was investigated, along with the programming effort required to implement solutions using the respective models. A quantitative research methodology was applied to the investigation of the performance and programming effort associated with the selected parallel programming models and libraries, which included automatic parallelisation by the compiler, Boost Threads, Cilk Plus, OpenMP, POSIX threads (Pthreads), and Threading Building Blocks (TBB). Additionally, the performance of the GNU C/C++ and Intel C/C++ compilers was examined. The results revealed that the choice of parallel programming model or library is dependent on the type of problem being solved and that there is no overall best choice for all classes of problem. 
However, the results also indicate that parallel programming models with higher levels of abstraction require less programming effort and provide similar performance compared to explicit threading models. The principal conclusion was that problem analysis and parallel design are important factors in the selection of the parallel programming model and tools, but that models with higher levels of abstraction, such as OpenMP and Threading Building Blocks, are favoured.
26

A Sparse Learning Approach for Linux Kernel Data Race Prediction

Ryan, Gabriel January 2023
Operating system kernels rely on fine-grained concurrency to achieve optimal performance on modern multi-core processors. However, heavy usage of fine-grained concurrency mechanisms makes modern operating system kernels prone to data races, which can cause severe and often elusive bugs. In this thesis, I propose a new approach to identifying data races in OS kernels based on learning a model to predict which memory accesses can be feasibly executed concurrently with one another. To develop an efficient learning method for memory access feasibility, I develop a novel approach based on encoding feasibility as a Boolean indicator function of system calls and ordered memory accesses. A memory access feasibility function encoded this way will have a naturally sparse latent representation due to the sparsity of interthread communications and synchronization interactions, and can therefore be accurately approximated based on a small number of observed concurrent execution traces. This thesis introduces two key contributions. First, Probabilistic Lockset Analysis (PLA) is a new analysis that exploits sparsity in input dependencies in conjunction with a conservative lockset analysis to efficiently predict data races in the Linux OS kernel. Second, approximate happens-before analysis in the Fourier domain (HBFourier) generalizes the approach used by PLA to reason about interthread memory communications and synchronization events through sparse Fourier learning. In addition to being theoretically grounded, these techniques are highly practical: they find hundreds of races in a recent Linux development kernel, an order of magnitude improvement over prior work, and find races with severe security impacts that have been overlooked by existing kernel testing systems for years.
27

Design and evaluation of a technology-scalable architecture for instruction-level parallelism

Nagarajan, Ramadass, 1977- 28 August 2008
Not available
28

Dynamic Task Prediction for an SpMT Architecture Based on Control Independence

Jothi, Komal 01 January 2009
Extracting better performance from computer programs translates to finding more instructions to execute in parallel. Since most general-purpose programs are written in an imperatively sequential manner, closely lying instructions are always data dependent, making the designer look far ahead into the program for parallelism. This necessitates wider superscalar processors with larger instruction windows. But superscalars suffer from three key limitations: their inability to scale, the sequential fetch bottleneck, and a high branch misprediction penalty. Recent studies indicate that current superscalars have reached the end of the road and designers will have to look for newer ideas to build computer processors. Speculative Multithreading (SpMT) is one of the most recent techniques to exploit parallelism from applications. Most SpMT architectures partition a sequential program into multiple threads (or tasks) that can be concurrently executed on multiple processing units. It is desirable that these tasks are sufficiently distant from each other so as to facilitate parallelism. It is also desirable that these tasks are control independent of each other so that execution of a future task is guaranteed in case of local control flow misspeculations. Some task prediction mechanisms rely on the compiler, requiring recompilation of programs. Current dynamic mechanisms either rely on program constructs like loop iterations and function and loop boundaries, resulting in unbalanced loads, or predict tasks which are too short to be of use in an SpMT architecture. This thesis is the first proposal of a predictor that dynamically predicts control independent tasks that are consistently wide apart, and executes them on a novel SpMT architecture.
