
Replicating multithreaded services

Kapritsos, Emmanouil 09 February 2015
For the last 40 years, the systems community has invested a lot of effort in designing techniques for building fault tolerant distributed systems and services. This effort has produced a massive list of results: the literature describes how to design replication protocols that tolerate a wide range of failures (from simple crashes to malicious "Byzantine" failures) in a wide range of settings (e.g. synchronous or asynchronous communication, with or without stable storage), optimizing various metrics (e.g. number of messages, latency, throughput). These techniques have their roots in ideas, such as the abstraction of State Machine Replication and the Paxos protocol, that were conceived when computing was very different from what it is today: computers had a single core; all processing was done using a single thread of control, handling requests sequentially; and a collection of 20 nodes was considered a large distributed system. In the last decade, however, computing has gone through some major paradigm shifts, with the advent of multicore architectures and large cloud infrastructures. This dissertation explains how these profound changes impact the practical usefulness of traditional fault tolerant techniques and proposes new ways to architect these solutions to fit the new paradigms.

Runtime data race detection in multi-threaded programs methods and tools

Mühlenfeld, Arndt January 1900
Also issued as: dissertation, Graz University of Technology, 2007 / Produced on demand

Dependency speculation in dynamic simultaneous multi-threading

Nelson, Jarrod A. January 1900
Thesis (M.S.)--Oregon State University, 2006. / Printout. Includes bibliographical references (leaves 29-30). Also available on the World Wide Web.

XTHREAD: a flexible concurrency analysis framework

Ressia, Jorge Luis. January 2006
Many different methodologies have been developed for analyzing multithreaded programs. These analyses present a wide variety of approaches and tend to be rather complicated because they work on applications formed by several threads executed in a nondeterministic order. To address these issues, this thesis introduces XTHREAD, a flexible and modular framework for developing different concurrency analyses over multithreaded applications. The main objective of XTHREAD is to reduce the complexity of developing concurrency analyses by providing high-level abstractions that bridge the gap between the language spoken by the researcher and the language the framework provides. Moreover, this framework provides different tools that are often required for solving issues common to many concurrency analyses. XTHREAD's modular organization also delivers a flexible environment for developing and testing different analysis implementations. To demonstrate the usefulness of the framework, we develop a client analysis, representing a known but non-trivial multithreaded analysis, that is composed of several other concurrency analyses. A substantial number of benchmarks are used to test the implementations, showing that complex programs are accepted and correctly handled by the abstractions provided by the framework. Using the XTHREAD framework we demonstrate implementations that have both comparable accuracy and much better generality than is typically found in existing, research-level implementations of concurrency analyses.

An analysis of software interface issues for SMT processors

Redstone, Joshua Abram. January 2002
Thesis (Ph. D.)--University of Washington, 2002. / Vita. Includes bibliographical references (p. 116-124).


Zhang, Hua 05 September 2008
Message passing has been the dominant parallel programming model in cluster computing, and libraries like Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) have proven their novelty and efficiency through numerous applications in diverse areas. However, as clusters of Symmetric Multi-Processor (SMP) and heterogeneous machines become popular, conventional message passing models must be adapted accordingly to support this new kind of cluster efficiently. In addition, the Java programming language, with features like object-oriented architecture, platform-independent bytecode, and native support for multithreading, is an alternative language for cluster computing. This research presents a new parallel programming model and a library called VCluster that implements this model on top of a Java Virtual Machine (JVM). The programming model is based on virtual migrating threads to support clusters of heterogeneous SMP machines efficiently. VCluster is implemented in 100% Java, utilizing the portability of Java to address the problems of heterogeneous machines. VCluster virtualizes computational and communication resources such as threads, computation states, and communication channels across multiple separate JVMs, which makes a mobile thread possible. With virtual migrating threads, it is feasible to balance the load of computing resources dynamically. Several large-scale parallel applications have been developed using VCluster to compare the performance and usage of VCluster with other libraries. The results of the experiments show that VCluster makes it easier to develop multithreaded parallel applications compared to conventional libraries like MPI. At the same time, the performance of VCluster is comparable to MPICH, a widely used MPI library, combined with popular threading libraries like POSIX Thread and OpenMP.
In the next phase of our work, we implemented thread group and thread migration to demonstrate the feasibility of dynamic load balancing in VCluster. We carried out experiments to show that the load can be dynamically balanced in VCluster, resulting in better performance. Thread group also makes it possible to implement collective communication functions between threads, which have proven useful in process-based libraries. / Ph.D. / School of Electrical Engineering and Computer Science / Engineering and Computer Science / Computer Science PhD

An asymmetric multi-core architecture for efficiently accelerating critical paths in multithreaded programs

Suleman, Muhammad Aater 20 October 2010
Extracting high performance from Chip Multiprocessors (CMPs) requires that the application be parallelized, i.e., divided into threads which execute concurrently on multiple cores. To save programmer effort, difficult-to-parallelize program portions are often left as serial. We show that common serial portions, i.e., non-parallel kernels, critical sections, and limiter stages in a pipeline, become the critical path of the program when the number of cores increases, thereby limiting performance and scalability. We propose that instead of burdening the software programmers with the task of shortening the serial portions, we can accelerate the serial portions using hardware support. To this end, we propose the Asymmetric Chip-Multiprocessor (ACMP) paradigm which provides one (or few) fast core(s) for accelerated execution of the serial portions and multiple slow, small cores for high throughput on the parallel portions. We show a concrete example implementation of the ACMP which consists of one large, high-performance core and many small, power-efficient cores. We develop hardware/software mechanisms to accelerate the execution of serial portions using the ACMP, and further improve the ACMP by proposing mechanisms to tackle common overheads incurred by the ACMP.
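
The scaling argument in the abstract above can be illustrated with the textbook form of Amdahl's law. The sketch below is not the dissertation's performance model; the function names, the `big_core_speedup` parameter, and the assumption that the large core sits idle during parallel phases are all hypothetical simplifications chosen for illustration.

```python
def symmetric_speedup(serial_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one small core when the serial
    portion also runs on a small core."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def asymmetric_speedup(serial_fraction: float,
                       small_cores: int,
                       big_core_speedup: float) -> float:
    """Serial portion runs on one large core that is `big_core_speedup`
    times faster than a small core; the parallel portion runs on the
    small cores only (a simplifying assumption of this sketch)."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction / big_core_speedup
                  + parallel_fraction / small_cores)


if __name__ == "__main__":
    # Hypothetical area budget of 16 small cores; the large core is
    # assumed to cost 4 small cores and to run serial code 2x faster.
    print(symmetric_speedup(0.10, 16))
    print(asymmetric_speedup(0.10, 12, 2.0))
```

Under these made-up numbers, with 10% serial work, the symmetric chip reaches a speedup of about 6.4 while the asymmetric one reaches about 8, even though it has fewer cores for the parallel portion. The gap widens as the serial fraction grows.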

Analysis of the effectiveness of multithreading for interrupts on communication processors

Pattery, Vinu J. 01 May 2003
High bandwidth of networks demands high performance communication processors that integrate application processing, network processing, and system support functions into a single, low cost System-On-Chip (SOC) solution. However, conventional processors, when used in network related applications, are beset by the overhead of save/restore of register context, cache misses due to fetching the interrupt handler from memory, and the possibility of NIC buffer overflow. Therefore, this paper analyzes the effectiveness of multithreading to service interrupts on an embedded processor from the perspective of a network processor and a communication processor. A simulation environment enhanced with a multithreaded hardware execution model is used, and our results reveal that multithreading for interrupts from a single NIC brings a fair improvement in the performance of network processors and little or no effect on communication processors. However, our analysis also shows that multithreading for interrupts has a lot of potential when applied to communication processors with multiple interrupt sources, such as Ethernet, ATM, USB, and HDLC. Index terms: Multithreading, UDP, IP, device driver, interrupt processing, communication processor. / Graduation date: 2003

A Coupled Multi-ALU Processing Node for a Highly Parallel Computer

Keckler, Stephen W. 01 September 1992
This report describes Processor Coupling, a mechanism for controlling multiple ALUs on a single integrated circuit to exploit both instruction-level and inter-thread parallelism. A compiler statically schedules individual threads to discover available intra-thread instruction-level parallelism. The runtime scheduling mechanism interleaves threads, exploiting inter-thread parallelism to maintain high ALU utilization. ALUs are assigned to threads on a cycle-by-cycle basis, and several threads can be active concurrently. Simulation results show that Processor Coupling performs well both on single-threaded and multi-threaded applications. The experiments address the effects of memory latencies, function unit latencies, and communication bandwidth between function units.
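
The cycle-by-cycle assignment of ALUs to threads described above can be pictured with a toy scheduler. This is only a sketch under strong assumptions, not the mechanism from the report: each thread is modeled as a list of rows, where a row is the number of ALU operations the compiler scheduled together; all ALUs are interchangeable; threads have fixed priority in list order; and memory and function unit latencies are ignored. The function name `run_coupled` is hypothetical.

```python
def run_coupled(threads, num_alus):
    """Interleave statically scheduled threads over a pool of ALUs.

    threads  -- list of threads; each thread is a list of positive ints,
                one per compiler-scheduled row (ops wanted in that row)
    num_alus -- number of identical ALUs available every cycle

    Returns (cycles, utilization), where utilization is the fraction
    of ALU slots that issued an operation.
    """
    remaining = [list(rows) for rows in threads]  # do not mutate the input
    cycles = 0
    issued = 0
    while any(remaining):                 # some thread still has rows left
        free = num_alus
        for rows in remaining:            # earlier threads have priority
            if rows and free > 0:
                take = min(rows[0], free) # issue what fits this cycle
                rows[0] -= take
                free -= take
                issued += take
                if rows[0] == 0:          # row finished; next row next cycle
                    rows.pop(0)
        cycles += 1
    utilization = issued / (cycles * num_alus) if cycles else 0.0
    return cycles, utilization


if __name__ == "__main__":
    thread_a = [1, 2, 1]   # hypothetical schedules
    thread_b = [3, 2, 3]
    print(run_coupled([thread_a], 4))
    print(run_coupled([thread_a, thread_b], 4))
```

With these hypothetical schedules, either thread alone leaves ALUs idle in every cycle, while the two threads together fill all four ALUs and still finish in three cycles, which is the utilization effect the abstract attributes to interleaving.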

Design and Evaluation of the Hamal Parallel Computer

Grossman, J.P. 05 December 2002
Parallel shared-memory machines with hundreds or thousands of processor-memory nodes have been built; in the future we will see machines with millions or even billions of nodes. Associated with such large systems is a new set of design challenges. Many problems must be addressed by an architecture in order for it to be successful; of these, we focus on three in particular. First, a scalable memory system is required. Second, the network messaging protocol must be fault-tolerant. Third, the overheads of thread creation, thread management and synchronization must be extremely low. This thesis presents the complete system design for Hamal, a shared-memory architecture which addresses these concerns and is directly scalable to one million nodes. Virtual memory and distributed objects are implemented in a manner that requires neither inter-node synchronization nor the storage of globally coherent translations at each node. We develop a lightweight fault-tolerant messaging protocol that guarantees message delivery and idempotence across a discarding network. A number of hardware mechanisms provide efficient support for massive multithreading and fine-grained synchronization. Experiments are conducted in simulation, using a trace-driven network simulator to investigate the messaging protocol and a cycle-accurate simulator to evaluate the Hamal architecture. We determine implementation parameters for the messaging protocol which optimize performance. A discarding network is easier to design and can be clocked at a higher rate, and we find that with this protocol its performance can approach that of a non-discarding network. Our simulations of Hamal demonstrate the effectiveness of its thread management and synchronization primitives. In particular, we find register-based synchronization to be an extremely efficient mechanism which can be used to implement a software barrier with a latency of only 523 cycles on a 512 node machine.
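
The combination of guaranteed delivery and idempotence over a discarding network, as described in the abstract above, is commonly achieved by retransmitting until acknowledged and filtering duplicates at the receiver. The sketch below shows that general pattern only; it is not Hamal's protocol. The classes `DiscardingNetwork` and `Receiver`, the function `send_reliably`, the drop model, and the unbounded `seen` set are all hypothetical simplifications.

```python
import random


class DiscardingNetwork:
    """Toy network that may silently drop any message (assumed model)."""

    def __init__(self, drop_probability: float, seed: int = 0):
        self.drop_probability = drop_probability
        self.rng = random.Random(seed)

    def deliver(self) -> bool:
        """Return True if a message gets through, False if discarded."""
        return self.rng.random() >= self.drop_probability


class Receiver:
    """Applies each message at most once by remembering sequence numbers."""

    def __init__(self):
        self.seen = set()   # a real design would bound this state
        self.counter = 0    # state changed by non-idempotent "add" messages

    def receive(self, seq: int, amount: int) -> None:
        if seq not in self.seen:   # duplicates from retransmission are ignored
            self.seen.add(seq)
            self.counter += amount


def send_reliably(net, receiver, seq, amount, max_attempts=1000) -> int:
    """Retransmit until an acknowledgement returns; report attempts used."""
    for attempt in range(1, max_attempts + 1):
        if not net.deliver():      # request was discarded
            continue
        receiver.receive(seq, amount)
        if net.deliver():          # acknowledgement made it back
            return attempt
    raise RuntimeError("no acknowledgement within max_attempts")


if __name__ == "__main__":
    net = DiscardingNetwork(drop_probability=0.5, seed=1)
    receiver = Receiver()
    for seq in range(100):
        send_reliably(net, receiver, seq, amount=1)
    print(receiver.counter)
```

Because a lost acknowledgement causes the same request to arrive more than once, the sequence-number check is what keeps the counter from being incremented twice for one logical message.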
