• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 245
  • 130
  • 54
  • 23
  • 6
  • 5
  • 4
  • 3
  • 2
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 557
  • 557
  • 156
  • 128
  • 116
  • 102
  • 92
  • 84
  • 74
  • 72
  • 71
  • 63
  • 57
  • 56
  • 56
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Supporting fault-tolerant communication in networks

Kanjani, Khushboo 15 May 2009 (has links)
We address two problems dealing with fault-tolerant communication in networks. The first one is designing a distributed storage protocol tolerant to Byzantine failure of servers. The protocol implements a multi-writer multi-reader register which satisfies a weaker consistency condition called MWReg. Most of the earlier work gives multiwriter implementations by simulating m copies of a single-writer protocol where m is the number of writers. Our solution gives a direct multi-writer implementation and thus has bounded message and time complexity independent of the number of writers. We have simulated the complete protocol to test its performance and also proved its correctness theoretically. The second problem we address is of providing a reliable communication link between two nodes in a network. We present a capacity reservation algorithm in the case for upper bounds on edge capacities and costs associated with using per unit capacity on any edge. We give a flow based approximation algorithm with cost at most four times optimal. To conclude, we design a distributed storage protocol and a capacity reservation algorithm which are tolerant to network failures.
2

Test and fault-tolerance for network-on-chip infrastructures

Grecu, Cristian 05 1900 (has links)
The demands of future computing, as well as the challenges of nanometer-era VLSI design, will require new design techniques and design styles that are simultaneously high performance, energy-efficient, and robust to noise and process variation. One of the emerging problems concerns the communication mechanisms between the increasing number of blocks, or cores, that can be integrated onto a single chip. The bus-based systems and point-to-point interconnection strategies in use today cannot be easily scaled to accommodate the large numbers of cores projected in the near future. Network-on-chip (NoC) interconnect infrastructures are one of the key technologies that will enable the emergence of many-core processors and systems-on-chip with increased computing power and energy efficiency. This dissertation is focused on testing, yield improvement and fault-tolerance of such NoC infrastructures. A fast, efficient test method is developed for NoCs, that exploits their inherent parallelism to reduce the test time by transporting test data on multiple paths and testing multiple NoC components concurrently. The improvement of test time varies, depending on the NoC architecture and test transport protocol, from 2X to 34X, compared to current NoC test methods. This test mechanism is used subsequently to perform detection of NoC link permanent faults, which are then repaired by an on-chip mechanism that replaces the faulty signal lines with fault-free ones, thereby increasing the yield, while maintaining the same wire delay characteristics. The solution described in this dissertation improves significantly the achievable yield of NoC inter-switch channels – from 4% improvement for an 8-bit wide channel, to a 71% improvement for a 128-bit wide channel. The direct benefit is an improved fault-tolerance and increased yield and long-term reliability of NoC based multicore systems.
3

Test and fault-tolerance for network-on-chip infrastructures

Grecu, Cristian 05 1900 (has links)
The demands of future computing, as well as the challenges of nanometer-era VLSI design, will require new design techniques and design styles that are simultaneously high performance, energy-efficient, and robust to noise and process variation. One of the emerging problems concerns the communication mechanisms between the increasing number of blocks, or cores, that can be integrated onto a single chip. The bus-based systems and point-to-point interconnection strategies in use today cannot be easily scaled to accommodate the large numbers of cores projected in the near future. Network-on-chip (NoC) interconnect infrastructures are one of the key technologies that will enable the emergence of many-core processors and systems-on-chip with increased computing power and energy efficiency. This dissertation is focused on testing, yield improvement and fault-tolerance of such NoC infrastructures. A fast, efficient test method is developed for NoCs, that exploits their inherent parallelism to reduce the test time by transporting test data on multiple paths and testing multiple NoC components concurrently. The improvement of test time varies, depending on the NoC architecture and test transport protocol, from 2X to 34X, compared to current NoC test methods. This test mechanism is used subsequently to perform detection of NoC link permanent faults, which are then repaired by an on-chip mechanism that replaces the faulty signal lines with fault-free ones, thereby increasing the yield, while maintaining the same wire delay characteristics. The solution described in this dissertation improves significantly the achievable yield of NoC inter-switch channels – from 4% improvement for an 8-bit wide channel, to a 71% improvement for a 128-bit wide channel. The direct benefit is an improved fault-tolerance and increased yield and long-term reliability of NoC based multicore systems.
4

Replicating multithreaded services

Kapritsos, Emmanouil 09 February 2015 (has links)
For the last 40 years, the systems community has invested a lot of effort in designing techniques for building fault tolerant distributed systems and services. This effort has produced a massive list of results: the literature describes how to design replication protocols that tolerate a wide range of failures (from simple crashes to malicious "Byzantine" failures) in a wide range of settings (e.g. synchronous or asynchronous communication, with or without stable storage), optimizing various metrics (e.g. number of messages, latency, throughput). These techniques have their roots in ideas, such as the abstraction of State Machine Replication and the Paxos protocol, that were conceived when computing was very different than it is today: computers had a single core; all processing was done using a single thread of control, handling requests sequentially; and a collection of 20 nodes was considered a large distributed system. In the last decade, however, computing has gone through some major paradigm shifts, with the advent of multicore architectures and large cloud infrastructures. This dissertation explains how these profound changes impact the practical usefulness of traditional fault tolerant techniques and proposes new ways to architect these solutions to fit the new paradigms. / text
5

A reconfiguration-based defect-tolerant design paradigm for nanotechnologies

He, Chen 28 August 2008 (has links)
Not available / text
6

Reliable mobile agents for distributed computing

Wagealla, Waleed January 2003 (has links)
The emergence of platform-independent, mobile code technologies has created big opportunities for Internet-based applications. Mobile agents are being utilized to perform a variety of tasks from personalized computing to business-critical transactions. Unfortunately, these advances were not matched by correspondent research into the reliability of these new technologies. This work has been undertaken to investigate the faulttolerance of this new paradigm. Agent programs' mobility and autonomy of execution has introduced a new class of failures different to that of traditional distributed systems. Therefore, fault tolerance is one of the main problems that must be resolved to improve the adoption of an agents' paradigm. The investigation of mobile agents reliability in this thesis resulted in the development of REMA (REliable Mobile Agents), which guarantees the reliable execution, migration, and communication of mobile agents in the presence of faults that might affect the agents hosts or their communication network. We introduced an algorithm for the transparent detection of faults that might affect agent execution, migration, and communication. A decentralized structure was used to divide the agent dynamic distributed system into network-partitioning proof spaces. Lightweight messaging was adopted as the basic error detection engine, which together with the loosely coupled detection managers provided an efficient, low overhead detection mechanism for agent-based distributed processing. The problem of taking checkpoint of agent execution is hampered by the lack of the accessibility of the underlying structure of the JVM. Thus, an alternative solution has been achieved through the REMA Checkpoint and Recovery (REMA-CR) package. REMA-CR provides the developer with powerful classes and methods that allow for capturing the critical data of agents' execution. The developed recovery protocol offers a communication-pairs, independent checkpointing strategy at a low-cost, that covers all possible faults that might invalidate reliable agent execution, migration and communication and maintains the exactly once execution property. The results and the performance of REMA confirmed our objectives of providing a fault tolerant wrapper for agents and their applications with acceptable overhead cost.
7

Fault injection testing of software implemented fault tolerance mechanisms of distributed systems

Tao, Sha January 1996 (has links)
One way of gaining confidence in the adequacy of fault tolerance mechanisms of a system is to test the system by injecting faults and see how the system performs under faulty conditions. This thesis investigates the issues of testing software-implemented fault tolerance mechanisms of distributed systems through fault injection. A fault injection method has been developed. The method requires that the target software system be structured as a collection of objects interacting via messages. This enables easy insertion of fault injection objects into the target system to emulate incorrect behaviour of faulty processors by manipulating messages. This approach allows one to inject specific classes of faults while not requiring any significant changes to the target system. The method differs from the previous work in that it exploits an object oriented approach of software implementation to support the injection of specific classes of faults at the system level. The proposed fault injection method has been applied to test software-implemented reliable node systems: a TMR (triple modular redundant) node and a fail-silent node. The nodes have integrated fault tolerance mechanisms and are expected to exhibit certain behaviour in the presence of a failure. The thesis describes how various such mechanisms (for example, clock synchronisation protocol, and atomic broadcast protocol) were tested. The testing revealed flaws in implementation that had not been discovered before, thereby demonstrating the usefulness of the method. Application of the approach to other distributed systems is also described in the thesis.
8

Constructing fail-controlled nodes for distributed systems : a software approach

Brasileiro, Francisco Vilar January 1995 (has links)
Designing and implementing distributed systems which continue to provide specified services in the presence of processing site and communication failures is a difficult task. To facilitate their development, distributed systems have been built assuming that their underlying hardware components are Jail-controlled, i.e. present a well defined failure mode. However, if conventional hardware cannot provide the assumed failure mode, there is a need to build processing sites or nodes, and communication infra-structure that present the fail-controlled behaviour assumed. Coupling a number of redundant processors within a replicated node is a well known way of constructing fail-controlled nodes. Computation is replicated and executed simultaneously at each processor, and by employing suitable validation techniques to the outputs generated by processors (e.g. majority voting, comparison), outputs from faulty processors can be prevented from appearing at the application level. One way of constructing replicated nodes is by introducing hardwired mechanisms to couple replicated processors with specialised validation hardware circuits. Processors are tightly synchronised at the clock cycle level, and have their outputs validated by a reliable validation hardware. Another approach is to use software mechanisms to perform synchronisation of processors and validation of the outputs. The main advantage of hardware based nodes is the minimum performance overhead incurred. However, the introduction of special circuits may increase the complexity of the design tremendously. Further, every new microprocessor architecture requires considerable redesign overhead. Software based nodes do not present these problems, on the other hand, they introduce much bigger performance overheads to the system. In this thesis we investigate alternative ways of constructing efficient fail-controlled, software based replicated nodes. In particular, we present much more efficient order protocols, which are necessary for the implementation of these nodes. Our protocols, unlike others published to date, do not require processors' physical clocks to be explicitly synchronised. The main contribution of this thesis is the precise definition of the semantics of a software based Jail-silent node, along with its efficient design, implementation and performance evaluation.
9

The analysis and synthesis of dependable digital systems

Philp, Kenneth January 1996 (has links)
No description available.
10

Smart distributed systems

Perez, Hector Benitez January 1999 (has links)
No description available.

Page generated in 0.0685 seconds