Global ETD Search

11	Fault injection testing of software implemented fault tolerance mechanisms of distributed systems Tao, Sha January 1996 (has links) One way of gaining confidence in the adequacy of fault tolerance mechanisms of a system is to test the system by injecting faults and see how the system performs under faulty conditions. This thesis investigates the issues of testing software-implemented fault tolerance mechanisms of distributed systems through fault injection. A fault injection method has been developed. The method requires that the target software system be structured as a collection of objects interacting via messages. This enables easy insertion of fault injection objects into the target system to emulate incorrect behaviour of faulty processors by manipulating messages. This approach allows one to inject specific classes of faults while not requiring any significant changes to the target system. The method differs from the previous work in that it exploits an object oriented approach of software implementation to support the injection of specific classes of faults at the system level. The proposed fault injection method has been applied to test software-implemented reliable node systems: a TMR (triple modular redundant) node and a fail-silent node. The nodes have integrated fault tolerance mechanisms and are expected to exhibit certain behaviour in the presence of a failure. The thesis describes how various such mechanisms (for example, clock synchronisation protocol, and atomic broadcast protocol) were tested. The testing revealed flaws in implementation that had not been discovered before, thereby demonstrating the usefulness of the method. Application of the approach to other distributed systems is also described in the thesis. 005 Software fault tolerance
12	Constructing fail-controlled nodes for distributed systems : a software approach Brasileiro, Francisco Vilar January 1995 (has links) Designing and implementing distributed systems which continue to provide specified services in the presence of processing site and communication failures is a difficult task. To facilitate their development, distributed systems have been built assuming that their underlying hardware components are Jail-controlled, i.e. present a well defined failure mode. However, if conventional hardware cannot provide the assumed failure mode, there is a need to build processing sites or nodes, and communication infra-structure that present the fail-controlled behaviour assumed. Coupling a number of redundant processors within a replicated node is a well known way of constructing fail-controlled nodes. Computation is replicated and executed simultaneously at each processor, and by employing suitable validation techniques to the outputs generated by processors (e.g. majority voting, comparison), outputs from faulty processors can be prevented from appearing at the application level. One way of constructing replicated nodes is by introducing hardwired mechanisms to couple replicated processors with specialised validation hardware circuits. Processors are tightly synchronised at the clock cycle level, and have their outputs validated by a reliable validation hardware. Another approach is to use software mechanisms to perform synchronisation of processors and validation of the outputs. The main advantage of hardware based nodes is the minimum performance overhead incurred. However, the introduction of special circuits may increase the complexity of the design tremendously. Further, every new microprocessor architecture requires considerable redesign overhead. Software based nodes do not present these problems, on the other hand, they introduce much bigger performance overheads to the system. In this thesis we investigate alternative ways of constructing efficient fail-controlled, software based replicated nodes. In particular, we present much more efficient order protocols, which are necessary for the implementation of these nodes. Our protocols, unlike others published to date, do not require processors' physical clocks to be explicitly synchronised. The main contribution of this thesis is the precise definition of the semantics of a software based Jail-silent node, along with its efficient design, implementation and performance evaluation. 005 Fault tolerance
13	A reconfiguration-based defect-tolerant design paradigm for nanotechnologies He, Chen, January 1900 (has links) (PDF) Thesis (Ph. D.)--University of Texas at Austin, 2006. / Vita. Includes bibliographical references.
14	Test and fault-tolerance for network-on-chip infrastructures Grecu, Cristian 05 1900 (has links) The demands of future computing, as well as the challenges of nanometer-era VLSI design, will require new design techniques and design styles that are simultaneously high performance, energy-efficient, and robust to noise and process variation. One of the emerging problems concerns the communication mechanisms between the increasing number of blocks, or cores, that can be integrated onto a single chip. The bus-based systems and point-to-point interconnection strategies in use today cannot be easily scaled to accommodate the large numbers of cores projected in the near future. Network-on-chip (NoC) interconnect infrastructures are one of the key technologies that will enable the emergence of many-core processors and systems-on-chip with increased computing power and energy efficiency. This dissertation is focused on testing, yield improvement and fault-tolerance of such NoC infrastructures. A fast, efficient test method is developed for NoCs, that exploits their inherent parallelism to reduce the test time by transporting test data on multiple paths and testing multiple NoC components concurrently. The improvement of test time varies, depending on the NoC architecture and test transport protocol, from 2X to 34X, compared to current NoC test methods. This test mechanism is used subsequently to perform detection of NoC link permanent faults, which are then repaired by an on-chip mechanism that replaces the faulty signal lines with fault-free ones, thereby increasing the yield, while maintaining the same wire delay characteristics. The solution described in this dissertation improves significantly the achievable yield of NoC inter-switch channels â from 4% improvement for an 8-bit wide channel, to a 71% improvement for a 128-bit wide channel. The direct benefit is an improved fault-tolerance and increased yield and long-term reliability of NoC based multicore systems. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate Network-on-chip Fault tolerance
15	Hardware evolution : automatic design of electronic circuits in reconfigurable hardware by artificial evolution Thompson, Adrian January 1996 (has links) No description available. 620.0042029 Fault tolerance; Computation
16	Reconfiguration under failure of the brushless d.c. motor McWilliam, Charles J. January 1998 (has links) No description available. 621.31042 Synchronous; Fault tolerance; Simulation
17	Intra-gate fault diagnosis of CMOS integrated circuits Fan, Xinyue January 2006 (has links) Knowing the root cause of why an Integrated Circuit (1C) device fails to function properly is the key to provide the corrective measures to increase the yield and shorten the time to market. In recent years, electrical fault diagnosis method has received growing attention due to the effective and indispensable guiding role it plays in modern fault localization practice when physical measures are more and more confined by the shrinking feature size and condensed internal structure. While most of the fault diagnosis tools are based on gate level fault models, many faults are actually at the transistor level (the intra-gate fault). This thesis provides an innovative method to diagnose the intra-gate faults. It covers a wide range of different types of intra-gate faults. The method extends the capability of gate level fault diagnosis tools to the intra-gate domain by building connections with these intra-gate faults to particular types of gate level faults. Intra-gate faults are transformed to gate level representations so that they can be diagnosed directly by the widely available and well developed gate level diagnosis tools. Real diagnosis of intra-gate faults from wafer data and physical failure analysis photos are provided as solid proofs of the effectiveness of this method. 621.395 Integrated circuits : Fault tolerance
18	Multiparty interactions in dependable distributed systems Zorzo, Avelino Francisco January 1999 (has links) With the expansion of computer networks, activities involving computer communication are becoming more and more distributed. Such distribution can include processing, control, data, network management, and security. Although distribution can improve the reliability of a system by replicating components, sometimes an increase in distribution can introduce some undesirable faults. To reduce the risks of introducing, and to improve the chances of removing and tolerating faults when distributing applications, it is important that distributed systems are implemented in an organized way. As in sequential programming, complexity in distributed, in particular parallel, program development can be managed by providing appropriate programming language constructs. Language constructs can help both by supporting encapsulation so as to prevent unwanted interactions between program components and by providing higher-level abstractions that reduce programmer effort by allowing compilers to handle mundane, error-prone aspects of parallel program implementation. A language construct that supports encapsulation of interactions between multiple parties (objects or processes) is referred in the literature as multiparty interaction. In a multiparty interaction, several parties somehow "come together" to produce an intermediate and temporary combined state, use this state to execute some activity, and then leave the interaction and continue their normal execution. There has been a lot of work in the past years on multiparty interaction, but most of it has been concerned with synchronisation, or handshaking, between parties rather than the encapsulation of several activities executed in parallel by the interaction participants. The programmer is therefore left responsible for ensuring that the processes involved in a cooperative activity do not interfere with, or suffer interference from, other processes not involved in the activity. Furthermore, none of this work has discussed the provision of features that would facilitate the design of multiparty interactions that are expected to cope with faults - whether in the environment that the computer system has to deal with, in the operation of the underlying computer hardware or software, or in the design of the processes that are involved in the interaction. In this thesis the concept of multiparty interaction is integrated with the concept of exception handling in concurrent activities. The final result is a language in which the concept of multiparty interaction is extended by providing it with a mechanism to handle concurrent exceptions. This extended concept is called dependable multiparty interaction. The features and requirements for multiparty interaction and exception handling provided in a set of languages surveyed in this thesis, are integrated to describe the new dependable multiparty interaction construct. Additionally, object-oriented architectures for dependable multiparty interactions are described, and a full implementation of one of the architectures is provided. This implementation is then applied to a set of case studies. The case studies show how dependable multiparty interactions can be used to design and implement a safety-critical system, a multiparty programming abstraction, and a parallel computation model. 005 Fault tolerance; Production cell
19	Low-cost Methods for Error Detection in Multi-core Systems Meixner, Albert, January 2008 (has links) Thesis (Ph. D.)--Duke University, 2008.
20	APPLICATION AWARE FOR BYZANTINE FAULT TOLERANCE Chai, Hua 09 December 2014 (has links) No description available. Computer Engineering Byzantine fault tolerance

Search results