11 |
Test and fault-tolerance for network-on-chip infrastructures
Grecu, Cristian 05 1900
The demands of future computing, as well as the challenges of nanometer-era VLSI design, will require new design techniques and design styles that are simultaneously high performance, energy-efficient, and robust to noise and process variation. One of the emerging problems concerns the communication mechanisms between the increasing number of blocks, or cores, that can be integrated onto a single chip. The bus-based systems and point-to-point interconnection strategies in use today cannot be easily scaled to accommodate the large numbers of cores projected in the near future. Network-on-chip (NoC) interconnect infrastructures are one of the key technologies that will enable the emergence of many-core processors and systems-on-chip with increased computing power and energy efficiency. This dissertation is focused on testing, yield improvement and fault-tolerance of such NoC infrastructures.
A fast, efficient test method is developed for NoCs that exploits their inherent parallelism to reduce the test time by transporting test data on multiple paths and testing multiple NoC components concurrently. The improvement in test time varies, depending on the NoC architecture and test transport protocol, from 2X to 34X compared to current NoC test methods. This test mechanism is subsequently used to detect permanent faults in NoC links, which are then repaired by an on-chip mechanism that replaces the faulty signal lines with fault-free ones, thereby increasing the yield while maintaining the same wire delay characteristics. The solution described in this dissertation significantly improves the achievable yield of NoC inter-switch channels, from a 4% improvement for an 8-bit wide channel to a 71% improvement for a 128-bit wide channel. The direct benefit is improved fault-tolerance, increased yield and better long-term reliability of NoC-based multicore systems.
|
12 |
Fault injection testing of software implemented fault tolerance mechanisms of distributed systems
Tao, Sha January 1996
One way of gaining confidence in the adequacy of fault tolerance mechanisms of a system is to test the system by injecting faults and see how the system performs under faulty conditions. This thesis investigates the issues of testing software-implemented fault tolerance mechanisms of distributed systems through fault injection. A fault injection method has been developed. The method requires that the target software system be structured as a collection of objects interacting via messages. This enables easy insertion of fault injection objects into the target system to emulate incorrect behaviour of faulty processors by manipulating messages. This approach allows one to inject specific classes of faults while not requiring any significant changes to the target system. The method differs from the previous work in that it exploits an object oriented approach of software implementation to support the injection of specific classes of faults at the system level. The proposed fault injection method has been applied to test software-implemented reliable node systems: a TMR (triple modular redundant) node and a fail-silent node. The nodes have integrated fault tolerance mechanisms and are expected to exhibit certain behaviour in the presence of a failure. The thesis describes how various such mechanisms (for example, clock synchronisation protocol, and atomic broadcast protocol) were tested. The testing revealed flaws in implementation that had not been discovered before, thereby demonstrating the usefulness of the method. Application of the approach to other distributed systems is also described in the thesis.
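The message-manipulation idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the names `Message`, `Receiver`, `FaultInjector`, `omission` and `corruption` are hypothetical, and the two fault classes shown are only assumed examples of "specific classes of faults". The injector exposes the same `deliver()` interface as the real receiver, so it can be inserted without changing the target objects.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    payload: int


# A fault model maps a message to an altered message, or to None to drop it.
FaultModel = Callable[[Message], Optional[Message]]


def omission(msg: Message) -> Optional[Message]:
    """Emulate a faulty processor that silently fails to send (assumed fault class)."""
    return None


def corruption(msg: Message) -> Optional[Message]:
    """Emulate a faulty processor that sends a wrong value (assumed fault class)."""
    return replace(msg, payload=msg.payload ^ 0xFF)


class Receiver:
    """Stand-in for a target-system object that consumes messages."""

    def __init__(self) -> None:
        self.inbox: List[Message] = []

    def deliver(self, msg: Message) -> None:
        self.inbox.append(msg)


class FaultInjector:
    """Sits on the message path and manipulates messages from one 'victim' sender."""

    def __init__(self, target: Receiver, fault: FaultModel, victim: str) -> None:
        self.target = target
        self.fault = fault
        self.victim = victim

    def deliver(self, msg: Message) -> None:
        out: Optional[Message] = msg
        if msg.sender == self.victim:
            out = self.fault(msg)  # apply the fault only to the victim's messages
        if out is not None:
            self.target.deliver(out)
```

In use, a test harness would wire senders to a `FaultInjector` instead of the `Receiver` and then check whether the fault tolerance mechanism under test masks or detects the manipulated messages.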
|
13 |
Constructing fail-controlled nodes for distributed systems : a software approach
Brasileiro, Francisco Vilar January 1995
Designing and implementing distributed systems which continue to provide specified services in the presence of processing site and communication failures is a difficult task. To facilitate their development, distributed systems have been built assuming that their underlying hardware components are fail-controlled, i.e. present a well defined failure mode. However, if conventional hardware cannot provide the assumed failure mode, there is a need to build processing sites or nodes, and communication infrastructure, that present the fail-controlled behaviour assumed. Coupling a number of redundant processors within a replicated node is a well known way of constructing fail-controlled nodes. Computation is replicated and executed simultaneously at each processor, and by applying suitable validation techniques to the outputs generated by processors (e.g. majority voting, comparison), outputs from faulty processors can be prevented from appearing at the application level. One way of constructing replicated nodes is by introducing hardwired mechanisms to couple replicated processors with specialised validation hardware circuits. Processors are tightly synchronised at the clock cycle level, and have their outputs validated by reliable validation hardware. Another approach is to use software mechanisms to perform synchronisation of processors and validation of the outputs. The main advantage of hardware based nodes is the minimal performance overhead incurred. However, the introduction of special circuits may increase the complexity of the design tremendously. Further, every new microprocessor architecture requires considerable redesign overhead. Software based nodes do not present these problems; on the other hand, they introduce much bigger performance overheads to the system. In this thesis we investigate alternative ways of constructing efficient fail-controlled, software based replicated nodes.
In particular, we present much more efficient order protocols, which are necessary for the implementation of these nodes. Our protocols, unlike others published to date, do not require processors' physical clocks to be explicitly synchronised. The main contribution of this thesis is the precise definition of the semantics of a software based fail-silent node, along with its efficient design, implementation and performance evaluation.
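The two output validation techniques named in the abstract, majority voting and comparison, can be sketched as follows. This is only an illustration under simple assumptions: the function names are hypothetical, replica outputs are assumed to be directly comparable values, and the order protocols and synchronisation that the thesis actually contributes are not modelled.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of replicas, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority is required, e.g. 2 of 3 replicas in a TMR node.
    return value if count > len(outputs) // 2 else None


def compare_pair(a: Hashable, b: Hashable) -> Optional[Hashable]:
    """Release an output only if both replicas agree; otherwise stay silent."""
    return a if a == b else None
```

Returning `None` stands in for suppressing the output, so a disagreeing replica's result never reaches the application level.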
|
14 |
A reconfiguration-based defect-tolerant design paradigm for nanotechnologies
He, Chen, January 1900
Thesis (Ph. D.)--University of Texas at Austin, 2006. / Vita. Includes bibliographical references.
|
15 |
Test and fault-tolerance for network-on-chip infrastructures
Grecu, Cristian 05 1900
The demands of future computing, as well as the challenges of nanometer-era VLSI design, will require new design techniques and design styles that are simultaneously high performance, energy-efficient, and robust to noise and process variation. One of the emerging problems concerns the communication mechanisms between the increasing number of blocks, or cores, that can be integrated onto a single chip. The bus-based systems and point-to-point interconnection strategies in use today cannot be easily scaled to accommodate the large numbers of cores projected in the near future. Network-on-chip (NoC) interconnect infrastructures are one of the key technologies that will enable the emergence of many-core processors and systems-on-chip with increased computing power and energy efficiency. This dissertation is focused on testing, yield improvement and fault-tolerance of such NoC infrastructures.
A fast, efficient test method is developed for NoCs that exploits their inherent parallelism to reduce the test time by transporting test data on multiple paths and testing multiple NoC components concurrently. The improvement in test time varies, depending on the NoC architecture and test transport protocol, from 2X to 34X compared to current NoC test methods. This test mechanism is subsequently used to detect permanent faults in NoC links, which are then repaired by an on-chip mechanism that replaces the faulty signal lines with fault-free ones, thereby increasing the yield while maintaining the same wire delay characteristics. The solution described in this dissertation significantly improves the achievable yield of NoC inter-switch channels, from a 4% improvement for an 8-bit wide channel to a 71% improvement for a 128-bit wide channel. The direct benefit is improved fault-tolerance, increased yield and better long-term reliability of NoC-based multicore systems. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate
|
16 |
Hardware evolution : automatic design of electronic circuits in reconfigurable hardware by artificial evolution
Thompson, Adrian January 1996
No description available.
|
17 |
Reconfiguration under failure of the brushless d.c. motor
McWilliam, Charles J. January 1998
No description available.
|
18 |
Session-aware Resource Management in Web Cluster
Chen, Wei-Liang 27 August 2003
The rapid expansion and growing popularity of the Internet are leading more and more users to adopt web services. A web server with a single-server architecture can no longer satisfy the large number of user requests, and the web cluster architecture has become a better solution. In our previous work, our laboratory implemented a prototype of a layer-7 web switch, which provides content-aware load balancing. We also designed and implemented a management system to provide an easy way to configure the system. With the advance of web technologies, most web sites now offer "session-aware" services. A session lets clients and servers that wish to exchange state information place HTTP requests and responses within a larger context. In this paper, we propose session-aware management for the web cluster. Based on our management system, we design a fault tolerance and QoS policy with sessions to improve the performance and reliability of our web cluster system.
|
19 |
A Fault-Aware Resource Manager for Multi-Processor System-on-Chip
Ghaeini, Bentolhoda January 2013
Semiconductor technology development enables the fabrication of extremely complex integrated circuits (ICs) that may contain billions of transistors. Such high integration density enables designing an entire system onto a single chip, commonly referred to as a System-on-Chip (SoC). In order to boost performance, it is increasingly common to design SoCs that contain a number of processors, so-called multi-processor systems-on-chip (MPSoCs).
While on one hand recent semiconductor technologies enable fabrication of devices such as MPSoCs which provide high performance, on the other hand there is a drawback that these devices are becoming increasingly susceptible to faults. These faults may occur due to escapes from manufacturing test, aging effects or environmental impacts. When present in a system, faults may disrupt functionality and can cause incorrect system operation. Therefore, it is very important when designing systems to consider methods to tolerate potential faults. To cope with faults, there is a need for fault handling, which implies automatic detection, identification and recovery from faults which may occur during the system’s operation.
This work is about the design and implementation of a fault handling method for an MPSoC. A fault-aware Resource Manager (RM) is designed and implemented to obtain correct system operation and maximize the system’s throughput in the presence of faults. The RM has the responsibility of scheduling jobs to available resources, collecting fault states from resources in the system and performing fault handling tasks based on fault states. The RM is also employed in multiple experiments in order to study its behavior in different situations.
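The three responsibilities listed for the RM (scheduling jobs, collecting fault states, and handling faults) can be sketched as below. This is a hypothetical illustration, not the thesis's design: the class and method names, the least-loaded scheduling policy, and the choice to return orphaned jobs for rescheduling are all assumptions.

```python
from typing import Dict, List, Optional


class ResourceManager:
    """Toy fault-aware manager: schedules jobs only on resources not marked faulty."""

    def __init__(self, resources: List[str]) -> None:
        self.faulty: Dict[str, bool] = {r: False for r in resources}
        self.assignment: Dict[str, str] = {}  # job name -> resource name

    def report_fault(self, resource: str) -> List[str]:
        """Record a fault state and return the jobs that must be rescheduled."""
        self.faulty[resource] = True
        orphaned = [job for job, res in self.assignment.items() if res == resource]
        for job in orphaned:
            del self.assignment[job]
        return orphaned

    def schedule(self, job: str) -> Optional[str]:
        """Assign the job to the least-loaded healthy resource (assumed policy)."""
        healthy = [r for r, bad in self.faulty.items() if not bad]
        if not healthy:
            return None  # no fault-free resource is left
        load = {r: 0 for r in healthy}
        for res in self.assignment.values():
            if res in load:
                load[res] += 1
        chosen = min(healthy, key=lambda r: load[r])
        self.assignment[job] = chosen
        return chosen
```

A caller would pass each job returned by `report_fault` back to `schedule`, which models recovery by moving work onto the remaining fault-free processors.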
|
20 |
Intra-gate fault diagnosis of CMOS integrated circuits
Fan, Xinyue January 2006
Knowing the root cause of why an Integrated Circuit (IC) device fails to function properly is the key to providing corrective measures that increase the yield and shorten the time to market. In recent years, electrical fault diagnosis has received growing attention because of the effective and indispensable guiding role it plays in modern fault localization practice, as physical measures are increasingly constrained by shrinking feature sizes and denser internal structures. While most fault diagnosis tools are based on gate level fault models, many faults are actually at the transistor level (intra-gate faults). This thesis provides an innovative method to diagnose intra-gate faults, covering a wide range of fault types. The method extends the capability of gate level fault diagnosis tools to the intra-gate domain by connecting these intra-gate faults to particular types of gate level faults. Intra-gate faults are transformed to gate level representations so that they can be diagnosed directly by the widely available and well developed gate level diagnosis tools. Real diagnoses of intra-gate faults from wafer data, together with physical failure analysis photos, are provided as evidence of the effectiveness of this method.
|