151

System level fault diagnosis by testable diagnosis array

Chu, Sung-Chi January 1987 (has links)
A computer system is defined to be a system of n functional components interconnected in a prescribed fashion to perform a variety of functions. To ensure the correctness of system output, faults in the system must be detected and the faulty component(s) subsequently identified. To continue operation with degraded performance, the system reconfigures with the remaining resources. This ideal fault-tolerant capability hinges on the ability of the system to detect and locate fault(s). One method is to perform a fault diagnosis procedure periodically. In the PMC (Preparata, Metze, and Chien) model, each component has the capability to test and be tested at the system level by a combination of other components in the system. Based on this model, a number of fault diagnosis algorithms have been developed. These fault diagnosis algorithms are either complicated, or inherit the diagnostic hardcore problem, or both. In this thesis, an innovative approach to system level fault diagnosis is taken. A diagnosis device implemented in simple hardware is proposed that virtually eliminates the burden of complicated diagnosis algorithms and reduces the complexity of the diagnostic hardcore. This device is called a Testable Diagnosis Array, or TDA. It is constructed with very simple combinational logic cells, and its simplicity makes external testing easy. The diagnosis procedure is thus transferred to the logic of the cell and to the control structure of the TDA. In this thesis, a TDA is defined based on the PMC model. A simple algorithm is developed to construct a TDA for any given system. The TDA is characterized with respect to the testing assignments of a system. A class of t-TDA-diagnosable systems is defined. Necessary and sufficient conditions for t-TDA-diagnosable systems are derived. A number of special classes of t-diagnosable systems are shown to be t-TDA-diagnosable. Other results and topics for future research are discussed. / Ph. D.
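In the PMC model, diagnosis amounts to decoding a syndrome of pairwise test outcomes: a fault-free tester reports its target's status correctly, while a faulty tester's verdict is unreliable. The sketch below illustrates only that decoding step for a small, hypothetical test assignment and syndrome (it does not model the TDA hardware): it enumerates fault sets of size at most t and keeps those consistent with the syndrome.

```c
#include <stdio.h>

#define N 5   /* number of units in the hypothetical example system */
#define T 2   /* assumed upper bound on simultaneous faults         */

/* test[i][j] = 1 if unit i tests unit j; syndrome[i][j] is i's verdict on j
   (0 = "pass", 1 = "fail"). Both matrices are illustrative values only.   */
static const int test[N][N] = {
    {0,1,0,0,1},
    {0,0,1,0,0},
    {1,0,0,1,0},
    {0,0,0,0,1},
    {0,1,0,0,0},
};
static const int syndrome[N][N] = {
    {0,1,0,0,0},
    {0,0,0,0,0},
    {0,0,0,1,0},
    {0,0,0,0,0},
    {0,1,0,0,0},
};

static int popcount(unsigned x)
{
    int c = 0;
    while (x) { c += (int)(x & 1u); x >>= 1; }
    return c;
}

/* PMC assumption: a fault-free tester reports the true status of its target;
   verdicts produced by faulty testers carry no information and are ignored. */
static int consistent(unsigned faultset)
{
    for (int i = 0; i < N; i++) {
        if (faultset & (1u << i)) continue;          /* tester i is faulty */
        for (int j = 0; j < N; j++) {
            if (!test[i][j]) continue;
            int j_faulty = (int)((faultset >> j) & 1u);
            if (syndrome[i][j] != j_faulty) return 0;
        }
    }
    return 1;
}

int main(void)
{
    /* Enumerate every fault set of size at most T and print those that are
       consistent with the observed syndrome.                              */
    for (unsigned f = 0; f < (1u << N); f++) {
        if (popcount(f) > T || !consistent(f)) continue;
        printf("consistent fault set:");
        for (int i = 0; i < N; i++)
            if (f & (1u << i)) printf(" u%d", i);
        if (f == 0) printf(" (none)");
        printf("\n");
    }
    return 0;
}
```

For this particular assignment the syndrome is consistent with more than one fault set, which illustrates why conditions on the testing assignment (t-diagnosability, and here t-TDA-diagnosability) matter for a unique diagnosis.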
152

Parallel hardware accelerated switch level fault simulation

Ryan, Christopher A. 02 October 2007 (has links)
Switch level faults, as opposed to traditional gate level faults, can more accurately model physical faults found in an integrated circuit. However, existing fault simulation techniques have a worst-case computational complexity of O(n²), where n is the number of devices in the circuit. This paper presents a novel switch level extension to parallel fault simulation and the switch level circuit partitioning needed for parallel processing. The parallel switch level fault simulation technique uses 9-valued logic, N and P-type switch state tables, and a minimum operation in order to simulate all faults in parallel for one switch. The circuit partitioning method uses reverse level ordering, grouping, and subgrouping in order to partition transistors for parallel processing. This paper also presents an algorithm and complexity measure for parallel fault simulation as extended to the switch level. For the algorithm, the switch level fault simulation complexity is reduced to O(L²), where L is the number of levels of switches encountered when traversing from the output to the input. The complexity of the proposed algorithm is much less than that for traditional fault simulation techniques. / Ph. D.
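Parallel fault simulation exploits the width of the host machine word: bit position 0 carries the fault-free circuit and each remaining bit position carries one faulty copy, so a single bitwise operation advances all copies at once. The gate-level, two-valued sketch below shows only this bit-packing principle on a hypothetical three-gate circuit with four injected faults; the thesis's contribution lies in extending the idea to switch-level primitives with 9-valued logic and N/P switch state tables, which the sketch does not attempt.

```c
#include <stdio.h>
#include <stdint.h>

/* Bit i of every signal word holds the value seen by "machine" i:
   bit 0 is the fault-free circuit, bits 1..NF are faulty copies.   */
#define NF 4                       /* number of injected faults */
typedef uint32_t word;

/* Inject a stuck-at fault: in machine m, force the signal to 'val'. */
static word inject(word sig, int machine, int val)
{
    word mask = (word)1 << machine;
    return val ? (sig | mask) : (sig & ~mask);
}

int main(void)
{
    /* Hypothetical circuit: y = (a AND b) OR (NOT c). Inputs are replicated
       across all machines (every bit of the word holds the same value).   */
    word a = ~(word)0, b = ~(word)0, c = ~(word)0;   /* a=1, b=1, c=1 */

    word n1 = a & b;                 /* AND gate  */
    n1 = inject(n1, 1, 0);           /* fault 1: n1 stuck-at-0 */

    word n2 = ~c;                    /* inverter  */
    n2 = inject(n2, 2, 0);           /* fault 2: n2 stuck-at-0 */

    word y = n1 | n2;                /* OR gate   */
    y = inject(y, 3, 0);             /* fault 3: y stuck-at-0  */
    y = inject(y, 4, 1);             /* fault 4: y stuck-at-1  */

    /* A fault is detected by this vector if its machine's output differs
       from the fault-free machine (bit 0).                                */
    int good = (int)(y & 1u);
    for (int f = 1; f <= NF; f++) {
        int fy = (int)((y >> f) & 1u);
        printf("fault %d: %s\n", f, fy != good ? "detected" : "not detected");
    }
    return 0;
}
```

With 32-bit words one pass evaluates the good circuit plus up to 31 faulty machines; wider words or repeated passes cover larger fault lists.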
153

Automated incorporation of upset detection mechanisms in distributed Ada systems

Heironimus, Elisa K. January 1988 (has links)
This thesis presents an automated approach to developing software that performs single event upset (SEU) detection in distributed Ada systems. The faults considered are those that fall in the SEU category. SEUs may cause information corruption, leading to a change in program flow or causing a program to execute an infinite loop. Two techniques that detect the presence of these upsets are described. The implementation of these techniques is discussed in relation to the structure of Ada software systems and exploits the block structure of Ada. A program has been written to automatically modify Ada application software systems to contain these upset detection mechanisms. The program, Software Modifier for Upset Detection (SMUD), requires little interactive information from a programmer and relies mainly on SMUD directives that are inserted into the application software prior to the modification process. A full description of this automated procedure is included. The upset detection mechanisms have been incorporated into a distributed computer system model employing the MIL-STD-1553B communications protocol. Ada is used as the simulation environment to exercise and verify the protocol. The model used as a testbed for the upset detection mechanisms consists of two parts: the hardware model and the software implementation of the 1553B communications protocol. The hardware environment is described in detail, along with a discussion of the 1553B protocol. The detection techniques have been tested and verified at the high level using computer simulations. A testing methodology is also presented. / Master of Science
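One common way to catch SEU-induced control-flow errors of the kind described is to instrument block entry and exit points with tokens and to bound every loop, so that a diverted branch or a runaway loop is flagged at run time. The C sketch below is a minimal illustration of that pattern, not SMUD's actual instrumentation; the token values, block layout, and loop bound are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Each instrumented block pushes a unique token on entry and checks it on
   exit; an SEU that diverts control flow past an entry or exit point
   leaves the token stack inconsistent and triggers the handler.          */
#define MAX_DEPTH 16
static unsigned flow_stack[MAX_DEPTH];
static int flow_top = 0;

static void upset_detected(const char *where)
{
    fprintf(stderr, "control-flow upset detected at %s\n", where);
    exit(EXIT_FAILURE);            /* a real system would start recovery */
}

static void block_enter(unsigned token)
{
    if (flow_top >= MAX_DEPTH) upset_detected("enter");
    flow_stack[flow_top++] = token;
}

static void block_exit(unsigned token)
{
    if (flow_top == 0 || flow_stack[--flow_top] != token)
        upset_detected("exit");
}

static void process_frame(void)
{
    block_enter(0xA1);             /* token for this block (illustrative) */
    /* ... application work ... */
    block_exit(0xA1);
}

int main(void)
{
    /* A loop bound guards against SEU-induced infinite loops. */
    const unsigned LOOP_BOUND = 1000;
    unsigned iterations = 0;

    block_enter(0x01);
    while (iterations < LOOP_BOUND) {
        process_frame();
        iterations++;
        if (iterations == 10) break;       /* normal termination condition */
    }
    if (iterations >= LOOP_BOUND) upset_detected("loop bound");
    block_exit(0x01);

    puts("run completed with consistent control flow");
    return 0;
}
```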
154

High level strategy for detection of transient faults in computer systems

Modi, Nimish Harsukh January 1988 (has links)
A major portion of digital system malfunctions is due to the presence of temporary faults, which are either intermittent or transient. An intermittent fault manifests itself at regular intervals, while a transient fault causes a temporary change in the state of the system without damaging any of the components. Transient faults are difficult to detect and isolate and hence become a source of major concern, especially in critical real-time applications. Since satellite systems are particularly susceptible to transient faults induced by the radiation environment, a satellite communications protocol model has been developed for experimental research purposes. The model implements the MIL-STD-1553B protocol, which dictates the modes of communication between several satellite systems. The model has been developed employing the structural and behavioral capabilities of the HILO simulation system. SEUs are injected into the protocol model and the effects on the program flow are investigated. A two-tier detection scheme employing the concept of Signature Analysis is developed. Performance evaluation of the detection mechanisms is carried out and the results are presented. / Master of Science
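Signature analysis compresses a long stream of observed values into a short signature, typically with a linear-feedback shift register, and compares it against a value captured from a fault-free run; any disturbance of the stream, such as an SEU diverting the program flow, changes the signature with high probability. The sketch below uses the CRC-16/CCITT polynomial over a sequence of hypothetical block identifiers purely to show the mechanism; it is not the two-tier scheme developed in this thesis.

```c
#include <stdio.h>
#include <stdint.h>

/* Compress a sequence of executed-block identifiers into a 16-bit
   signature using the CRC-16/CCITT polynomial (x^16 + x^12 + x^5 + 1). */
static uint16_t sign_path(const uint8_t *ids, unsigned n)
{
    uint16_t sig = 0xFFFF;                          /* seed value */
    for (unsigned i = 0; i < n; i++) {
        sig ^= (uint16_t)ids[i] << 8;
        for (int bit = 0; bit < 8; bit++)
            sig = (sig & 0x8000) ? (uint16_t)((sig << 1) ^ 0x1021)
                                 : (uint16_t)(sig << 1);
    }
    return sig;
}

int main(void)
{
    /* Block IDs of one fault-free protocol cycle (illustrative values);
       the golden signature would normally be computed once, off line.   */
    const uint8_t expected[] = { 0x10, 0x22, 0x22, 0x37, 0x40 };
    const uint16_t golden = sign_path(expected, 5);

    /* Observed path with one block skipped, emulating an SEU-induced
       control-flow error.                                               */
    const uint8_t observed[] = { 0x10, 0x22, 0x37, 0x40 };
    const uint16_t runtime = sign_path(observed, 4);

    printf(runtime == golden ? "signature match: flow intact\n"
                             : "signature mismatch: upset suspected\n");
    return 0;
}
```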
155

Optical sensing as a means of monitoring health of multicomputer networks

Forbis, David L. 24 November 2009 (has links)
The use of optical sensors to perform health monitoring in fault-tolerant multicomputers can allow the multicomputer to detect imminent failure in a particular section of the interconnection network due to damaging strain. This detection method allows the rerouting of critical data before data link failure occurs. This thesis investigates the implementation of the extrinsic Fabry-Perot interferometer into an optical hybrid communications/sensing network. A testbed of personal computers, acting as nodes of a multicomputer, is used to monitor the integrity of the network to a high degree of accuracy. When a node determines that an adjacent data link is no longer reliable due to physical damage, communications are rerouted and the node is shut down. Results of experiments with the testbed have shown that redundant nodes can be used to share computational loads, increasing the performance of the multicomputer, until network failure forces redundant nodes to assume full responsibility for computational tasks. Multicomputer performance suffers as a result of network damage, but full functionality is retained with no occurrence of errors or unknown conditions due to data link failure. / Master of Science
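The health-monitoring decision the abstract describes, rerouting before the link actually breaks, reduces to comparing the sensed strain on each link against warning and failure thresholds. The sketch below is a minimal illustration of that policy; the threshold values, sensor readings, and link states are hypothetical, and the actual EFPI demodulation is outside its scope.

```c
#include <stdio.h>

/* Illustrative thresholds in microstrain; real values would depend on the
   fiber, its bonding, and the mechanical limits of the monitored link.   */
#define STRAIN_WARN  1500.0
#define STRAIN_FAIL  2500.0

enum link_state { LINK_OK, LINK_DEGRADED, LINK_FAILED };

/* Classify one sensor reading for the data link it is bonded to. */
static enum link_state assess_link(double microstrain)
{
    if (microstrain >= STRAIN_FAIL) return LINK_FAILED;
    if (microstrain >= STRAIN_WARN) return LINK_DEGRADED;
    return LINK_OK;
}

int main(void)
{
    /* Hypothetical sequence of readings from one EFPI sensor. */
    const double readings[] = { 200.0, 850.0, 1600.0, 2100.0, 2700.0 };

    for (unsigned i = 0; i < sizeof readings / sizeof readings[0]; i++) {
        switch (assess_link(readings[i])) {
        case LINK_OK:
            break;
        case LINK_DEGRADED:
            printf("reading %u: strain rising, reroute critical traffic\n", i);
            break;
        case LINK_FAILED:
            printf("reading %u: link unreliable, shutting node down\n", i);
            return 0;
        }
    }
    return 0;
}
```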
156

The design of periodically self restoring redundant systems

Singh, Adit D. January 1982 (has links)
Most existing fault tolerant systems employ some form of dynamic redundancy and can be considered to be incident driven: their recovery mechanisms are triggered by the detection of a fault. This dissertation investigates an alternative approach to fault tolerant design, where the redundant system restores itself periodically to correct errors before they build up to the point of system failure. It is shown that periodically self restoring systems can be designed to be tolerant of both transient (intermittent) and permanent hardware faults. Further, the reliability of such designs is not compromised by fault latency. The periodically self restoring redundant (PSRR) systems presented in this dissertation employ, in general, N computing units (CUs) operating redundantly in synchronization. The CUs communicate with each other periodically to restore units that may have failed due to transient faults. This restoration is initiated by an interrupt from an external (fault tolerant) clocking circuit. A reliability model for such systems is developed in terms of the number of CUs in the system, their failure rates, and the frequency of system restoration. Both transient and permanent faults are considered. The model allows the estimation of system reliability and mean time to failure. A restoration algorithm for implementing the periodic restoration process in PSRR systems is also presented. Finally, a design procedure is described that can be used for designing PSRR systems to meet desired reliability specifications. / Ph. D.
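To make the reliability argument concrete, here is a minimal sketch of the form such a model can take, assuming independent transient faults arriving at rate λ per CU, perfect restoration at every interval of length T, and system failure only if a majority of the N CUs is struck within a single interval; this is a simplified stand-in, not the dissertation's actual model.

```latex
% Probability that one CU survives a restoration interval of length T:
p = e^{-\lambda T}

% One interval succeeds if a majority of the N CUs remains fault-free:
R_T = \sum_{k=\lceil (N+1)/2 \rceil}^{N} \binom{N}{k}\, p^{k} (1-p)^{N-k}

% With perfect restoration the intervals are independent, so
R(t) \approx R_T^{\,t/T}, \qquad \mathrm{MTTF} \approx \frac{T}{1 - R_T}
```

Increasing the restoration frequency (smaller T) drives p, and hence R_T, toward 1, which is the quantitative sense in which periodic restoration prevents transient errors from accumulating to system failure.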
157

Built-in tests for a real-time embedded system.

Olander, Peter Andrew. January 1991 (has links)
Beneath the facade of the applications code of a well-designed real-time embedded system lies intrinsic firmware that facilitates a fast and effective means of detecting and diagnosing inevitable hardware failures. These failures can encumber the availability of a system, and, consequently, an identification of the source of the malfunction is needed. It is shown that the number of possible origins of all manner of failures is immense. As a result, fault models are contrived to encompass prevalent hardware faults. Furthermore, the complexity is reduced by determining syndromes for particular circuitry and applying test vectors at a functional block level. Testing phases and philosophies, together with standardisation policies, are defined to ensure the compliance of system designers with the underlying principles of evaluating system integrity. The three testing phases of power-on self tests at system start up, on-line health monitoring and off-line diagnostics are designed to ensure that the inherent test firmware remains inconspicuous during normal applications. The prominence of the code is, however, apparent on the detection or diagnosis of a hardware failure. The authenticity of the theoretical models, standardisation policies and built-in test philosophies is illustrated by means of their application to an intricate real-time system. The architecture and the software design implementing the ideologies are described extensively. Standardisation policies, enhanced by the proposition of generic tests for common core components, are advocated at all hierarchical levels. The presentation of the integration of the hardware and software is aimed at portraying the moderately complex nature of the task of generating a set of built-in tests for a real-time embedded system. In spite of generic policies, the intricacies of the architecture are found to have a direct influence on software design decisions. It is thus concluded that the diagnostic objectives of the user requirements specification be lucidly expressed by both operational and maintenance personnel for all testing phases. Disparity may exist between the system designer and the end user in the understanding of the requirements specification defining the objectives of the diagnosis. Complete collaboration between the two parties is thus essential throughout the development life cycle, but especially during the preliminary design phase. Thereafter, the designer would be able to decide on the sophistication of the system testing capabilities. / Thesis (M.Sc.)-University of Natal, Durban, 1991.
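Of the three testing phases, the power-on self test is the easiest to show in miniature: at start up the firmware checks its own ROM against a stored checksum and exercises RAM with a simple march pattern before handing control to the application. The C sketch below assumes stand-in memory buffers and an illustrative checksum; a real target would use its actual memory map and a stronger march algorithm.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical memory regions for the sketch; a real embedded target would
   use its actual ROM image and RAM area instead of these stand-in buffers. */
static const uint8_t rom_image[1024] = { /* code + stored checksum ... */ 0 };
static uint8_t ram_region[256];

/* ROM test: additive checksum over the image, compared to a stored value. */
static int post_rom(uint8_t expected)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < sizeof rom_image; i++)
        sum = (uint8_t)(sum + rom_image[i]);
    return sum == expected;
}

/* RAM test: simple march element - write and verify 0x55 then 0xAA per cell. */
static int post_ram(void)
{
    for (size_t i = 0; i < sizeof ram_region; i++) {
        ram_region[i] = 0x55;
        if (ram_region[i] != 0x55) return 0;
        ram_region[i] = 0xAA;
        if (ram_region[i] != 0xAA) return 0;
    }
    return 1;
}

int main(void)
{
    /* The expected checksum would normally be stored in the ROM image itself. */
    int ok = post_rom(0x00) && post_ram();
    printf(ok ? "POST passed, starting application\n"
              : "POST failed, entering off-line diagnostics\n");
    return !ok;
}
```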
158

Adaptive management of emerging battlefield network

Fountoukidis, Dimitrios P. 03 1900 (has links)
Approved for public release; distribution is unlimited. / The management of a battlefield network takes place in a Network Operations Center (NOC). Depending on the importance of the managed network, the manager is sometimes required to be present within the physical installations of the NOC at all times. The decisions cover a wide spectrum of network configuration, fault detection and repair, and network performance improvement. Especially in the case of battlefield network operations, these decisions are sometimes so important that they can be characterized as critical to the success of the whole military operation. Often the required response time is shorter than the mean physical human reaction time. An automated response that also carries the characteristics of human intelligence is needed to overcome the restrictions that the human nature of an administrator imposes. This research establishes a suitable computer network management architecture for an adaptive network. This architecture will enhance the capabilities of network management in terms of both cost and efficiency. / Lieutenant Commander, Hellenic Navy
159

Network Fault Tolerance System

Sullivan, John F 01 May 2000 (has links)
The world of computers experienced an explosive period of growth toward the end of the 20th century with the widespread availability of the Internet and the development of the World Wide Web. As people began using computer networks for everything from research and communication to banking and commerce, network failures became a greater concern because of their potential to interrupt critical applications. Fault tolerance systems were developed to detect and correct network failures within minutes, and eventually within seconds, of the failure, but time-critical applications such as military communications, video conferencing, and Web-based sales require better response time than any previous systems could provide. The goal of this thesis was the development and implementation of a Network Fault Tolerance (NFT) system that can detect and recover from failures of network interface cards, network cables, switches, and routers in much less than one second from the time of failure. The problem was divided into two parts: fault tolerance within a single local area network (LAN), and fault tolerance across many local area networks. The first part involves the network interface cards, network cables, and switches within a LAN, while the second part involves the routers that connect LANs into larger internetworks. Both parts of the NFT solution were implemented on Windows NT 4.0 PCs connected by a switched Fast Ethernet network. The NFT system was found to correct system failures within 300 milliseconds of the failure.
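Failover in well under a second, as reported here, typically rests on a tight heartbeat-and-timeout loop: a path is declared failed when it stays silent longer than a detection window that is small compared with the recovery target, and traffic is switched to a standby path. The sketch below simulates that logic with hypothetical timestamps and intervals; it is not the thesis's actual Windows NT implementation.

```c
#include <stdio.h>

/* Failure detection by heartbeat timeout: the primary path is declared
   failed if no heartbeat (sent here every 50 ms) is seen for DETECT_MS,
   and traffic moves to the backup path. For a 300 ms recovery target the
   detection window must be well under that.                             */
#define DETECT_MS 150

enum path { PRIMARY, BACKUP };

int main(void)
{
    /* Simulated arrival times (ms) of heartbeats on the primary path;
       the silence after t=400 emulates a cable or switch failure.       */
    const int heartbeat_at[] = { 0, 50, 100, 150, 200, 250, 300, 350, 400 };
    const int n = (int)(sizeof heartbeat_at / sizeof heartbeat_at[0]);

    enum path active = PRIMARY;
    int last_seen = 0;
    int next = 0;

    for (int now = 0; now <= 800; now += 10) {      /* 10 ms polling loop */
        while (next < n && heartbeat_at[next] <= now)
            last_seen = heartbeat_at[next++];

        if (active == PRIMARY && now - last_seen > DETECT_MS) {
            active = BACKUP;
            printf("t=%d ms: primary silent for %d ms, failing over\n",
                   now, now - last_seen);
        }
    }
    printf("final active path: %s\n", active == PRIMARY ? "primary" : "backup");
    return 0;
}
```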
160

Fault tolerant optimal control

Chizeck, Howard Jay January 1982 (has links)
Thesis (Sc.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1982. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING / Bibliography: leaves 898-903. / by Howard Jay Chizeck. / Sc.D.
