Global ETD Search

11	Fault tolerance : a new method to detect fault in computing systems Mugwar, Bader 03 June 2011 (has links) This paper discusses fault detection in fault-tolerant computers. It outlines the techniques currently available, namely Anderson's and Avizienis's. The writer introduces a new method based on Anderson's detection technique; this modified version proves to be a more foolproof system. The shortcomings of both 'old' techniques are discussed in detail, and the writer suggests how to overcome them using the proposed technique. To demonstrate the method, the writer applies it to the SIFT system, showing that it is workable and superior to previous techniques. Diagrams are provided for clarification. Ball State University, Muncie, IN 47306. Computers -- Reliability. Computer engineering. Ball State University. Thesis (M.S.)
12	A unified theory of system-level diagnosis and its application to regular interconnected structures / Somani, Arun K. (Arun Kumar) January 1985 (has links) System-level diagnosis is considered a viable alternative to circuit-level testing in complex multiprocessor systems. The characterization problem, the diagnosability problem, and the diagnosis problem in this framework have been widely studied in the literature with respect to a special fault class, called the t-fault class, in which all fault sets of size up to t are considered. Various models for interpreting test outcomes have been proposed and analyzed. The four best-known models are the symmetric invalidation model, the asymmetric invalidation model, the symmetric invalidation model with intermittent faults, and the asymmetric invalidation model with intermittent faults. / In this thesis, a completely new generalization of the characterization problem in system-level diagnosis is developed. This generalized characterization theorem provides necessary and sufficient conditions for any fault pattern of any size to be uniquely diagnosable under all four models. Moreover, the following three results are obtained for the t-fault class: (1) the characterization theorem for t-diagnosable systems under the asymmetric invalidation model with intermittent faults is developed for the first time; (2) a unified t-characterization theorem covering all four models is presented; and (3) the classical t-characterization theorems under the first three models, and the new result for the fourth model mentioned in (1), are proven to be special cases of the generalized characterization theorem. / The general diagnosability problem is also studied. It is shown that the single-fault diagnosability problem under the asymmetric invalidation model is co-NP-complete.
/ As regards the diagnosis problem, most diagnosis algorithms developed thus far are global algorithms in which a complete syndrome is analyzed by a single supervisory processor. In this thesis, distributed diagnosis algorithms for regular interconnected structures are developed that take advantage of the interconnection architecture of a multiprocessor system. Multiprocessors. Fault-tolerant computing.
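The diagnosis problem described in this record can be illustrated with a minimal centralized sketch. It assumes the symmetric invalidation model in its simplest form (a fault-free tester reports the true status of the unit it tests, while a faulty tester's outcome is unreliable) and enumerates every fault set of size up to t, keeping those consistent with the syndrome. This brute-force search is not the distributed algorithm developed in the thesis; the five-unit test assignment, the function names, and the syndrome are hypothetical.

```python
from itertools import combinations


def consistent(fault_set, syndrome):
    """Symmetric-invalidation check: only fault-free testers must report truthfully."""
    for (tester, tested), says_faulty in syndrome.items():
        if tester not in fault_set and says_faulty != (tested in fault_set):
            return False
    return True


def diagnose(units, syndrome, t):
    """Return every fault set of size <= t that is consistent with the syndrome."""
    candidates = []
    for size in range(t + 1):
        for combo in combinations(units, size):
            if consistent(frozenset(combo), syndrome):
                candidates.append(frozenset(combo))
    return candidates


# Hypothetical system: 5 units, unit i tests units i+1 and i+2 (mod 5).
N = 5
ACTUAL_FAULTS = {1, 3}
syndrome = {}  # (tester, tested) -> True if the tester reports "faulty"
for tester in range(N):
    for step in (1, 2):
        tested = (tester + step) % N
        if tester in ACTUAL_FAULTS:
            syndrome[(tester, tested)] = False  # a faulty tester may lie; here it always reports "pass"
        else:
            syndrome[(tester, tested)] = tested in ACTUAL_FAULTS

print(diagnose(range(N), syndrome, t=2))
```

In this run only the set {1, 3} survives, so the fault pattern is uniquely diagnosed; if several candidates survived, the syndrome alone would not identify the faulty units.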
13	System management of a redundant clocking network Manush, Charles Edward. January 1976 (has links) Thesis: M.S., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 1976 / Bibliography: p. 110. / by Charles E. Manush, III. / M.S. Aeronautics and Astronautics. Synchronization. Debugging in computer science. Multiprocessors. Computers -- Reliability.
14	A unified theory of system-level diagnosis and its application to regular interconnected structures / Somani, Arun K. (Arun Kumar) January 1985 (has links) No description available. Fault-tolerant computing. Multiprocessors.
15	Testing and fault detection in a Fault-Tolerant Multiprocessor Mantz, Michael Roy January 1981 (has links) Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 1981. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND AERO / Bibliography: leaves B1-B6. / by Michael Roy Mantz. / M.S. Aeronautics and Astronautics. Command and control systems Flight control Fault-tolerant computing Electronic digital computers Reliability Multiprocessors Testing
16	Naming and synchronization in a decentralized computer system. Reed, David Patrick, 1952- January 1979 (has links) Thesis. 1979. Ph.D.--Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Vita. / Bibliography: leaves 212-216. / Ph.D. Electronic digital computers Reliability Computer networks
17	Architectural Support For Improving System Hardware/software Reliability Dimitrov, Martin 01 January 2010 (has links) It is a great challenge to build reliable computer systems with unreliable hardware and buggy software. On the one hand, software bugs account for as much as 40% of system failures and impose a high cost on the US economy, estimated at $59.5B a year. On the other hand, under current technology-scaling trends, transient faults (also known as soft errors) in the underlying hardware are predicted to grow at least in proportion to the number of devices being integrated, which further exacerbates the problem of system reliability. We propose several methods to improve system reliability, both by detecting and correcting soft errors and by facilitating software debugging. In our first approach, we detect instruction-level anomalies during program execution. The anomalies can be used to detect and repair soft errors, or can be reported to the programmer to aid software debugging. In our second approach, we improve anomaly detection for software debugging by detecting different types of anomalies and by removing false positives. While the anomalies reported by our first two methods are helpful in debugging single-threaded programs, they do not address concurrency bugs in multi-threaded programs. In our third approach, we propose a new debugging primitive that exposes the non-deterministic behavior of parallel programs and facilitates the debugging process. Our idea is to generate a time-ordered trace of events, such as function calls/returns and memory accesses, across different threads. In our experience, exposing the time-ordered event information to the programmer is highly beneficial for reasoning about the root causes of concurrency bugs. Computer architecture. Computers -- Reliability. Debugging in computer science. Electrical and Computer Engineering. Electrical and Electronics Engineering.
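The time-ordered trace in the third approach can be pictured in software as a merge of per-thread event logs. The sketch below assumes that every event carries a timestamp from a shared global clock; the `Event` record, its fields, the helper names, and the example logs are hypothetical, and the thesis proposes a hardware debugging primitive that this does not model. `heapq.merge` combines logs that are already sorted without re-sorting them.

```python
import heapq
from typing import List, NamedTuple


class Event(NamedTuple):
    timestamp: int  # assumed global clock value (e.g. a cycle count)
    thread: int
    kind: str       # "call", "ret", "load" or "store"
    target: str     # function name or memory address


def merge_traces(per_thread: List[List[Event]]) -> List[Event]:
    """Merge per-thread logs, each already sorted by timestamp, into one time-ordered trace."""
    return list(heapq.merge(*per_thread))


def store_order(trace: List[Event], address: str) -> List[int]:
    """Thread ids in the order they stored to `address`."""
    return [e.thread for e in trace if e.kind == "store" and e.target == address]


# Hypothetical logs: two threads each perform a read-modify-write on the same counter.
thread0 = [Event(1, 0, "call", "deposit"), Event(4, 0, "load", "0x10"), Event(7, 0, "store", "0x10")]
thread1 = [Event(2, 1, "call", "deposit"), Event(5, 1, "load", "0x10"), Event(6, 1, "store", "0x10")]

trace = merge_traces([thread0, thread1])
for event in trace:
    print(event.timestamp, f"T{event.thread}", event.kind, event.target)
print(store_order(trace, "0x10"))
```

In the merged trace both threads load address `0x10` before either stores to it, the classic lost-update interleaving that is hard to see from either thread's log alone.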
18	High level strategy for detection of transient faults in computer systems Modi, Nimish Harsukh January 1988 (has links) A major portion of digital system malfunctions is due to temporary faults, which are either intermittent or transient. An intermittent fault manifests itself at regular intervals, while a transient fault causes a temporary change in the state of the system without damaging any of the components. Transient faults are difficult to detect and isolate and hence are a major concern, especially in critical real-time applications. Since satellite systems are particularly susceptible to transient faults induced by the radiation environment, a satellite communications protocol model has been developed for experimental research. The model implements the MIL-STD-1553B protocol, which dictates the modes of communication between several satellite systems. The model has been developed using the structural and behavioral capabilities of the HILO simulation system. Single-event upsets (SEUs) are injected into the protocol model and their effects on program flow are investigated. A two-tier detection scheme employing the concept of signature analysis is developed. The performance of the detection mechanisms is evaluated and the results are presented. / Master of Science / LD5655.V855 1988.M624 Fault-tolerant computing
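Signature analysis compresses a long stream of observed values into a short signature held in a linear feedback shift register, which is then compared with a reference signature computed for fault-free operation. The sketch below applies the idea to a hypothetical sequence of executed basic-block IDs. It assumes the CRC-CCITT feedback polynomial x^16 + x^12 + x^5 + 1 (0x1021), and the names `signature`, `check_path`, and `EXPECTED_PATH` are illustrative; it does not reproduce the thesis's two-tier scheme.

```python
POLY = 0x1021  # assumed feedback polynomial: x^16 + x^12 + x^5 + 1


def signature(stream, state=0):
    """Clock each byte of `stream` MSB-first through a 16-bit serial signature register."""
    for byte in stream:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            feedback = ((state >> 15) & 1) ^ bit   # input bit XOR register output
            state = (state << 1) & 0xFFFF          # shift, keeping 16 bits
            if feedback:
                state ^= POLY                      # apply the feedback taps
    return state


# Hypothetical control-flow path: IDs of the basic blocks executed, in order.
EXPECTED_PATH = bytes([0x01, 0x02, 0x04, 0x02, 0x04, 0x08])
REFERENCE = signature(EXPECTED_PATH)  # precomputed for fault-free execution


def check_path(observed):
    """Return True if the observed path's signature matches the reference."""
    return signature(observed) == REFERENCE


# Simulate a single-event upset flipping one bit of one block ID.
corrupted = bytearray(EXPECTED_PATH)
corrupted[3] ^= 0x10
print(check_path(EXPECTED_PATH), check_path(bytes(corrupted)))
```

Because the register is linear and the polynomial has more than one term, any single-bit change in the stream alters the signature; multi-bit errors can, rarely, alias to the reference value.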
19	The design of periodically self restoring redundant systems Singh, Adit D. January 1982 (has links) Most existing fault-tolerant systems employ some form of dynamic redundancy and can be considered incident-driven: their recovery mechanisms are triggered by the detection of a fault. This dissertation investigates an alternative approach to fault-tolerant design, in which the redundant system restores itself periodically to correct errors before they build up to the point of system failure. It is shown that periodically self-restoring systems can be designed to tolerate both transient (intermittent) and permanent hardware faults. Further, the reliability of such designs is not compromised by fault latency. The periodically self-restoring redundant (PSRR) systems presented in this dissertation generally employ N computing units (CUs) operating redundantly in synchronization. The CUs communicate with each other periodically to restore units that may have failed due to transient faults. This restoration is initiated by an interrupt from an external (fault-tolerant) clocking circuit. A reliability model for such systems is developed in terms of the number of CUs in the system, their failure rates, and the frequency of system restoration. Both transient and permanent faults are considered. The model allows the estimation of system reliability and mean time to failure. A restoration algorithm for implementing the periodic restoration process in PSRR systems is also presented. Finally, a design procedure is described that can be used to design PSRR systems that meet desired reliability specifications. / Ph.D. LD5655.V856 1982.S7626 Fault-tolerant computing
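The dependence of reliability on the number of CUs, their failure rate, and the restoration frequency can be illustrated with a deliberately simplified model. It assumes that the CUs vote by majority, that they fail independently at a constant rate, and that only transient faults occur, so every restoration returns all failed units to service; the system then survives a restoration interval if no more than (N - 1)/2 units (rounded down) fail within it. This is an assumed textbook-style model, not the one developed in the dissertation, which also covers permanent faults; the function names are hypothetical.

```python
import math


def interval_reliability(n_units, fail_rate, period):
    """Probability that a voting majority of n_units survives one restoration period."""
    p = 1.0 - math.exp(-fail_rate * period)  # per-unit failure probability within a period
    tolerable = (n_units - 1) // 2           # failures a majority vote can mask
    return sum(
        math.comb(n_units, k) * p**k * (1 - p) ** (n_units - k)
        for k in range(tolerable + 1)
    )


def mission_reliability(n_units, fail_rate, period, mission_time):
    """Assumes transient faults only, so each restoration resets all units to fault-free."""
    intervals = math.ceil(mission_time / period)
    return interval_reliability(n_units, fail_rate, period) ** intervals


# Hypothetical triplicated system, failure rate 1e-3 per hour, 1000-hour mission.
for period in (100, 10, 1):
    print(period, round(mission_reliability(3, 1e-3, period, 1000), 4))
```

Under these assumptions, shortening the restoration period raises mission reliability, since a triplicated system fails only if two units fail within the same interval.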
20	Built-in tests for a real-time embedded system. Olander, Peter Andrew. January 1991 (has links) Beneath the facade of the application code of a well-designed real-time embedded system lies intrinsic firmware that provides a fast and effective means of detecting and diagnosing inevitable hardware failures. These failures can impair the availability of a system; consequently, the source of the malfunction must be identified. It is shown that the number of possible origins of failures of all kinds is immense. As a result, fault models are devised to encompass prevalent hardware faults. Furthermore, the complexity is reduced by determining syndromes for particular circuitry and applying test vectors at the functional-block level. Testing phases and philosophies, together with standardisation policies, are defined to ensure that system designers comply with the underlying principles of evaluating system integrity. The three testing phases (power-on self tests at system start-up, on-line health monitoring, and off-line diagnostics) are designed to ensure that the inherent test firmware remains inconspicuous during normal application execution. The prominence of the code is, however, apparent on the detection or diagnosis of a hardware failure. The validity of the theoretical models, standardisation policies and built-in test philosophies is illustrated by applying them to an intricate real-time system. The architecture and the software design implementing these philosophies are described extensively. Standardisation policies, enhanced by the proposal of generic tests for common core components, are advocated at all hierarchical levels. The presentation of the hardware and software integration is aimed at portraying the moderately complex nature of the task of generating a set of built-in tests for a real-time embedded system.
In spite of the generic policies, the intricacies of the architecture are found to have a direct influence on software design decisions. It is thus concluded that the diagnostic objectives in the user requirements specification should be lucidly expressed by both operational and maintenance personnel for all testing phases. Disparity may exist between the system designer and the end user in their understanding of the requirements specification that defines the objectives of the diagnosis. Complete collaboration between the two parties is therefore essential throughout the development life cycle, especially during the preliminary design phase. The designer will then be able to decide on the sophistication of the system's testing capabilities. / Thesis (M.Sc.)-University of Natal, Durban, 1991. Real-time data processing. Fault-tolerant computing. Computer architecture. Embedded computer systems. Theses--Computer science.
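As an example of a generic test for a common core component, a power-on self test often exercises RAM with a march algorithm. The abstract does not name specific tests, so the March C- sequence below is an assumed, widely used choice rather than the thesis's design. It runs against a hypothetical `SimulatedRAM` class with one bit per cell and optional stuck-at faults; `march_c_minus` is likewise an illustrative name.

```python
class SimulatedRAM:
    """Hypothetical one-bit-per-cell RAM with optional stuck-at faults."""

    def __init__(self, size, stuck_at=None):
        self.size = size
        self.stuck_at = dict(stuck_at or {})  # address -> value the cell is stuck at
        self.cells = [0] * size

    def write(self, addr, value):
        # A stuck-at cell ignores the written value and keeps its stuck value.
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.cells[addr]


def march_c_minus(ram):
    """Run March C-; return the first failing address, or None if all elements pass."""
    up = range(ram.size)
    down = range(ram.size - 1, -1, -1)
    for addr in up:  # M0: write 0 (any order)
        ram.write(addr, 0)
    # M1..M4: (address order, value expected on read, value written back)
    for order, expected, new in ((up, 0, 1), (up, 1, 0), (down, 0, 1), (down, 1, 0)):
        for addr in order:
            if ram.read(addr) != expected:
                return addr
            ram.write(addr, new)
    for addr in up:  # M5: read 0 (any order)
        if ram.read(addr) != 0:
            return addr
    return None


print(march_c_minus(SimulatedRAM(16)))          # fault-free memory
print(march_c_minus(SimulatedRAM(16, {5: 1})))  # cell 5 stuck at 1
```

A real power-on self test would address physical memory and report the failing location to the diagnostics layer; the simulated memory only keeps the sketch self-contained.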
