21 |
Multiparty interactions in dependable distributed systems
Zorzo, Avelino Francisco January 1999 (has links)
With the expansion of computer networks, activities involving computer communication are becoming more and more distributed. Such distribution can include processing, control, data, network management, and security. Although distribution can improve the reliability of a system by replicating components, an increase in distribution can sometimes introduce undesirable faults. To reduce the risk of introducing faults, and to improve the chances of removing and tolerating them when distributing applications, it is important that distributed systems are implemented in an organized way. As in sequential programming, complexity in distributed, and in particular parallel, program development can be managed by providing appropriate programming language constructs. Language constructs can help both by supporting encapsulation, so as to prevent unwanted interactions between program components, and by providing higher-level abstractions that reduce programmer effort by allowing compilers to handle mundane, error-prone aspects of parallel program implementation. A language construct that supports encapsulation of interactions between multiple parties (objects or processes) is referred to in the literature as a multiparty interaction. In a multiparty interaction, several parties "come together" to produce an intermediate and temporary combined state, use this state to execute some activity, and then leave the interaction and continue their normal execution. There has been a great deal of work on multiparty interaction in recent years, but most of it has been concerned with synchronisation, or handshaking, between parties rather than with the encapsulation of several activities executed in parallel by the interaction participants. The programmer is therefore left responsible for ensuring that the processes involved in a cooperative activity do not interfere with, or suffer interference from, other processes not involved in the activity. Furthermore, none of this work has discussed the provision of features that would facilitate the design of multiparty interactions that are expected to cope with faults - whether in the environment that the computer system has to deal with, in the operation of the underlying computer hardware or software, or in the design of the processes that are involved in the interaction. In this thesis the concept of multiparty interaction is integrated with the concept of exception handling in concurrent activities. The final result is a language in which the concept of multiparty interaction is extended with a mechanism for handling concurrent exceptions. This extended concept is called dependable multiparty interaction. The features and requirements for multiparty interaction and exception handling provided by a set of languages surveyed in this thesis are integrated to describe the new dependable multiparty interaction construct. Additionally, object-oriented architectures for dependable multiparty interactions are described, and a full implementation of one of the architectures is provided. This implementation is then applied to a set of case studies. The case studies show how dependable multiparty interactions can be used to design and implement a safety-critical system, a multiparty programming abstraction, and a parallel computation model.
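The dependable multiparty interaction construct described above can be made concrete with a short sketch. The following Java fragment is a minimal illustration only, with assumed names (Role, DependableInteraction, InteractionException) rather than the thesis's actual DMI classes: parties execute their roles concurrently inside the interaction, the coordinator waits for every role at the exit boundary, and if any role failed, the interaction as a whole terminates exceptionally rather than letting parties leave normally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Minimal sketch of a dependable multiparty interaction (illustrative names,
// not the thesis's actual API).
class DependableInteraction {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    // A role is the unit of activity one party executes inside the interaction.
    interface Role { void run() throws Exception; }

    // Run all roles concurrently; the coordinator collects every outcome at the
    // exit boundary, and a failure in any role aborts the whole interaction.
    void perform(List<Role> roles) throws InteractionException {
        List<Future<Object>> futures = new ArrayList<>();
        for (Role r : roles)
            futures.add(pool.submit(() -> { r.run(); return null; }));
        Throwable failure = null;
        for (Future<Object> f : futures) {
            try { f.get(); }                                   // exit synchronisation
            catch (ExecutionException e) { failure = e.getCause(); }
            catch (InterruptedException e) { failure = e; }
        }
        if (failure != null) throw new InteractionException(failure);
    }

    static class InteractionException extends Exception {
        InteractionException(Throwable cause) { super(cause); }
    }

    public static void main(String[] args) {
        var interaction = new DependableInteraction();
        try {
            interaction.perform(List.of(
                () -> System.out.println("producer: writing shared state"),
                () -> { throw new IllegalStateException("consumer failed"); }));
        } catch (InteractionException e) {
            // The concurrent exception is resolved to one handler for all parties.
            System.out.println("interaction aborted: " + e.getCause().getMessage());
        }
        interaction.pool.shutdown();
    }
}
```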
|
22 |
Low-cost Methods for Error Detection in Multi-core Systems
Meixner, Albert, January 2008 (has links)
Thesis (Ph. D.)--Duke University, 2008.
|
23 |
APPLICATION AWARE FOR BYZANTINE FAULT TOLERANCE
Chai, Hua 09 December 2014 (has links)
No description available.
|
24 |
UpRight fault tolerance
Clement, Allen Grogan 13 November 2012 (has links)
Experiences with computer systems indicate an inconvenient truth: computers fail, and they fail in interesting ways. Although using redundancy to protect against fail-stop failures is common practice, non-fail-stop computer and network failures occur for a variety of reasons, including power outages, disk or memory corruption, NIC malfunction, user error, and operating system and application bugs or misconfiguration, among many others. The impact of these failures can be dramatic, ranging from service unavailability to stranding airplane passengers on the runway to companies closing. While high-stakes embedded systems have embraced Byzantine fault tolerant techniques, general purpose computing continues to rely on techniques that are fundamentally crash tolerant. In a general purpose environment, the current best-practice response to non-fail-stop failures can charitably be described as pragmatic: identify a root cause and add checksums to prevent that error from happening again in the future. Pragmatic responses have proven effective for patching holes and protecting against faults once they have occurred; unfortunately, the initial damage has already been done, and it is difficult to say whether the patches made to address previous faults will protect against future failures. We posit that an end-to-end solution based on Byzantine fault tolerant (BFT) state machine replication is an efficient and deployable alternative to the current ad hoc approaches favored in general purpose computing. The replicated state machine approach ensures that multiple copies of the same deterministic application execute requests in the same order, and provides end-to-end assurance that independent transient failures will not lead to unavailability or incorrect responses. An efficient and effective end-to-end solution covers faults that have already been observed as well as failures that have not yet occurred, and it provides structural confidence that developers won't have to track down yet another failure caused by some unpredicted memory, disk, or network behavior. While the promise of end-to-end failure protection is intriguing, significant technical and practical challenges currently prevent adoption in general purpose computing environments. On the technical side, it is important that end-to-end solutions maintain the performance characteristics of deployed systems: if end-to-end solutions dramatically increase computing requirements, reduce throughput, or increase latency during normal operation, then end-to-end techniques are a non-starter. On the practical side, it is important that end-to-end approaches be both comprehensible and easy to incorporate: if the cost of end-to-end solutions is rewriting an application or trusting intricate and arcane protocols, then end-to-end solutions will not be adopted. In this thesis we show that BFT state machine replication can and should be used in deployed systems. Reaching this goal requires us to address both the technical and practical challenges previously mentioned. We revisit disparate research results from the last decade and tweak, refine, and revise the core ideas to fit together into a coherent whole. Addressing the practical concerns requires us to simplify the process of incorporating BFT techniques into legacy applications.
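The replicated state machine approach at the heart of this argument is simple to state in code. The sketch below is illustrative only (the class and method names are assumptions, not the thesis's systems): each replica applies deterministic operations in an agreed sequence order, which is the property that lets matching replies from independent replicas vouch for one another.

```java
import java.util.*;

// Illustrative replica of a deterministic state machine: every correct replica
// that executes the same requests in the same order reaches the same state and
// returns the same replies, so disagreement exposes a faulty replica.
class Replica {
    private final Map<String, String> state = new HashMap<>();
    private long nextSeq = 0;    // next sequence number this replica will execute

    record Request(long seq, String key, String value) {}

    // Enforce strict sequence order; determinism plus agreed order is what lets
    // matching replies from multiple replicas certify a response as correct.
    String execute(Request req) {
        if (req.seq() != nextSeq)
            throw new IllegalStateException("out-of-order request");
        nextSeq++;
        state.put(req.key(), req.value());
        return "OK:" + req.seq() + ":" + req.value();
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(new Replica(), new Replica(), new Replica());
        Request req = new Request(0, "balance", "100");
        // Given the same order, all correct replicas produce identical replies.
        replicas.forEach(r -> System.out.println(r.execute(req)));
    }
}
```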
|
25 |
An Analysis of Error Tolerance Property of Spread Spectrum Sequence
Daming, Hu, Tingxian, Zhou, October 1999 (has links)
International Telemetering Conference Proceedings / October 25-28, 1999 / Riviera Hotel and Convention Center, Las Vegas, Nevada / This paper examines how the error tolerance property of a spread spectrum sequence influences the performance of a spread spectrum system. The relation between the error tolerance property and the correlation property of a binary sequence under correlation detection is then analyzed, and the theoretical limit of error tolerance is derived. Finally, we investigate the relationship between the choice of the correlator output decision threshold, the probability of correlation peak detection, and the error tolerance of the spread spectrum sequence.
|
26 |
Byzantine fault tolerant web applications using the UpRight library
Rebello, Rohan Francis, August 2009 (links)
Web applications are widely used for email, online sales, auctions, collaboration, etc. Most of today's highly available web applications implement fault-tolerant protocols in order to tolerate crash faults. However, recent system-wide failures have been caused by arbitrary or Byzantine faults, which these applications are not capable of handling. Despite the abundance of research on adding Byzantine fault tolerance (BFT) to a system, BFT systems have found little use outside the research community. The reasons typically cited for this are the difficulty of implementing such systems and the performance overhead associated with them. While most research focuses on improving the performance or lowering the replication cost of BFT protocols, little has been done to make them easy to implement.

The goal of this thesis is to evaluate the viability of BFT web applications and to show that, given the right abstraction, it is viable to build a Byzantine fault tolerant web application without extensive reimplementation of the web application. To achieve this goal, it demonstrates a BFT implementation of the Apache Tomcat servlet container and the VQWiki web application using the UpRight BFT library. The UpRight library provides abstractions that make it easy to develop BFT applications, and we leverage this abstraction to reduce the implementation cost of our system. Our results are encouraging: less than 2% of the original system needs to be modified while still retaining all of its functionality. Given the design trade-offs that we make in implementing the system, we also obtain comparable performance, indicating that implementing BFT is a viable option to explore for highly available web applications.
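The client-side half of such a system can be suggested with a small sketch. The code below is illustrative only and does not use UpRight's actual interface: a voting shim forwards each request to all replicas and releases a reply to the web application only once f + 1 replicas agree, which masks up to f arbitrarily faulty replicas.

```java
import java.util.*;

// Illustrative client-side voter (not UpRight's real API): accept a reply once
// f + 1 replicas return the same answer, masking up to f Byzantine replicas.
class ReplyVoter {
    private final int f;                         // max faulty replicas tolerated

    ReplyVoter(int f) { this.f = f; }

    Optional<String> vote(List<String> replies) {
        Map<String, Integer> counts = new HashMap<>();
        for (String r : replies) {
            int c = counts.merge(r, 1, Integer::sum);
            if (c >= f + 1) return Optional.of(r);   // enough matching replies
        }
        return Optional.empty();                     // keep waiting for replies
    }

    public static void main(String[] args) {
        ReplyVoter voter = new ReplyVoter(1);        // tolerate one faulty replica
        // Two matching replies outvote one corrupted reply.
        System.out.println(voter.vote(
                List.of("HTTP 200: page", "garbage", "HTTP 200: page")));
    }
}
```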
|
27 |
Fault tolerant and dynamic evolutionary optimization engines
Morales Reyes, Alicia January 2011 (links)
Mimicking natural evolution to solve hard optimization problems has played an important role in the artificial intelligence arena. Such techniques are broadly classified as Evolutionary Algorithms (EAs) and have been investigated for around four decades, during which important contributions and advances have been made. One evolutionary technique that has been widely investigated is the Genetic Algorithm (GA). GAs are stochastic search techniques that follow the Darwinian principle of evolution. Their application to hard optimization problems has been very successful: multi-dimensional problems presenting difficult search spaces with characteristics such as multi-modality, epistasis, non-regularity and deceptiveness have all been effectively tackled by GAs. In this research, a competitive form of GAs known as fine-grained or cellular GAs (cGAs) is investigated, because of their suitability for System on Chip (SoC) implementation when tackling real-time problems. Cellular GAs have also attracted the attention of researchers due to their high performance, ease of implementation and massive parallelism. In addition, cGAs inherently possess a number of structural configuration parameters which make them capable of sustaining diversity during evolution, and therefore of promoting an adequate balance between the exploitative and explorative stages of the search.

The fast technological development of Integrated Circuits (ICs) has allowed a considerable increase in compactness and therefore in density. As a result, it is nowadays possible to fit millions of gate- and transistor-based circuits into very small silicon areas. Operational complexity has also significantly increased, and other setbacks have consequently emerged, such as the presence of faults that commonly appear in the form of single or multiple bit flips. Harsh environmental or time-dependent operating conditions can trigger faults in registers and memory allocations through induced radiation, electromigration and dielectric breakdown. These kinds of faults are known as Single Event Effects (SEEs). Research has shown that an effective way of dealing with SEEs consists of a combination of hardware and software mitigation techniques. Permanent faults known as Single Hard Errors (SHEs) and temporary faults known as Single Event Upsets (SEUs) are common SEEs. This thesis investigates the inherent abilities of cellular GAs to deal with SHEs and SEUs at the algorithmic level. A hard real-time application is targeted: calculating the attitude parameters for navigation in vehicles using Global Positioning System (GPS) technology. Faulty critical data, which can cause a system's functionality to fail, are evaluated. The proposed mitigation techniques demonstrate the ability of cGAs to cope with up to 40% stuck-at-zero and 30% stuck-at-one faults in chromosome bits and fitness score cells.

Due to the non-deterministic nature of GAs, dynamic on-the-fly algorithmic and parametric configuration has also attracted the attention of researchers. In this respect, the structural properties of cellular GAs provide a valuable means of influencing their selection pressure. This helps to maintain an adequate exploitation-exploration tradeoff, either from a pure topological perspective or through genetic operations that also make use of structural characteristics in cGAs. These properties, unique to cGAs, are further investigated in this thesis through a set of middle- to high-difficulty benchmark problems. Experimental results show that the proposed dynamic techniques enhance the overall performance of cGAs on most benchmark problems.

Finally, the dimensionality of cellular GAs, being tied to their structure, is another line of investigation. 1D and 2D structures have normally been used to test cGAs at the algorithm and implementation levels. Although 3D-cGAs are an immediate extension, not enough attention has been paid to them, and so a comparative study on the dimensionality of cGAs is carried out. Having shorter radii, 3D-cGAs present a faster dissemination of solutions and have denser neighbourhoods. Empirical results reported in this thesis show that 3D-cGAs achieve better efficiency when solving multi-modal and epistatic problems. In the future, the performance improvements of 3D-cGAs will merge with the latest demonstrated benefits of 3D integration technology, such as reductions in routing length, interconnection delays and power consumption.
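The structural properties the thesis exploits are easy to see in a toy cellular GA. The following sketch (illustrative parameters and a OneMax fitness, not the thesis's GPS application) places individuals on a toroidal 2D grid and restricts mating to a local neighbourhood, the mechanism that governs selection pressure and confines the influence of any one faulty cell.

```java
import java.util.Random;

// Toy 2D cellular GA on OneMax: each cell mates only with its von Neumann
// neighbourhood, so good genes spread gradually through local interactions.
class CellularGA {
    static final int SIZE = 8, GENES = 32;
    static final Random RNG = new Random(42);
    static int[][][] grid = new int[SIZE][SIZE][GENES];

    static int fitness(int[] g) { int s = 0; for (int b : g) s += b; return s; }

    static int[] bestNeighbour(int x, int y) {
        int[] best = grid[x][y];
        int[][] offsets = {{0, 1}, {0, -1}, {1, 0}, {-1, 0}};     // local neighbourhood
        for (int[] o : offsets) {
            int[] cand = grid[(x + o[0] + SIZE) % SIZE][(y + o[1] + SIZE) % SIZE]; // toroidal wrap
            if (fitness(cand) > fitness(best)) best = cand;
        }
        return best;
    }

    static void generation() {
        int[][][] next = new int[SIZE][SIZE][GENES];
        for (int x = 0; x < SIZE; x++)
            for (int y = 0; y < SIZE; y++) {
                int[] p1 = grid[x][y], p2 = bestNeighbour(x, y);
                int cut = RNG.nextInt(GENES);                      // one-point crossover
                for (int i = 0; i < GENES; i++) {
                    next[x][y][i] = i < cut ? p1[i] : p2[i];
                    if (RNG.nextDouble() < 1.0 / GENES)            // bit-flip mutation
                        next[x][y][i] ^= 1;
                }
            }
        grid = next;
    }

    public static void main(String[] args) {
        for (int[][] row : grid) for (int[] g : row)
            for (int i = 0; i < GENES; i++) g[i] = RNG.nextInt(2);
        for (int gen = 0; gen < 50; gen++) generation();
        int best = 0;
        for (int[][] row : grid) for (int[] g : row) best = Math.max(best, fitness(g));
        System.out.println("best fitness after 50 generations: " + best + "/" + GENES);
    }
}
```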
|
28 |
A Centralized Energy Management System for Wireless Sensor Networks
Skowyra, Richard William 05 May 2009 (links)
This document presents the Centralized Energy Management System (CEMS), a dynamic, fault-tolerant reclustering protocol for wireless sensor networks. CEMS reconfigures a homogeneous network both periodically and in response to critical events (e.g., cluster head death). A global TDMA schedule prevents costly retransmissions due to collision, and a genetic algorithm running on the base station computes cluster assignments in concert with a head selection algorithm. CEMS' performance is compared to that of the LEACH-C protocol in both normal and failure-prone conditions, with an emphasis on each protocol's ability to recover from unexpected loss of cluster heads.
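The base-station side of such a scheme can be hinted at with a toy fitness function. The sketch below is an assumption-laden illustration, not CEMS's actual algorithm: a candidate clustering is scored by penalising long node-to-head distances and energy-poor cluster heads, the kind of objective a genetic algorithm at the base station could optimise.

```java
// Illustrative clustering fitness for a base-station GA (names and weighting
// are assumptions): prefer short hops to heads with plenty of residual energy.
class ClusterFitness {
    record Node(double x, double y, double energy) {}

    static double fitness(Node[] nodes, int[] headOf, int[] heads) {
        double cost = 0;
        for (int i = 0; i < nodes.length; i++) {
            Node n = nodes[i], h = nodes[heads[headOf[i]]];
            double d = Math.hypot(n.x() - h.x(), n.y() - h.y());
            cost += d * d / h.energy();   // long hops to weak heads are penalised
        }
        return 1.0 / (1.0 + cost);        // higher is better for the GA
    }

    public static void main(String[] args) {
        Node[] nodes = {
            new Node(0, 0, 1.0), new Node(1, 0, 0.9),
            new Node(5, 5, 0.2), new Node(6, 5, 0.8)
        };
        int[] heads = {0, 3};             // candidate cluster heads (node indices)
        int[] headOf = {0, 0, 1, 1};      // each node's cluster index
        System.out.println("fitness: " + fitness(nodes, headOf, heads));
    }
}
```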
|
29 |
A performance-efficient and practical processor error recovery framework
Soman, Jyothish January 2019 (links)
Continued reduction in transistor size has affected the reliability of the processors built from them. This is primarily due to factors such as manufacturing inaccuracies and non-ideal operating conditions, which cause transistors to slow down progressively, eventually leading to permanent breakdown and erroneous operation of the processor. Permanent transistor breakdowns, or faults, can occur at any point in a processor's lifetime. Errors are the discrepancies in the output of faulty circuits. This dissertation shows that components containing faults can continue operating if the errors they cause are within certain bounds, and that the lifetime of a processor can be increased by adding supportive structures that start working once the processor develops these hard errors.

This dissertation has three major contributions, namely REPAIR, FaultSim and PreFix. REPAIR is a fault tolerant system requiring minimal changes to the processor design. It uses an external Instruction Re-execution Unit (IRU) to perform operations which the faulty processor might have executed erroneously. Instructions found to use faulty hardware are re-executed on the IRU. REPAIR shows that the performance overhead of such targeted re-execution is low for a limited number of faults.

FaultSim is a fast fault simulator capable of simulating large circuits at the transistor level, developed in this dissertation to understand the effect of faults on different circuits. It performs digital-logic-based simulation, trading analogue accuracy for speed while still supporting most fault models. A 32-bit addition takes under 15 microseconds while simulating more than 1500 transistors. FaultSim can also be integrated into an architectural simulator, adding a performance overhead of 10 to 26 percent to a simulation. The results obtained show that single faults cause an error in an adder for less than 10 percent of inputs.

PreFix brings together the fault models created using FaultSim and the design directions found using REPAIR. PreFix re-executes instructions on a remote core, which picks up instructions to execute from a global instruction buffer. Error prediction and detection are used to reduce the number of re-executed instructions. PreFix has an area overhead of 3.5 percent in the setup used, and its performance overhead is within 5 percent of the fault-free case. This dissertation shows that faults in processors can be tolerated without explicitly switching off any component, and that minimal redundancy is sufficient to achieve this.
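The REPAIR idea of targeted re-execution can be sketched compactly. In the illustrative fragment below (the interfaces and the stuck-at fault model are assumptions, not the dissertation's implementation), results computed on a unit flagged as faulty are recomputed on a known-good re-execution path, so the core keeps operating despite the hard fault.

```java
import java.util.function.IntBinaryOperator;

// Sketch of targeted instruction re-execution in the spirit of REPAIR:
// instructions mapped onto hardware known to be faulty are recomputed on a
// small trusted re-execution unit instead of switching the component off.
class Repair {
    // Models a faulty ALU whose adder output bit 3 is stuck at zero (assumed fault).
    static int faultyAdd(int a, int b) { return (a + b) & ~(1 << 3); }

    // The IRU: a slower but known-good re-execution path.
    static final IntBinaryOperator IRU = Integer::sum;

    // Re-execute only the instructions that used hardware flagged as faulty.
    static int execute(int a, int b, boolean usesFaultyUnit) {
        int result = faultyAdd(a, b);
        if (usesFaultyUnit)
            result = IRU.applyAsInt(a, b);    // targeted re-execution on the IRU
        return result;
    }

    public static void main(String[] args) {
        System.out.println("faulty:   " + faultyAdd(5, 3));      // 8 has bit 3 set, prints 0
        System.out.println("repaired: " + execute(5, 3, true));  // prints 8
    }
}
```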
|
30 |
Robust solutions for constraint satisfaction and optimisation under uncertainty
Hebrard, Emmanuel, Computer Science & Engineering, Faculty of Engineering, UNSW, January 2007 (links)
We develop a framework for finding robust solutions of constraint programs. Our approach is based on the notion of fault tolerance. We formalise this concept within constraint programming, extend it in several dimensions and introduce algorithms to find robust solutions efficiently. When applying constraint programming to real-world problems we often face uncertainty. Whilst reactive methods merely deal with the consequences of an unexpected change, a more proactive approach may guarantee a certain level of robustness. We propose to apply the fault tolerance framework introduced in [Ginsberg 98] to constraint programming: a robust solution is one such that a small perturbation requires only a small response. We identify, define and classify a number of abstract problems related to stability within constraint satisfaction or optimisation, and propose efficient and effective algorithms for solving them. We then extend this framework by allowing the repairs and perturbations themselves to be constrained. Finally, we assess the practicality of this framework on constraint satisfaction and scheduling problems.
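The notion of a robust solution can be made concrete with a brute-force check on a toy problem. The sketch below is an illustrative definition in the spirit of the framework, not the thesis's algorithms: it tests whether a solution survives any single-variable perturbation with a repair that changes at most one other variable.

```java
import java.util.Arrays;
import java.util.function.Predicate;

// Brute-force robustness check on a toy all-different CSP: a solution is
// robust here if every single forced value change can be repaired by
// modifying at most one other variable (an illustrative definition).
class RobustSolution {
    static final int VARS = 3, DOM = 3;
    // Toy constraint: all variables take pairwise different values.
    static final Predicate<int[]> VALID = s -> {
        for (int i = 0; i < VARS; i++)
            for (int j = i + 1; j < VARS; j++)
                if (s[i] == s[j]) return false;
        return true;
    };

    static boolean robust(int[] sol) {
        for (int broken = 0; broken < VARS; broken++)
            for (int forced = 0; forced < DOM; forced++) {
                if (forced == sol[broken]) continue;      // the perturbation
                if (!repairable(sol, broken, forced)) return false;
            }
        return true;
    }

    // Can the constraints be satisfied by changing at most one other variable?
    static boolean repairable(int[] sol, int broken, int forced) {
        int[] s = sol.clone();
        s[broken] = forced;
        if (VALID.test(s)) return true;                   // repair of size 0
        for (int v = 0; v < VARS; v++) {
            if (v == broken) continue;
            for (int val = 0; val < DOM; val++) {
                int[] t = s.clone();
                t[v] = val;
                if (VALID.test(t)) return true;           // repair of size 1
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] solution = {0, 1, 2};
        System.out.println(Arrays.toString(solution) + " robust? " + robust(solution));
    }
}
```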
|