81 |
Immunity-based detection, identification, and evaluation of aircraft sub-system failures
Moncayo, Hever Y. January 2009
Thesis (Ph. D.)--West Virginia University, 2009. / Title from document title page. Document formatted into pages; contains xiv, 118 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 109-118).
|
82 |
A new approach to detecting failures in distributed systems
Leners, Joshua Blaise 18 September 2015
Fault-tolerant distributed systems often handle failures in two steps: first, detect the failure and, second, take some recovery action. A common approach to detecting failures is end-to-end timeouts, but using timeouts brings problems. First, timeouts are inaccurate: just because a process is unresponsive does not mean that process has failed. Second, choosing a timeout is hard: short timeouts can exacerbate the problem of inaccuracy, and long timeouts can make the system wait unnecessarily. In fact, a good timeout value, one that balances the choice between accuracy and speed, may not even exist, owing to the variance in a system’s end-to-end delays. This dissertation posits a new approach to detecting failures in distributed systems: use information about failures that is local to each component, e.g., the contents of an OS’s process table. We call such information inside information, and use it as the basis in the design and implementation of three failure reporting services for data center applications, which we call Falcon, Albatross, and Pigeon. Falcon deploys a network of software modules to gather inside information in the system, and it guarantees that it never reports a working process as crashed by sometimes terminating unresponsive components. This choice helps applications by making reports of failure reliable, meaning that applications can treat them as ground truth. Unfortunately, Falcon cannot handle network failures because guaranteeing that a process has crashed requires network communication; we address this problem in Albatross and Pigeon. Instead of killing, Albatross blocks suspected processes from using the network, allowing applications to make progress during network partitions. Pigeon renounces interference altogether, and reports inside information to applications directly and with more detail to help applications make better recovery decisions.
By using these services, applications can improve their recovery from failures both quantitatively and qualitatively. Quantitatively, these services reduce detection time by one to two orders of magnitude over the end-to-end timeouts commonly used by data center applications, thereby reducing the unavailability caused by failures. Qualitatively, these services provide more specific information about failures, which can reduce the logic required for recovery and can help applications better decide when recovery is not necessary.
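As a rough illustration of the contrast drawn in this abstract, the sketch below compares an end-to-end timeout with a check of local process state. It is not code from Falcon, Albatross, or Pigeon; the function names, the 2-second silence, and the 1-second timeout are assumptions chosen for the example, and the "inside information" here is simply a parent asking the operating system about its own child process.

```python
import subprocess
import sys


def suspected_by_timeout(silence_s: float, timeout_s: float) -> bool:
    """End-to-end timeout: suspect a failure once the peer has been silent too long."""
    return silence_s > timeout_s


def crashed_by_local_check(child: subprocess.Popen) -> bool:
    """Local check: poll() is None while the child runs, an exit code once it has exited."""
    return child.poll() is not None


def demo():
    # Hypothetical worker that is alive but would not answer for 30 seconds.
    worker = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        # After 2 s of silence, a 1 s timeout wrongly suspects the live worker,
        # while the local check correctly reports that it is still running.
        while_alive = (
            suspected_by_timeout(silence_s=2.0, timeout_s=1.0),
            crashed_by_local_check(worker),
        )
    finally:
        worker.kill()
        worker.wait()
    # Once the process is really gone, the local check reports the crash.
    after_kill = crashed_by_local_check(worker)
    return while_alive, after_kill
```

The timeout answers "has the peer been quiet for too long?", which a slow process also satisfies; the local check answers "has the process exited?", which only a real crash satisfies.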
|
83 |
Exploring Application-level Fault Tolerance for Robust Design Using FPGA
Chen, Jing Unknown Date
No description available.
|
84 |
A Fault-tolerant Strategy for Embedded-memory SoC OFDM Receivers
Smolyakov, Vadim 27 November 2013
The International Technology Roadmap for Semiconductors projects that embedded memories will occupy increasing System-on-Chip area. The growing density of integration increases the likelihood of fabrication faults. The proposed memory repair strategy employs forward error correction at the system level and mitigates the impact of memory faults through permutation of high sensitivity regions. The effectiveness of the proposed repair technique is demonstrated on a 19.4-Mbit de-interleaver SRAM memory of an ISDB-T digital baseband OFDM receiver in 65-nm CMOS. The proposed technique introduces a single multiplexer delay overhead and a configurable area overhead of M/i bits, where M is the number of memory rows and i is an integer from 1 to M, inclusive. The proposed strategy achieves a measured 0.15 dB gain improvement at a 2e-4 Quasi-Error-Free (QEF) BER in the presence of memory faults for an AWGN channel.
|
86 |
Reliability and fault tolerance modelling of multiprocessor systems
Valdivia, Roberto Abraham January 1989
Reliability evaluation by analytic modelling constitutes an important part of designing a reliable multiprocessor system. In this thesis, a model for reliability and fault tolerance analysis of the interconnection network is presented, based on graph theory. Reliability and fault tolerance are considered as deterministic and probabilistic measures of connectivity. Exact techniques for reliability evaluation fail for large multiprocessor systems because of the enormous computational resources required. Therefore, approximation techniques have to be used. Three approaches are proposed: the first simplifies the symbolic expression of reliability; the other two apply a hierarchical decomposition to the system. All these methods give results close to those obtained by exact techniques.
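To make the cost of exact evaluation concrete, here is a minimal sketch of one textbook exact method: two-terminal reliability computed by enumerating every up/down combination of links. It is not the model from the thesis; the function names, the assumption of independent link failures with a common probability p, and the three-node ring are all illustrative assumptions.

```python
from itertools import product


def connected(nodes, up_edges, s, t):
    """Depth-first search over the working links only."""
    adj = {n: set() for n in nodes}
    for u, v in up_edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return t in seen


def two_terminal_reliability(nodes, edges, p, s, t):
    """Probability that s can reach t when each link works independently with probability p.

    Enumerates all 2**len(edges) link states, which is why exact evaluation
    becomes impractical for large interconnection networks.
    """
    total = 0.0
    for state in product([True, False], repeat=len(edges)):
        prob = 1.0
        for ok in state:
            prob *= p if ok else (1.0 - p)
        up = [e for e, ok in zip(edges, state) if ok]
        if connected(nodes, up, s, t):
            total += prob
    return total


# Hypothetical three-node ring: A and B are joined directly and also via C.
ring_nodes = ["A", "B", "C"]
ring_edges = [("A", "B"), ("B", "C"), ("A", "C")]
ring_reliability = two_terminal_reliability(ring_nodes, ring_edges, 0.9, "A", "B")
```

For the ring, the result agrees with the closed form 1 - (1 - p)(1 - p^2). A network with 40 links would already require about 10^12 states, which is the kind of growth that motivates approximation.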
|
87 |
Scalable and robust compute capacity multiplexing in virtualized datacenters
Kesavan, Mukil 27 August 2014
Multi-tenant cloud computing datacenters run diverse workloads, inside virtual machines (VMs), with time-varying resource demands. Compute capacity multiplexing systems dynamically manage the placement of VMs on physical machines to ensure that their resource demands are always met while simultaneously optimizing the total datacenter compute capacity in use. In essence, they give the cloud its fundamental property of being able to dynamically expand and contract the resources required on demand.
At large-scale datacenters, though, there are two practical realities that designers of compute capacity multiplexing systems need to deal with: (a) maintaining low operational overhead given the variable cost of performing the management operations necessary to allocate and multiplex resources, and (b) the prevalence of a large number and wide variety of faults in hardware and software, and faults due to human error, that impair multiplexing efficiency. In this thesis we propound the notion that explicitly designing the methods and abstractions used in capacity multiplexing systems for this reality is critical to better achieve administrator and customer goals at large scales.
To this end the thesis makes the following contributions: (i) CCM, a hierarchically organized compute capacity multiplexer that demonstrates that simple designs can be highly effective at multiplexing capacity with low overheads at large scales compared to complex alternatives; (ii) Xerxes, a distributed load generation framework for flexibly and reliably benchmarking compute capacity allocation and multiplexing systems; and (iii) a speculative virtualized infrastructure management stack that dynamically replicates management operations on virtualized entities, and a compute capacity multiplexer for this environment, that together provide fault-scalable management performance for a broad class of commonly occurring faults in large-scale datacenters.
Our systems have been implemented in an industry-strength cloud infrastructure built on top of the VMware vSphere virtualization platform and the popular open source OpenStack cloud computing platform running ESXi and Xen hypervisors, respectively. Our experiments have been conducted in a 700-server datacenter using the Xerxes benchmark replaying trace data from production clusters, simulating parameterized scenarios like flash crowds, and also using a suite of representative cloud applications. Results from these scenarios demonstrate the effectiveness of our design techniques in real-life large-scale environments.
|
88 |
Design and simulation of advanced fault tolerant flight control schemes
Gururajan, Srikanth. January 2006
Thesis (Ph. D.)--West Virginia University, 2006. / Title from document title page. Document formatted into pages; contains xii, 132 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 123-128).
|
89 |
Survival Techniques for Computer Programs
Rinard, Martin C. 01 1900
Programs developed with standard techniques often fail when they encounter any of a variety of internal errors. We present a set of techniques that prevent programs from failing and instead enable them to continue to execute even after they encounter otherwise fatal internal errors. Our results indicate that even though the techniques may take the program outside of its anticipated execution envelope, the continued execution often enables the program to provide acceptable results to its users. These techniques may therefore play an important role in making software systems more resilient and reliable in the face of errors. / Singapore-MIT Alliance (SMA)
|
90 |
LCL DC/DC converter and DC hub under DC faults and development of DC grids with protection system using DC hub
Zhang, Jianxi January 2016
In this thesis, an IGBT-based DC/DC converter employing an internal inductor-capacitor-inductor (LCL) passive circuit is investigated in a DC grid under fault conditions. It is concluded that a range of converter parameters exists which will give DC fault current magnitudes close to rated currents. Steady state and transient fault responses are investigated in depth. The converter is modelled on the PSCAD platform under fault operation and the simulation results verify the analytical studies. The LCL DC hub is an extension of the DC/DC converter to multiple ports with the capability of limiting the propagation of DC faults in a DC grid. Analytical mathematical equations for steady state fault currents are derived. A state space model of the hub is introduced for transient fault study. The hub is able to interconnect multiple DC cables at different voltage levels and act as a DC substation for the DC grid. The designed hub also has the ability to maintain the current within the order of its rated value without additional protection even for the worst case fault. The analytical study results are confirmed by detailed simulation on PSCAD. Based on the good performance of the LCL DC hub under DC faults, a DC grid topology with a protection system employing the LCL DC hub is proposed and investigated in this thesis. The advantage and feasibility of this method in DC fault protection are investigated based on the developed grid model. The DC grid protection systems are proposed and analysed in depth under several DC fault scenarios. The PSCAD simulation results under a range of DC fault scenarios at various locations are shown. These results confirm the significance of the proposed DC grid protection system and the advantages of the proposed topology in fault isolation.
|