21 |
An Analysis of Error Tolerance Property of Spread Spectrum Sequence / Daming, Hu; Tingxian, Zhou / 10 1900 (has links)
International Telemetering Conference Proceedings / October 25-28, 1999 / Riviera Hotel and Convention Center, Las Vegas, Nevada / This paper examines how the error tolerance property of a spread spectrum sequence influences the performance of a spread spectrum system. The relation between the error tolerance property and the correlation property of a binary sequence under correlation detection is then analyzed, and the theoretical limit of error tolerance is derived. Finally, we investigate the relationship between the choice of the correlation output decision threshold, the probability of correlation peak detection, and the error tolerance of the spread spectrum sequence.
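As an illustration of the threshold/tolerance trade-off described in this abstract, the following minimal Python sketch (with an arbitrary random ±1 sequence standing in for a real spreading sequence; the paper treats general binary sequences) shows why a decision threshold T bounds the number of tolerable chip errors at floor(N(1 - T)/2):

```python
import random

def correlate(a, b):
    """Normalized correlation of two ±1 sequences of equal length."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

def max_tolerated_errors(n, threshold):
    """Largest number of chip errors (sign flips) for which the
    correlation peak still clears the decision threshold.
    For ±1 sequences, k flips reduce the peak from 1 to 1 - 2k/N,
    so the limit is floor(N * (1 - threshold) / 2)."""
    return int(n * (1 - threshold) / 2)

random.seed(0)
N = 127
seq = [random.choice((-1, 1)) for _ in range(N)]   # stand-in spreading sequence

threshold = 0.5
k = max_tolerated_errors(N, threshold)

# Flip k chips and confirm the peak still exceeds the threshold.
received = seq[:]
for i in random.sample(range(N), k):
    received[i] = -received[i]

peak = correlate(seq, received)
print(f"errors tolerated: {k}/{N}, correlation peak: {peak:.3f} "
      f"(threshold {threshold})")
```

Lowering the threshold admits more chip errors but raises the probability of false peak detection, which is the kind of relationship the paper quantifies.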
|
22 |
Fault tolerant and dynamic evolutionary optimization engines / Morales Reyes, Alicia / January 2011 (has links)
Mimicking natural evolution to solve hard optimization problems has played an important role in the artificial intelligence arena. Such techniques are broadly classified as Evolutionary Algorithms (EAs) and have been investigated for around four decades, during which important contributions and advances have been made. One main evolutionary technique which has been widely investigated is the Genetic Algorithm (GA). GAs are stochastic search techniques that follow the Darwinian principle of evolution, and their application to hard optimization problems has been very successful. Indeed, multi-dimensional problems presenting difficult search spaces with characteristics such as multi-modality, epistasis, non-regularity and deceptiveness have all been effectively tackled by GAs. In this research, a competitive form of GA known as the fine-grained or cellular GA (cGA) is investigated, because of its suitability for System on Chip (SoC) implementation when tackling real-time problems. Cellular GAs have also attracted the attention of researchers due to their high performance, ease of implementation and massive parallelism. In addition, cGAs inherently possess a number of structural configuration parameters which make them capable of sustaining diversity during evolution, and therefore of promoting an adequate balance between the exploitative and explorative stages of the search.
The fast technological development of Integrated Circuits (ICs) has allowed a considerable increase in compactness and therefore in density. As a result, it is nowadays possible to fit millions of gates and transistor-based circuits into very small silicon areas. Operational complexity has also significantly increased, and consequently other setbacks have emerged, such as the presence of faults that commonly appear in the form of single or multiple bit flips. Harsh environmental or time-dependent operating conditions can trigger faults in registers and memory locations through induced radiation, electromigration and dielectric breakdown. These kinds of faults are known as Single Event Effects (SEEs). Research has shown that an effective way of dealing with SEEs consists of a combination of hardware and software mitigation techniques. Permanent faults known as Single Hard Errors (SHEs) and temporary faults known as Single Event Upsets (SEUs) are common SEEs. This thesis investigates the inherent ability of cellular GAs to deal with SHEs and SEUs at the algorithmic level. A hard real-time application is targeted: calculating the attitude parameters for navigation in vehicles using Global Positioning System (GPS) technology. Faulty critical data, which can cause a system's functionality to fail, are evaluated. The proposed mitigation techniques show the cGA's ability to deal with up to 40% stuck-at-zero and 30% stuck-at-one faults in chromosome bits and fitness score cells.
Due to the non-deterministic nature of GAs, dynamic on-the-fly algorithmic and parametric configuration has also attracted the attention of researchers. In this respect, the structural properties of cellular GAs provide a valuable means of influencing their selection pressure. This helps to maintain an adequate exploitation-exploration tradeoff, either from a purely topological perspective or through genetic operations that also make use of the structural characteristics of cGAs. These properties, unique to cGAs, are further investigated in this thesis through a set of middle- to high-difficulty benchmark problems.
Experimental results show that the proposed dynamic techniques enhance the overall performance of cGAs on most benchmark problems. Finally, since cGAs are structurally defined, their dimensionality is another line of investigation. 1D and 2D structures have normally been used to test cGAs at the algorithm and implementation levels. Although 3D-cGAs are an immediate extension, not enough attention has been paid to them, and so a comparative study on the dimensionality of cGAs is carried out. Having shorter radii, 3D-cGAs present a faster dissemination of solutions and have denser neighbourhoods. Empirical results reported in this thesis show that 3D-cGAs achieve better efficiency when solving multi-modal and epistatic problems. In the future, the performance improvements of 3D-cGAs will combine with the benefits that 3D integration technology has demonstrated, such as reductions in routing length, interconnection delays and power consumption.
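For readers unfamiliar with the cellular GA structure this abstract builds on, here is a minimal Python sketch, assuming a 2D toroidal grid with a von Neumann neighbourhood and the OneMax function as a stand-in benchmark (the thesis studies 1D/2D/3D layouts, structural parameters and fault injection well beyond this):

```python
import random

random.seed(1)
W, H, BITS = 8, 8, 32          # 2D toroidal grid of 8x8 cells

def fitness(ind):               # OneMax: count of ones (stand-in benchmark)
    return sum(ind)

def neighbours(x, y):
    """Von Neumann neighbourhood on a torus: the N, S, E, W cells."""
    return [((x - 1) % W, y), ((x + 1) % W, y),
            (x, (y - 1) % H), (x, (y + 1) % H)]

grid = {(x, y): [random.randint(0, 1) for _ in range(BITS)]
        for x in range(W) for y in range(H)}

def step(grid):
    """One synchronous cGA generation: each cell mates with its best
    neighbour; the offspring replaces the cell only if it is no worse."""
    new = {}
    for (x, y), ind in grid.items():
        mate = max((grid[c] for c in neighbours(x, y)), key=fitness)
        cut = random.randrange(1, BITS)               # one-point crossover
        child = ind[:cut] + mate[cut:]
        i = random.randrange(BITS)                    # bit-flip mutation
        child[i] ^= 1
        new[(x, y)] = child if fitness(child) >= fitness(ind) else ind
    return new

for gen in range(60):
    grid = step(grid)
best = max(grid.values(), key=fitness)
print("best fitness:", fitness(best), "of", BITS)
```

Because each cell mates only within a small overlapping neighbourhood, good genes spread gradually across the grid; this dependence of selection pressure on the grid's topology and radius is exactly the structural property the thesis exploits.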
|
23 |
Robust solutions for constraint satisfaction and optimisation under uncertainty / Hebrard, Emmanuel / Computer Science & Engineering, Faculty of Engineering, UNSW / January 2007 (has links)
We develop a framework for finding robust solutions of constraint programs. Our approach is based on the notion of fault tolerance. We formalise this concept within constraint programming, extend it in several dimensions and introduce algorithms to find robust solutions efficiently. When applying constraint programming to real-world problems we often face uncertainty. Whilst reactive methods merely deal with the consequences of an unexpected change, a more proactive approach may guarantee a certain level of robustness. We propose to apply the fault tolerance framework introduced in [Ginsberg 98] to constraint programming: a robust solution is one such that a small perturbation only requires a small response. We identify, define and classify a number of abstract problems related to stability within constraint satisfaction and optimisation, and propose efficient and effective algorithms for solving them. We then extend this framework by allowing the repairs and perturbations themselves to be constrained. Finally, we assess the practicality of the framework on constraint satisfaction and scheduling problems.
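One concrete reading of "a small perturbation only requires a small response" in this line of work is the super-solution: in a (1,0)-super solution, the loss of any single variable's value can be repaired by reassigning that variable alone. Below is a minimal Python check of this property on a toy CSP; it is a sketch of the concept only, and the thesis's algorithms find such solutions far more efficiently than this brute-force test:

```python
def satisfies(assignment, constraints):
    """Each constraint is a predicate over the full assignment dict."""
    return all(c(assignment) for c in constraints)

def is_1_0_super_solution(assignment, domains, constraints):
    """Check the (1,0)-super-solution property: if any single variable
    loses its value, some other value for that variable alone repairs
    the solution (no other variable needs to change)."""
    for var, val in assignment.items():
        repairs = [v for v in domains[var] if v != val
                   and satisfies({**assignment, var: v}, constraints)]
        if not repairs:
            return False
    return True

# Toy CSP: x, y, z in 1..4, with x < y and y != z.
domains = {"x": [1, 2, 3, 4], "y": [1, 2, 3, 4], "z": [1, 2, 3, 4]}
constraints = [lambda a: a["x"] < a["y"], lambda a: a["y"] != a["z"]]

print(is_1_0_super_solution({"x": 1, "y": 4, "z": 2}, domains, constraints))
# True: every variable has an alternative value that keeps both constraints.
print(is_1_0_super_solution({"x": 3, "y": 4, "z": 1}, domains, constraints))
# False: with x = 3, the only value y could move to (y > 3) is its current 4.
```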
|
24 |
Otherworld - Giving Applications a Chance to Survive OS Kernel Crashes / Depoutovitch, Alexandre / 06 January 2012 (has links)
The default behavior of all commodity operating systems today is to restart the system when a critical error is encountered in the kernel. This terminates all running applications with an attendant loss of "work in progress" that is non-persistent. Our thesis is that an operating system kernel is simply a component of a larger software system, which is logically well isolated from other components, such as applications, and therefore it should be possible to reboot the kernel without terminating everything else running on the same system.
In order to prove this thesis, we designed and implemented a new mechanism, called Otherworld, that microreboots the operating system kernel when a critical error is encountered in the kernel, and it does so without clobbering the state of the running applications. After the kernel microreboot, Otherworld attempts to resurrect the applications that were running at the time of failure. It does so by restoring the application memory spaces, open files and other resources. In the default case it then continues executing the processes from the point at which they were interrupted by the failure. Optionally, applications can have user-level recovery procedures registered with the kernel, in which case Otherworld passes control to these procedures after having restored their process state. Recovery procedures might check the integrity of application data and restore resources Otherworld was not able to restore.
We implemented Otherworld in Linux, but we believe that the technique can be applied to all commodity operating systems. In an extensive set of experiments on real-world applications (MySQL, Apache/PHP, Joe, vi), we show that Otherworld is capable of successfully microrebooting the kernel and restoring the applications in over 97% of the cases. In the default case, Otherworld adds negligible overhead to normal execution. In an enhanced mode, Otherworld can provide extra application memory protection with an overhead of between 4% and 12%.
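The registration interface for recovery procedures is kernel-level and not spelled out in this abstract; the Python sketch below is purely illustrative of the division of labour it describes, with all names hypothetical: the kernel restores process state after the microreboot, then hands control to a registered procedure that verifies data integrity and reacquires resources the kernel could not restore.

```python
import hashlib

# Hypothetical registry; in Otherworld the recovery procedure is
# registered with the kernel and invoked after a microreboot once the
# process state has been restored.
_recovery_procedures = []

def register_recovery(proc):
    _recovery_procedures.append(proc)
    return proc

class App:
    def __init__(self):
        self.data = b"work in progress"
        self.checksum = hashlib.sha256(self.data).hexdigest()
        self.log_file = None          # open files may not survive the reboot

@register_recovery
def recover(app):
    """Check the integrity of application data and reacquire resources
    that were not restored, mirroring the role described above."""
    if hashlib.sha256(app.data).hexdigest() != app.checksum:
        raise RuntimeError("data corrupted across microreboot; abort")
    if app.log_file is None:
        app.log_file = open("app.log", "a")   # reopen a lost resource
    return app

app = App()
for proc in _recovery_procedures:   # what the kernel would do post-reboot
    proc(app)
print("resumed with intact data:", app.data.decode())
```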
|
26 |
RADIC: a powerful fault-tolerant architecture / Amancio Duarte, Angelo / 28 June 2007 (has links)
Fault tolerance has become a major issue for computer engineers and software developers because the occurrence of faults increases the cost of using a parallel computer. On the other hand, the activities performed by the fault tolerance mechanism reduce the performance of the system from the user's point of view. This thesis presents RADIC (Redundant Array of Distributed Independent Fault Tolerance Controllers), a fault-tolerant architecture for parallel computers that is simultaneously transparent, decentralized, flexible and scalable. RADIC implements a fully distributed controller to manage faults; this controller relies on dedicated processes that share the user's resources in the parallel computer. In order to validate the operation of RADIC, we created RADICMPI, a message-passing implementation that includes the elements of the RADIC architecture and complies with the MPI-1 standard. RADICMPI served to verify the functionality of RADIC in scenarios with and without failures in the parallel computer. For the tests, we implemented a fault injector in RADICMPI in order to create the scenarios required to validate the operation of the RADIC distributed controller. We also used RADICMPI to study the practical aspects of using RADIC in a real environment. This allowed us to evaluate the operation of our architecture in practical situations, and to study the influence of the RADIC parameters on system performance. The results proved that the RADIC architecture operates correctly and that it is flexible, scalable, transparent and decentralized.
Furthermore, RADIC establishes a powerful fault-tolerant architecture model for message-passing systems.
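RADIC's distributed controller is typically described in terms of dedicated processes in two roles: observers attached to the application processes, and protectors on neighbouring nodes that hold their checkpoints and watch for failures. The Python sketch below illustrates that heartbeat-plus-checkpoint pattern under those assumptions; it is not RADICMPI, and the class names are hypothetical:

```python
import time

class Protector:
    """Runs on a neighbouring node; stores checkpoints and watches
    heartbeats. On a missed deadline it triggers recovery locally,
    with no central coordinator (the controller is fully distributed)."""
    def __init__(self, timeout=1.0):
        self.timeout = timeout
        self.checkpoint = None
        self.last_beat = time.monotonic()

    def store_checkpoint(self, state):
        self.checkpoint = state

    def heartbeat(self):
        self.last_beat = time.monotonic()

    def check(self):
        if time.monotonic() - self.last_beat > self.timeout:
            print("protector: node silent, restarting from checkpoint")
            return self.checkpoint
        return None

class Observer:
    """Attached to an application process; ships checkpoints and
    heartbeats to its protector on another node."""
    def __init__(self, protector):
        self.protector = protector

    def run_step(self, state):
        self.protector.store_checkpoint(dict(state))
        self.protector.heartbeat()

protector = Protector(timeout=0.1)
observer = Observer(protector)
observer.run_step({"iteration": 42, "partial_sum": 3.14})
time.sleep(0.2)                      # simulate a crashed node
print("restored state:", protector.check())
```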
|
27 |
Multipath Fault-tolerant Routing Policies to deal with Dynamic Link Failures in High Speed Interconnection Networks / Zarza, Gonzalo Alberto / 08 July 2011 (has links)
Interconnection networks communicate and link together the processing units of modern high-performance computing systems. In this context, network faults have an extremely high impact, since most routing algorithms have not been designed to tolerate faults. Because of this, as few as one single link failure may stall messages in the network, leading to deadlock configurations or, even worse, preventing applications running on the computing system from completing.
In this thesis we present fault-tolerant routing policies based on the concepts of adaptability and deadlock freedom, capable of serving interconnection networks affected by a large number of link failures. Two contributions are presented throughout this thesis: a multipath fault-tolerant routing method, and a novel and scalable deadlock avoidance technique.
The first contribution is the adaptive multipath routing method Fault-tolerant Distributed Routing Balancing (FT-DRB). This method has been designed to exploit the communication path redundancy available in many network topologies, allowing interconnection networks to perform in the presence of a large number of faults. The second contribution is the scalable deadlock avoidance technique Non-blocking Adaptive Cycles (NAC), specifically designed for interconnection networks suffering from a large number of link failures. This technique has been designed and implemented with the aim of ensuring freedom from deadlock in the proposed fault-tolerant routing method, FT-DRB.
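The essence of the multipath approach is easy to see in miniature. The Python sketch below illustrates the general idea only (FT-DRB's path configuration and NAC's deadlock avoidance are far more elaborate): compute link-disjoint routes on a small mesh while avoiding failed links.

```python
from collections import deque

def bfs_path(edges, src, dst, banned_links):
    """Shortest route that avoids banned links (links as frozensets)."""
    prev, seen, q = {}, {src}, deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for v in edges.get(u, ()):
            if v not in seen and frozenset((u, v)) not in banned_links:
                seen.add(v)
                prev[v] = u
                q.append(v)
    return None

def multipath_routes(edges, src, dst, failed_links, k=3):
    """Up to k mutually link-disjoint routes that also avoid failed links."""
    banned = set(failed_links)
    routes = []
    for _ in range(k):
        path = bfs_path(edges, src, dst, banned)
        if path is None:
            break
        routes.append(path)
        banned |= {frozenset(link) for link in zip(path, path[1:])}
    return routes

# Build a 3x3 mesh; nodes are (x, y) coordinates.
edges = {}
for x in range(3):
    for y in range(3):
        for ax, ay in ((x + 1, y), (x, y + 1)):
            if ax < 3 and ay < 3:
                edges.setdefault((x, y), []).append((ax, ay))
                edges.setdefault((ax, ay), []).append((x, y))

failed = {frozenset({(1, 1), (1, 2)})}    # one failed link in the mesh
for route in multipath_routes(edges, (0, 0), (2, 2), failed):
    print(route)
```

With one failed link, two link-disjoint routes from (0, 0) to (2, 2) still exist; as failures accumulate, the number of alternative routes shrinks, which is why deadlock avoidance under heavy failure is treated as a separate problem.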
|
28 |
A Skeleton Supporting Group Collaboration, Load Distribution, and Fault Tolerance for Internet-based Computing / Chiang, Chuanwen / 13 August 2001 (has links)
This dissertation explores the design of a dual connection skeleton (DCS), which facilitates effective and efficient exploitation of Internet-centric collaborative workgroup and high-performance metacomputing applications. The predominant difference between DCS and conventional frameworks is that DCS administers a network of brokers that are grouped into a logical ring. New mechanisms for group collaboration, load distribution, and fault tolerance, three crucial issues in Internet-based computing, are proposed and integrated into the dual connection skeleton.
Group collaboration becomes a significant issue when developing wide-area applications supporting computer-supported cooperative work (CSCW). For group collaboration, DCS therefore offers a concurrency control strategy that ensures the consistency of shared resources: multiple users in a collaborative group are able to access shared data simultaneously without violating its consistency.
With respect to load distribution, DCS applies an adaptive highest response ratio next (AHRRN) algorithm to job scheduling. Performance evaluations against competing algorithms, such as shortest job first (SJF), highest response ratio next (HRRN), and first come, first served (FCFS), are conducted. Simulation results demonstrate that AHRRN is not only an efficient algorithm but also prevents the well-known job starvation problem. In a parallel computational application, a composite job can be further decomposed into constituent tasks that are assigned to different processing elements (PEs) for concurrent execution. The dual connection skeleton thus makes use of a proposed dynamic grouping scheduling (DGS) algorithm to undertake task scheduling for performance improvement. The DGS algorithm employs a task grouping strategy to determine the computational costs of tasks, and re-prioritizes unscheduled tasks at each scheduling step to explore an appropriate task allocation decision. In terms of schedule length, the performance of DGS has been evaluated against existing algorithms such as Heavy Node First (HNF), Critical Path Method (CPM), Weight Length (WL), Dynamic Level Scheduling (DLS), and Dynamic Priority Scheduling (DPS). Simulation results show that DGS outperforms these competing algorithms.
Moreover, for fault tolerance, DCS utilizes a dual connection mechanism to enhance computational reliability. To construct the dual connection, we examine five approaches: RANDOM, NEXT, ROTARY, MINNUM, and WEIGHT; each can be incorporated into DCS-based wide-area metacomputing systems. Performance simulation shows that WEIGHT benefits the dual connection the most. A DCS-based scientific computational application, motion correction, is used to demonstrate the fault-tolerant ability of DCS. Putting the group collaboration, load distribution, and fault tolerance mechanisms together, the dual connection skeleton forms a seamless and integrated framework for Internet-centric computing.
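AHRRN's adaptive element is specific to the dissertation, but the HRRN rule it builds on is standard: a job's priority, (waiting time + service time) / service time, grows the longer the job waits, which is exactly what prevents starvation. Here is a minimal Python sketch of non-preemptive HRRN, the baseline AHRRN is compared against (the adaptive part is not reproduced here):

```python
def response_ratio(wait, service):
    """HRRN priority: grows with waiting time, so long jobs are not
    starved the way they can be under pure shortest-job-first."""
    return (wait + service) / service

def hrrn_schedule(jobs):
    """jobs: list of (name, arrival_time, service_time) tuples.
    Returns the order in which jobs run to completion."""
    pending = sorted(jobs, key=lambda j: j[1])
    t, order = 0, []
    while pending:
        if pending[0][1] > t:             # CPU idle until the next arrival
            t = pending[0][1]
        ready = [j for j in pending if j[1] <= t]
        nxt = max(ready, key=lambda j: response_ratio(t - j[1], j[2]))
        pending.remove(nxt)
        order.append(nxt[0])
        t += nxt[2]                       # run to completion (non-preemptive)
    return order

jobs = [("A", 0, 8), ("B", 1, 4), ("C", 2, 1), ("D", 3, 2)]
print(hrrn_schedule(jobs))                # -> ['A', 'C', 'D', 'B']
```

Note how B, the longest waiting job, is not starved: its ratio keeps climbing while shorter jobs run, so it is eventually selected even though shorter work keeps arriving.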
|
29 |
On strong fault tolerance (or strong Menger-connectivity) of multicomputer networks / Oh, Eunseuk / 15 November 2004 (has links)
As the size of networks increases continuously, dealing with networks containing faulty nodes becomes unavoidable. In this dissertation, we introduce a new measure of network fault tolerance, strong fault tolerance (or strong Menger-connectivity) in multicomputer networks, and study the strong fault tolerance of popular multicomputer network structures. Let G be a network in which all nodes have degree d. We say that G is strongly fault tolerant if it has the following property: let G_f be a copy of G with at most d - 2 faulty nodes; then for any pair of non-faulty nodes u and v in G_f, there are min{deg_f(u), deg_f(v)} node-disjoint paths in G_f from u to v, where deg_f(u) and deg_f(v) are the degrees of the nodes u and v in G_f, respectively.
First we study the strong fault tolerance of popular network structures such as star networks and hypercube networks. We show that star networks and hypercube networks are strongly fault tolerant, and develop efficient algorithms that construct the maximum number of node-disjoint paths of optimal or nearly optimal length in these networks when they contain faulty nodes. Our algorithms are optimal in terms of their time complexity.
In addition to studying strong fault tolerance, we also investigate a more realistic concept describing the ability of networks to tolerate faults. The traditional definition of fault tolerance, sustaining at most d - 1 faulty nodes for a regular graph G of degree d, reflects a very rare worst case. In many cases, a routing path between two given nodes can still be constructed even though the network has more faulty nodes than its degree. In this dissertation, we study the fault tolerance of hypercube networks under a probability model. When each node of the n-dimensional hypercube network fails independently with probability p, we develop algorithms that, with very high probability, can construct a fault-free path even when the hypercube network sustains up to 2^n p faulty nodes.
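The defining property is easy to check experimentally. A minimal Python sketch using networkx (assumed available): remove up to n - 2 faulty nodes from an n-dimensional hypercube and count node-disjoint paths between two surviving nodes, which by strong Menger-connectivity should equal min{deg_f(u), deg_f(v)}.

```python
import networkx as nx

n = 4                                   # 4-dimensional hypercube, degree 4
G = nx.hypercube_graph(n)               # nodes are n-bit tuples

faulty = [(0, 0, 0, 1), (0, 0, 1, 0)]   # at most n - 2 = 2 faulty nodes
Gf = G.copy()
Gf.remove_nodes_from(faulty)

u, v = (0, 0, 0, 0), (1, 1, 1, 1)
paths = list(nx.node_disjoint_paths(Gf, u, v))
expected = min(Gf.degree(u), Gf.degree(v))

print(f"deg_f(u)={Gf.degree(u)}, deg_f(v)={Gf.degree(v)}")
print(f"found {len(paths)} node-disjoint paths (expected {expected})")
for p in paths:
    print(p)
```

Here both faulty nodes are neighbours of u, so deg_f(u) drops to 2, and exactly two node-disjoint paths survive; the dissertation's contribution is constructing such path sets efficiently rather than verifying their existence by max-flow as networkx does.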
|
30 |
Failure recovery in redundant serial manipulators / Cocca, Christopher David / January 2000 (has links)
Thesis (Ph. D.)--University of Texas at Austin, 2000. / Vita. Includes bibliographical references (leaves 214-223). Available also in a digital version from Dissertation Abstracts.
|