Global ETD Search

301	Persistent Fault-Tolerant Storage at the Fog Layer Bakhshi Valojerdi, Zeinab January 2021 (has links) Clouds are powerful computer centers that provide computing and storage facilities that can be remotely accessed. The flexibility and cost-efficiency offered by clouds have made them very popular for business and web applications. The use of clouds is now being extended to safety-critical applications such as factories. However, cloud services do not provide time predictability which creates a hassle for such time-sensitive applications. Moreover, delays in the data communication between clouds and the devices the clouds control are unpredictable. Therefore, to increase predictability an intermediate layer between devices and the cloud is introduced. This layer, the Fog layer, aims to provide computational resources closer to the edge of the network. However, the fog computing paradigm relies on resource-constrained nodes, creating new potential challenges in resource management, scalability, and reliability. Solutions such as lightweight virtualization technologies can be leveraged for solving the dichotomy between performance and reliability in fog computing. In this context, container-based virtualization is a key technology providing lightweight virtualization for cloud computing that can be applied in fog computing as well. Such container-based technologies provide fault tolerance mechanisms that improve the reliability and availability of application execution. By the study of a robotic use-case, we have realized that persistent data storage for stateful applications at the fog layer is particularly important. In addition, we identified the need to enhance the current container orchestration solution to fit fog applications executing in container-based architectures. In this thesis, we identify open challenges in achieving dependable fog platforms. Among these, we focus particularly on scalable, lightweight virtualization, auto-recovery, and re-integration solutions after failures in fog applications and nodes. We implement a testbed to deploy our use-case on a container-based fog platform and investigate the fulfillment of key dependability requirements. We enhance the architecture and identify the lack of persistent storage for stateful applications as an important impediment for the execution of control applications. We propose a solution for persistent fault-tolerant storage at the fog layer, which dissociates storage from applications to reduce application load and separates the concern of distributed storage. Our solution includes a replicated data structure supported by a consensus protocol that ensures distributed data consistency and fault tolerance in case of node failures. Finally, we use the UPPAAL verification tool to model and verify the fault tolerance and consistency of our solution. Dependability Fog Computing Fault-tolerance Containerization Embedded Systems Inbäddad systemteknik Computer Sciences Datavetenskap (datalogi)
302	Toward Improving the Internet of Things: Quality of Service and Fault Tolerance Perspectives Alaslani, Maha S. 13 April 2021 (has links) The Internet of Things (IoT) is a technology aimed at developing a global network of machines and devices that can interact and communicate with each other. Supporting IoT, therefore, requires revisiting the Internet’s best effort service model and reviewing its complex communication patterns. In this dissertation, we explore the unique characteristics of IoT traffic and examine IoT systems. Our work is motivated by the new capabilities offered by modern Software Defined Networks (SDN) and blockchain technology. We evaluate IoT Quality of Service (QoS) in traditional networking. We obtain mathematical expressions to calculate end-to-end delay, and dropping. Our results provide insight into the advantages of an intelligent edge serving as a detection mechanism. Subsequently, we propose SADIQ, SDN-based Application-aware Dynamic Internet of things QoS. SADIQ provides context-driven QoS for IoT applications by allowing applications to express their requirements using a high-level SQL-like policy language. Our results show that SADIQ improves the percentage of regions with an error in their reported temperature for the Weather Signal application up to 45 times; and it improves the percentage of incorrect parking statuses for regions with high occupancy for the Smart Parking application up to 30 times under the same network conditions and drop rates. Despite centralization and the control of data, IoT systems are not safe from cyber-crime, privacy issues, and security breaches. Therefore, we explore blockchain technology. In the context of IoT, Byzantine fault tolerance-based consensus protocols are used. However, the blockchain consensus layer contributes to the most remarkable performance overhead especially for IoT applications subject to maximum delay constraints. In order to capture the unique requirements of the IoT, consensus mechanisms and block formation need to be redesigned. To this end, we propose Synopsis, a novel hierarchical blockchain system. Synopsis introduces a wireless-optimized Byzantine chain replication protocol and a new probabilistic data structure. The results show that Synopsis successfully reduces the memory footprint from Megabytes to a few Kilobytes with an improvement of 1000 times. Synopsis also enables reductions in message complexity and commitment delay of 85% and 99.4%, respectively Internet of Things Quality of Service Fault Tolerance Software Defined Networks Blockchain Distributed Ledger
303	Funkční verifikace robotického systému pomocí UVM / Functional Verification of Robotic System Using UVM Krajčír, Stanislav January 2015 (has links) One of the currently most used approaches for verification of hardware systems is functional verification. This master thesis describes design and implementation of a verification environment using UVM (Universal Verification Methodology) methodology for verifying the correctness of the robot controller in order to eliminate functional errors and faults of its implementation. The theoretical part of the thesis describes the basic information about functional verification, methodologies for creating verification environments, the SystemVerilog language and fault tolerance methodologies. The next part of thesis focuses on the design of the verification environment, its implementation and the creation of tests used to verify the correctness of the robot controller. Results of verification are discussed and evaluated in the conclusion of this work.
304	Operating System Support for Redundant Multithreading Döbel, Björn 25 November 2014 (has links) Failing hardware is a fact and trends in microprocessor design indicate that the fraction of hardware suffering from permanent and transient faults will continue to increase in future chip generations. Researchers proposed various solutions to this issue with different downsides: Specialized hardware components make hardware more expensive in production and consume additional energy at runtime. Fault-tolerant algorithms and libraries enforce specific programming models on the developer. Compiler-based fault tolerance requires the source code for all applications to be available for recompilation. In this thesis I present ASTEROID, an operating system architecture that integrates applications with different reliability needs. ASTEROID is built on top of the L4/Fiasco.OC microkernel and extends the system with Romain, an operating system service that transparently replicates user applications. Romain supports single- and multi-threaded applications without requiring access to the application's source code. Romain replicates applications and their resources completely and thereby does not rely on hardware extensions, such as ECC-protected memory. In my thesis I describe how to efficiently implement replication as a form of redundant multithreading in software. I develop mechanisms to manage replica resources and to make multi-threaded programs behave deterministically for replication. I furthermore present an approach to handle applications that use shared-memory channels with other programs. My evaluation shows that Romain provides 100% error detection and more than 99.6% error correction for single-bit flips in memory and general-purpose registers. At the same time, Romain's execution time overhead is below 14% for single-threaded applications running in triple-modular redundant mode. The last part of my thesis acknowledges that software-implemented fault tolerance methods often rely on the correct functioning of a certain set of hardware and software components, the Reliable Computing Base (RCB). I introduce the concept of the RCB and discuss what constitutes the RCB of the ASTEROID system and other fault tolerance mechanisms. Thereafter I show three case studies that evaluate approaches to protecting RCB components and thereby aim to achieve a software stack that is fully protected against hardware errors. info:eu-repo/classification/ddc/004 ddc:004 Fehlertoleranz, Hardware, Betriebssystem
305	Hardware-Assisted Dependable Systems Kuvaiskii, Dmitrii 22 January 2018 (has links) Unpredictable hardware faults and software bugs lead to application crashes, incorrect computations, unavailability of internet services, data losses, malfunctioning components, and consequently financial losses or even death of people. In particular, faults in microprocessors (CPUs) and memory corruption bugs are among the major unresolved issues of today. CPU faults may result in benign crashes and, more problematically, in silent data corruptions that can lead to catastrophic consequences, silently propagating from component to component and finally shutting down the whole system. Similarly, memory corruption bugs (memory-safety vulnerabilities) may result in a benign application crash but may also be exploited by a malicious hacker to gain control over the system or leak confidential data. Both these classes of errors are notoriously hard to detect and tolerate. Usual mitigation strategy is to apply ad-hoc local patches: checksums to protect specific computations against hardware faults and bug fixes to protect programs against known vulnerabilities. This strategy is unsatisfactory since it is prone to errors, requires significant manual effort, and protects only against anticipated faults. On the other extreme, Byzantine Fault Tolerance solutions defend against all kinds of hardware and software errors, but are inadequately expensive in terms of resources and performance overhead. In this thesis, we examine and propose five techniques to protect against hardware CPU faults and software memory-corruption bugs. All these techniques are hardware-assisted: they use recent advancements in CPU designs and modern CPU extensions. Three of these techniques target hardware CPU faults and rely on specific CPU features: ∆-encoding efficiently utilizes instruction-level parallelism of modern CPUs, Elzar re-purposes Intel AVX extensions, and HAFT builds on Intel TSX instructions. The rest two target software bugs: SGXBounds detects vulnerabilities inside Intel SGX enclaves, and “MPX Explained” analyzes the recent Intel MPX extension to protect against buffer overflow bugs. Our techniques achieve three goals: transparency, practicality, and efficiency. All our systems are implemented as compiler passes which transparently harden unmodified applications against hardware faults and software bugs. They are practical since they rely on commodity CPUs and require no specialized hardware or operating system support. Finally, they are efficient because they use hardware assistance in the form of CPU extensions to lower performance overhead. info:eu-repo/classification/ddc/004 ddc:004 Fehlertoleranz, Informationssicherheit Fault Tolerance, Systems Security
306	Selective Software-Implemented Hardware Fault Tolerance Techniques to Detect Soft Errors in Processors with Reduced Overhead Chielle, Eduardo 30 July 2016 (has links) Software-based fault tolerance techniques are a low-cost way to protect processors against soft errors. However, they introduce significant overheads to the execution time and code size, which consequently increases the energy consumption. System operating with time or energy restrictions may not be able to use these techniques. For this reason, this work proposes new software-based fault tolerance techniques with lower overheads and similar fault coverage to state-of-the-art software techniques. Thus, they can meet the system constraints. In addition, the shorter execution time reduces the exposure time to radiation. Consequently, the reliability is higher for the same fault coverage. Techniques can work with error correction or error detection. Once detection is less costly than correction, this work focuses on software-based detection techniques. Firstly, a set of data-flow techniques called VAR is proposed. The techniques are based on general building rules to allow an exhaustive assessment, in terms of reliability and overheads, of different technique variations. The rules define how the technique duplicates the code and insert checkers. Each technique uses a different set of rules. Then, a control-flow technique called SETA (Software-only Error-detection Technique using Assertions) is introduced. Comparing SETA with a state-of-the-art technique, SETA is 11.0% faster and occupies 10.3% fewer memory positions. The most promising data-flow techniques are combined with the control-flow technique in order to protect both dataflow and control-flow of the target application. To go even further with the reduction of the overheads, methods to selective apply the proposed software techniques have been developed. For the data-flow techniques, instead of protecting all registers, only a set of selected registers is protected. The set is selected based on a metric that analyzes the code and rank the registers by their criticality. For the control-flow technique, two approaches are taken: (1) removing checkers from basic blocks: all the basic blocks are protected by SETA, but only selected basic blocks have checkers inserted, and (2) selectively protecting basic blocks: only a set of basic blocks is protected. The techniques and their selective versions are evaluated in terms of execution time, code size, fault coverage, and Mean Work To Failure (MWTF), which is a metric to measure the trade-off between fault coverage and execution time. Results show that was possible to reduce the overheads without affecting the fault coverage, and for a small reduction in the fault coverage it was possible to significantly reduce the overheads. Lastly, since the evaluation of all the possible combinations for selective hardening of every application takes too much time, this work uses a method to extrapolate the results obtained by simulation in order to find the parameters for the selective combination of data and control-flow techniques that are probably the best candidates to improve the trade-off between reliability and overheads. Fault Tolerance Embedded Processors Soft Errors SIHFT
307	Aportaciones a la tolerancia a fallos en microprocesadores bajo efectos de la radiación Isaza-González, José 16 July 2018 (has links) El funcionamiento correcto de un sistema electrónico, aún bajo perturbaciones y fallos causados por la radiación, ha sido siempre un factor crucial en aplicaciones aeroespaciales, médicas, nucleares, de defensa, y de transporte. La tolerancia de estos sistemas, o de los componentes que los integran, a fallos de tipo Single Event Effects (SEEs), es un tema de investigación importante y una característica imprescindible de cualquier sistema utilizado, no solo en aplicaciones críticas, sino también en las aplicaciones del día a día. Por esta razón, las aplicaciones de estos sistemas requieren, cada vez más, herramientas, métricas y parámetros específicos que permitan evaluar la tolerancia a fallos; y a su vez, permitan guiar el proceso para aplicar de forma eficiente los mecanismos de protección utilizados para la mitigación de estos fallos. En este contexto, esta tesis doctoral presenta una herramienta de inyección de fallos y la metodología para la realización de campañas de inyección de fallos tipo Single Event upset (SEU) en procesadores Commercial Off-The-Shelf (COTS) y a través de plataformas de emulación/simulación. Esta herramienta aprovecha las ventajas que ofrecen las infraestructuras de depuración de hardware tales como On-Chip Debugging (OCD), y el depurador estándar de GNU (GDB) para la ejecución y depuración de los casos de estudio. También, se analiza la posibilidad de utilizar un modelo descrito en HDL (Hardware Description Language) del procesador MSP430 de Texas Instruments para estimar la fiabilidad de las aplicaciones al principio de la fase de desarrollo. Se utilizan diferentes métodos de inyección de fallos que muestran las ventajas que ofrece la emulación FPGA en comparación con las campañas de inyección llevadas a cabo en los dispositivos reales. La vulnerabilidad del banco de registros se compara y analiza por cada uno de sus registros. Por otro lado, esta memoria de tesis presenta una métrica para la aplicación eficiente del endurecimiento selectivo basada en software, que hemos llamado SHARC (Software based HARdening Criticality). Adicionalmente, también presenta un método para guiar el proceso de endurecimiento según la clasificación generada por la métrica SHARC. De esta forma, se logra proteger los recursos internos del procesador, obteniendo una cobertura máxima de fallos con los mínimos sobrecostes de protección (overheads). Esto permite diseñar sistemas confiables a bajo coste, logrando obtener un punto óptimo entre los requisitos de confiabilidad y las restricciones de diseño, evitando el uso excesivo de costosos mecanismos de protección (hardware y software). Microprocessor reliability Fault injection Soft error Radiation effects fault tolerance
308	How Failures Cascade in Software Systems Chamberlin, Barbara W. 18 April 2022 (has links) Cascading failures involve a failure in one system component that triggers failures in successive system components, potentially leading to system wide failures. While frequently used fault tolerant techniques can reduce the severity and the frequency of such failures, they continue to occur in practice. To better understand how failures cascade, we have conducted a qualitative analysis of 55 cascading failures, described in 26 publicly available incident reports. Through this analysis we have identified 16 types of cascading mechanisms (organized into eight categories) that capture the nature of the system interactions that contribute to cascading failures. We also discuss three themes based on the observation that the cascading failures we have analyzed occurred in one of three ways: a component being unable to tolerate a failure in another component, through the actions of support or automation systems as they respond to an initial failure, or during system recovery. We believe that the 16 cascading mechanisms we present and the three themes we discuss, provide important insights into some of the challenges associated with engineering a truly resilient and well-supported system. software design cascading failure graceful degradation fault tolerance graceful recovery Physical Sciences and Mathematics
309	Restoring Consistency after Network Partitions Asplund, Mikael January 2007 (has links) The software industry is facing a great challenge. While systems get more complex and distributed across the world, users are becoming more dependent on their availability. As systems increase in size and complexity so does the risk that some part will fail. Unfortunately, it has proven hard to tackle faults in distributed systems without a rigorous approach. Therefore, it is crucial that the scientific community can provide answers to how distributed computer systems can continue functioning despite faults. Our contribution in this thesis is regarding a special class of faults which occurs whennetwork links fail in such a way that parts of the network become isolated, such faults are termed network partitions. We consider the problem of how systems that have integrity constraints on data can continue operating in presence of a network partition. Such a system must act optimistically while the network is split and then perform a some kind of reconciliation to restore consistency afterwards. We have formally described four reconciliation algorithms and proven them correct. The novelty of these algorithms lies in the fact that they can restore consistency after network partitions in a system with integrity constraints and that one of the protocols allows the system to provide service during the reconciliation. We have implemented and evaluated the algorithms using simulation and as part of a partition-tolerant CORBA middleware. The results indicate that it pays oﬀ to act optimistically and that it is worthwhile to provide service during reconciliation. distributed systems fault tolerance network partitions dependability integrity constraints Computer Sciences Datavetenskap (datalogi)
310	Architecture sécurisée pour les systèmes d'information des avions du futur. / Secure architecture for information systems of future aircraft Lastera, Maxime 04 December 2012 (has links) Traditionnellement, dans le domaine avionique les logiciels utilisés à bord de l’avion sont totalement séparés des logiciels utilisés au dehors afin d’éviter toutes interaction qui pourrait corrompre les systèmes critiques à bord de l’avion. Cependant, les nouvelles générations d’avions exigent plus d’interactions avec le monde ouvert avec pour objectif de proposer des services étendu, générant ainsi un flux d’information potentiellement dangereux. Dans une précédente étude, nous avons proposé l’utilisation de la virtualisation pour assurer la sûreté de fonctionnement d’applications critiques assurant des communications bidirectionnelles entre systèmes critiques et systèmes non sûr. Dans cette thèse nous proposons deux contributions.La première contribution propose une méthode de comparaison d’hyperviseur. Nous avons développé un banc de test permettant de mesurer les performances d’un système virtualisé. Dans cette étude, différentes configurations ont été expérimentées, d’un système sans OS à une architecture complète avec un hyperviseur et un OS s’exécutant dans une machine virtuelle. Plusieurs tests (processeur, mémoire et réseaux) ont été mesurés et collectés sur différents hyperviseurs.La seconde contribution met l’accent sur l’amélioration d’une architecture de sécurité existante. Un mécanisme de comparaison basé sur l’analyse des traces d’exécution est utilisé pour détecter les anomalies entre instances d’application exécutées sur diverse machines virtuelles. Nous proposons de renforcer le mécanisme de comparaison à l’exécution par l’utilisation d’un modèle d’exécution issu d’une analyse statique du bytecode Java.Afin de valider notre approche, nous avons développé un prototype basé sur un cas d’étude identifié avec Airbus qui porte sur l’utilisation d’un ordinateur portable dédié à la maintenance / Traditionally, in avionics, on-board aircraft software used to be totally separated from open-world software in order to avoid any interaction that could corrupt critical on-board systems. However, new aircraft generations require more interaction with off-board systems to provide extended services, which makes these information flows potentially dangerous.In a previous work, we have proposed the use of virtualization to ensure dependability of critical applications despite bidirectional communication between critical on-board systems and untrusted off-board systems. In this thesis, we propose two contributions.The first contribution concerns the establishment of a benchmark of hypervisors. We have developed a test bed to assess the performance impact induced by the use of virtualization. In this work, various configurations have been experimented ranging from a basic machine without an OS up to the complete architecture featuring a hypervisor and an OS running in a virtual machine. Several tests (computation, memory, and network) are carried out, and timing measures are collected on different hypervisors.The second contribution focuses on the improvement of an existing security architecture. A comparison mechanism based on the analysis of execution traces is used to detect discrepancies between replicas supported by diverse virtual machines. We proposeto strengthen the comparison mechanism at runtime by the use of an execution model, derived from a static analysis of the java bytecode Sûreté de fonctionnement Virtualisation Avionique Logiciel critique Dependability Fault tolerance Avionics Critical systems Virtualization 004

Search results