Global ETD Search

41	Hardware-Assisted Dependable Systems Kuvaiskii, Dmitrii 22 January 2018 (has links) Unpredictable hardware faults and software bugs lead to application crashes, incorrect computations, unavailability of internet services, data losses, malfunctioning components, and consequently financial losses or even loss of life. In particular, faults in microprocessors (CPUs) and memory corruption bugs are among the major unresolved issues of today. CPU faults may result in benign crashes and, more problematically, in silent data corruptions that can lead to catastrophic consequences, silently propagating from component to component and finally shutting down the whole system. Similarly, memory corruption bugs (memory-safety vulnerabilities) may result in a benign application crash but may also be exploited by a malicious attacker to gain control over the system or leak confidential data. Both classes of errors are notoriously hard to detect and tolerate. The usual mitigation strategy is to apply ad-hoc local patches: checksums to protect specific computations against hardware faults, and bug fixes to protect programs against known vulnerabilities. This strategy is unsatisfactory since it is error-prone, requires significant manual effort, and protects only against anticipated faults. At the other extreme, Byzantine Fault Tolerance solutions defend against all kinds of hardware and software errors but are prohibitively expensive in terms of resources and performance overhead. In this thesis, we examine and propose five techniques to protect against hardware CPU faults and software memory-corruption bugs. All these techniques are hardware-assisted: they use recent advancements in CPU designs and modern CPU extensions. Three of these techniques target hardware CPU faults and rely on specific CPU features: ∆-encoding efficiently utilizes instruction-level parallelism of modern CPUs, Elzar re-purposes Intel AVX extensions, and HAFT builds on Intel TSX instructions.
The remaining two target software bugs: SGXBounds detects vulnerabilities inside Intel SGX enclaves, and “MPX Explained” analyzes the recent Intel MPX extension to protect against buffer overflow bugs. Our techniques achieve three goals: transparency, practicality, and efficiency. All our systems are implemented as compiler passes that transparently harden unmodified applications against hardware faults and software bugs. They are practical since they rely on commodity CPUs and require no specialized hardware or operating system support. Finally, they are efficient because they use hardware assistance in the form of CPU extensions to lower performance overhead. info:eu-repo/classification/ddc/004 ddc:004 Fault Tolerance, Information Security, Systems Security
42	A Contribution to the Design of Highly Redundant Compliant Aerial Manipulation Systems Yao, Chao 05 October 2022 (has links) In the coming decades, aerial manipulators are expected to be deployed in scenarios that are either too dangerous for human beings or too expensive to be accomplished by traditional methods. This thesis presents a novel solution for the overall control of highly redundant aerial manipulation systems. The results are applied to a reference configuration established as a universal platform for performing various aerial manipulation tasks. The platform consists of an omnidirectional multirotor UAV and a serial manipulator. To ensure a modular control design, two computationally efficient algorithms are studied for allocating the virtual input to actuator commands. Fault tolerance of the aerial vehicle is achieved by integrating a diagnostic module based on an artificial neural network and the reconfigurable control allocation into the control loop. In addition, the risk of input saturation of individual rotors is minimized by predicting and reconfiguring the speed and acceleration responses. Two filter-based observers are presented to estimate external forces and torques, which is necessary to achieve compliant behavior of the end-effector through an axis-selective impedance control in the outer loop. Exploiting the redundancy of the proposed aerial manipulator, the author has designed a control law that achieves the desired end-effector motion and executes secondary tasks in order of priority.
The effectiveness of the proposed designs is verified with extensive tests generated following the Monte Carlo method, and the presented control scheme is shown to be versatile and effective. Contents: 1 Introduction; 2 Fundamentals; 3 System Design and Modeling; 4 Reconfigurable Control Allocation; 5 Fault Diagnostics For Free Flight; 6 Force and Torque Observer; 7 Trajectory Generation; 8 Hybrid Task Priority Control; 9 System Integration and Performance Evaluation; 10 Conclusion info:eu-repo/classification/ddc/621.3 ddc:621.3
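Entry 42 mentions control allocation, that is, mapping a virtual input to actuator commands. The sketch below illustrates only that general idea, under loudly stated assumptions: it uses a toy planar vehicle with two rotors instead of the omnidirectional platform of the thesis, and the function name `allocate_planar` and the two-rotor model are hypothetical, not the author's algorithms.

```python
def allocate_planar(total_thrust, torque, arm_length):
    """Map a virtual input (thrust, torque) to two rotor forces.

    Assumed toy model (not taken from the thesis):
        total_thrust = f1 + f2
        torque       = arm_length * (f1 - f2)
    Solving these two equations for f1 and f2 gives the allocation.
    """
    if arm_length <= 0:
        raise ValueError("arm_length must be positive")
    half_diff = torque / (2.0 * arm_length)
    f1 = total_thrust / 2.0 + half_diff
    f2 = total_thrust / 2.0 - half_diff
    return f1, f2


if __name__ == "__main__":
    f1, f2 = allocate_planar(total_thrust=10.0, torque=1.0, arm_length=0.25)
    print(f1, f2)
```

A real allocator for a redundant vehicle has more actuators than virtual inputs, so it has to choose among many valid solutions and respect rotor limits; the closed-form inverse here only works because the toy model is square.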
43	Scalable error isolation for distributed systems: modeling, correctness proofs, and additional experiments Behrens, Diogo, Serafini, Marco, Arnautov, Sergei, Junqueira, Flavio, Fetzer, Christof 01 June 2016 (has links) (PDF) This technical report complements the paper entitled “Scalable error isolation for distributed systems” published at USENIX NSDI 15. hardware errors, arbitrary state corruption, data corruption, error isolation, distributed systems, Byzantine fault tolerance ddc:004 rvk:SS 5514
44	Responsive Execution of Parallel Programs in Distributed Computing Environments Karl, Holger 03 December 1999 (has links) Clusters of standard workstations have been shown to be an attractive environment for parallel computing. However, unsolved problems remain that limit their suitability for some application scenarios. One of these problems is dependable and timely program execution: there are many applications in which a program should be successfully completed at a predictable point in time. Mechanisms to combine the properties of both dependable and timely execution of parallel programs in distributed computing environments are the main objective of this dissertation. Addressing these properties requires a joint metric for dependability and timeliness. Responsiveness is such a metric; it is refined for the purposes of this work. As a case study, Calypso and Charlotte, two parallel programming systems, are analyzed and their shortcomings with regard to responsiveness are identified on several abstraction levels. Solutions for them are presented and generalized, resulting in widely applicable mechanisms for (parallel) responsive services. Specifically, these solutions are: 1) a responsiveness analysis of Calypso's eager scheduling (a mechanism for load balancing and fault masking), 2) ameliorating a single point of failure by a responsiveness analysis of checkpointing and by a standard interface-based system for replication of legacy software, 3) managing resources in a way suitable for parallel programs, and 4) using semantic information about the communication pattern of a program to improve its performance. All proposed mechanisms can be combined and are suitable for use in standard environments. Analysis and experiments show that these mechanisms improve the responsiveness of eligible applications.
parallel and distributed computing, fault tolerance, real time, responsiveness; 004 Computer science; 28 Computer science, data processing ddc:004
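Entry 44 describes eager scheduling as a mechanism for load balancing and fault masking. A minimal sketch of the general idea follows, assuming idempotent tasks, a round-based simulation, and workers that are either healthy or silently crashed; the function `eager_schedule` and this simulation model are illustrative assumptions, not Calypso's implementation.

```python
def eager_schedule(tasks, worker_is_healthy, compute):
    """Simulate eager scheduling over rounds.

    Every worker is handed a still-unfinished task in each round, even if
    that task was already handed to someone else. A crashed worker never
    reports back, so its task simply stays pending and is picked up again.
    Tasks must be idempotent, because a task may run more than once.
    """
    if tasks and not any(worker_is_healthy):
        raise RuntimeError("no healthy worker left")
    results = {}
    while len(results) < len(tasks):
        # Snapshot of unfinished tasks at the start of the round.
        pending = [i for i in range(len(tasks)) if i not in results]
        for worker, healthy in enumerate(worker_is_healthy):
            task_id = pending[worker % len(pending)]
            if healthy:
                # First reported result wins; later duplicates are ignored.
                results.setdefault(task_id, compute(tasks[task_id]))
    return [results[i] for i in range(len(tasks))]


if __name__ == "__main__":
    # Worker 1 has crashed; its tasks are re-executed by workers 0 and 2.
    print(eager_schedule([1, 2, 3, 4], [True, False, True], lambda x: x * x))
```

The same re-assignment step serves both purposes named in the abstract: fast workers absorb the work of slow ones, and the work of failed ones is masked without explicit failure detection.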
45	Scalable error isolation for distributed systems: modeling, correctness proofs, and additional experiments Behrens, Diogo, Serafini, Marco, Arnautov, Sergei, Junqueira, Flavio, Fetzer, Christof 01 June 2016 (has links) This technical report complements the paper entitled “Scalable error isolation for distributed systems” published at USENIX NSDI 15. info:eu-repo/classification/ddc/004 ddc:004
46	Zuverlässigkeitsorientierter Entwurf und Analyse von Steuerungssystemen auf Modellebene unter zufälligen Hardwarefehlern Ding, Kai 08 July 2021 (has links) Model-based design is a common methodology in the development of complex embedded control systems. Control system engineers typically prefer to use MATLAB® Simulink® and suitable automatic code generators for the development and deployment of software. Embedded systems are subject to random hardware faults: bit-flips, for example, may affect random access memory (RAM) cells and central processing unit (CPU) registers and cause data errors that may propagate to critical system outputs and result in system failures. From a dependability perspective, the design space of control systems includes the selection of a suitable (reliable) implementation of a control algorithm. In model-based software development frameworks such as Simulink, such an algorithm can be realized by different but functionally equivalent implementations. However, these functional equivalents may exhibit completely different reliability properties. This thesis proposes an analytical method for the evaluation of the reliability properties of control systems that are designed with Simulink models. The method is based on a transformation of the assembly code, which is generated from the Simulink model, into a formal stochastic error propagation model, and on its quantification through underlying Markov chain models and state-of-the-art probabilistic model-checking techniques. Applying the method to functionally equivalent implementations can determine which one is less vulnerable to data errors caused by random hardware faults. Fault tolerance is essential to dependable system design. Control systems can be protected with fault tolerance mechanisms to increase their reliability. Redundancy, the key underlying concept for achieving fault tolerance, is usually implemented at the hardware or software level.
In model-based development, it is preferable to apply redundancy mechanisms directly at the model level (Simulink model level). This thesis introduces a systematic classification of fault-tolerant design patterns. Such patterns can be applied to the Simulink model to tolerate random hardware faults and can be taken into account during control system design. In addition, it is more transparent and convenient for control system engineers to protect vulnerable parts with fault tolerance mechanisms directly at the model level. A rigorous reliability assessment of embedded control systems must be conducted at the assembly level, based on modeling the data errors that occur in RAM and the CPU. However, the assembly-level assessment method scales poorly because of the state space explosion (SSE) problem of the underlying Markov chain models: the computational complexity may increase exponentially with the assembly code size. Moreover, the transformation from Simulink models to assembly code is a complicated procedure. It is also more convenient for control engineers to estimate reliability properties and implement possible reliability improvements at the model level in the early design phase, when the model-based design is actually applied. Therefore, this thesis proposes a model-level reliability evaluation of Simulink models to address the aforementioned problems. The efficiency of the proposed model-level evaluation is verified by comparing the reliability properties assessed at the assembly and model levels. Contents: 1. Introduction; 2. Preliminaries; 3. Reliability evaluation of control algorithm implementations at the assembly level; 4. Fault-tolerant design patterns; 5. MORE: MOdel-based REdundancy for Simulink models; 6. Model-level assessment of Simulink models; 7. Conclusion info:eu-repo/classification/ddc/621.3 ddc:621.3
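Entry 46 names redundancy as the key concept behind fault tolerance and bit-flips as a typical random hardware fault. The sketch below shows one well-known redundancy pattern, triple modular redundancy with a majority voter, applied to a trivial gain block. It is a generic illustration under stated assumptions: the helpers `flip_bit`, `gain` and `vote` are hypothetical names, and the code is not one of the thesis's Simulink patterns.

```python
from collections import Counter


def flip_bit(value, bit):
    """Inject a single bit-flip into an integer (assumed fault model)."""
    return value ^ (1 << bit)


def gain(x):
    """Stand-in for a control block; here simply a gain of 3."""
    return 3 * x


def vote(replica_outputs):
    """Return the strict-majority value or raise if there is none."""
    value, count = Counter(replica_outputs).most_common(1)[0]
    if count * 2 <= len(replica_outputs):
        raise ValueError("no majority among replicas")
    return value


if __name__ == "__main__":
    # Three replicas of the same block; the second one suffers a bit-flip.
    outputs = [gain(5), flip_bit(gain(5), 2), gain(5)]
    print(vote(outputs))
```

With three replicas a single corrupted output is outvoted; if all three outputs disagree, the voter can only detect the error, not mask it.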
47	A Formal Fault Model for Component-Based Models of Embedded Systems Fischer, Marco 14 May 2007 (has links) (PDF) The fourth volume of the scientific series Eingebettete Selbstorganisierende Systeme (Embedded Self-Organized Systems) is devoted to the development of fault models for embedded, distributed multiprocessor systems. Such multiprocessor systems are connected into hierarchical networks to control airplanes (avionics) and are increasingly used in the automotive sector. Here it is essential to meet the highest safety standards and to guarantee maximum availability. Mr Fischer integrates the modelling of potential faults into the design process. Based on the pi-calculus, he develops a formal fault model that supports a uniform modelling of fault cases. In doing so, he establishes interesting connections to bisimulation and to model checking methods. The theoretical results are illustrated with a complex example.
This lets the reader appreciate the power of the developed approach and motivates applying the methodology to other use cases. I am glad that Mr Fischer publishes his important research in this scientific series. computer engineering, computer science, embedded systems, fault tolerance, formal fault model, pi-calculus, process algebra, system level, system model ddc:004
48	Automatic Hardening against Dependability and Security Software Bugs / Automatisches Härten gegen Zuverlässigkeits- und Sicherheitssoftwarefehler Süßkraut, Martin 15 June 2010 (has links) (PDF) It is a fact that software has bugs. These bugs can lead to failures. Dependability and security failures in particular are a great threat to software users. This thesis introduces four novel approaches that can be used to automatically harden software at the user's site. Automatic hardening removes bugs from already deployed software. All four approaches are automated, i.e., they require little support from the end-user. However, two of these approaches need some support from the software developer. The presented approaches can be grouped into error toleration and bug removal. The two error toleration approaches focus primarily on fast detection of security errors. Once an error is detected, it can be tolerated with well-known existing approaches. The other two approaches are bug removal approaches. They remove dependability bugs from already deployed software. We tested all approaches with existing benchmarks and applications, such as the Apache web server. dependability, robustness, security, bugs, software bugs, fault detection, fault tolerance, parallelization, runtime checker, SwitchBlade, ParExC, AutoPatch, AutoCannon ddc:004 rvk:ST 230 rvk:ST 277
49	Minimizing Overhead for Fault Tolerance in Event Stream Processing Systems Martin, André 20 September 2016 (has links) (PDF) Event Stream Processing (ESP) is a well-established approach for low-latency data processing, enabling users to react quickly to relevant situations in soft real-time. To cope with the sheer amount of data generated each day and with fluctuating workloads originating from data sources such as Twitter and Facebook, such systems must be highly scalable and elastic. Hence, ESP systems are typically long-running applications deployed on several hundred nodes in either dedicated data centers or cloud environments such as Amazon EC2. In such environments, nodes are likely to fail due to software aging, process errors, or hardware errors, whereas the unbounded stream of data calls for continuous processing. To cope with node failures, several fault tolerance approaches have been proposed in the literature. Active replication and rollback recovery based on checkpointing and in-memory logging (upstream backup) are two commonly used approaches in the context of ESP systems. However, these approaches suffer from a high resource footprint, low throughput, or unresponsiveness due to long recovery times. Moreover, recovering applications precisely with exactly-once semantics requires deterministic execution, which adds another layer of complexity and overhead. The goal of this thesis is to lower the overhead for fault tolerance in ESP systems. We first present StreamMine3G, an ESP system we built entirely from scratch to study and evaluate novel approaches for fault tolerance and elasticity.
We then present an approach that reduces the overhead of deterministic execution by using a weak, epoch-based ordering scheme rather than a strict one for commutative and tumbling windowed operators, which allows applications to recover precisely using active or passive replication. Since most applications run in cloud environments nowadays, we furthermore propose an approach that increases system availability by efficiently utilizing spare but paid resources for fault tolerance. Finally, to free users from the burden of choosing a fault tolerance scheme that guarantees the desired recovery time while still saving resources, we present a controller-based approach that adapts fault tolerance at runtime. We furthermore showcase the applicability of our StreamMine3G approach using real-world applications and examples. Complex Event Processing, CEP, Event Stream Processing, ESP, Scalability, Migration, State Management, Fault Tolerance ddc:004 rvk:ST 257 rvk:ST 200 rvk:ST 234 rvk:ST 233
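Entry 49 refers to a weak, epoch-based ordering scheme for commutative and tumbling windowed operators. The sketch below illustrates why such operators can tolerate reordering, under the assumption that an epoch coincides with one tumbling window and that the operator is an integer sum; `tumbling_sum` and the event format are hypothetical and not taken from StreamMine3G.

```python
from collections import defaultdict


def tumbling_sum(events, window_size):
    """Sum integer values per tumbling window.

    events: iterable of (timestamp, value) pairs in arbitrary arrival order.
    The window index (timestamp // window_size) plays the role of the epoch.
    Because integer addition is commutative and associative, the order of
    events inside one epoch does not affect the result.
    """
    sums = defaultdict(int)
    for timestamp, value in events:
        sums[timestamp // window_size] += value
    return dict(sorted(sums.items()))


if __name__ == "__main__":
    arrival_at_replica_a = [(0, 1), (3, 2), (9, 5), (10, 7), (12, 1)]
    arrival_at_replica_b = [(9, 5), (12, 1), (0, 1), (10, 7), (3, 2)]
    # Both replicas reach the same per-epoch state despite different orders.
    print(tumbling_sum(arrival_at_replica_a, 10))
    print(tumbling_sum(arrival_at_replica_b, 10))
```

Two replicas that only agree on which events belong to which epoch therefore produce identical outputs, which is what makes strict per-event ordering unnecessary for this class of operators. Floating-point sums would not give this guarantee bit for bit, since floating-point addition is not associative.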
50	Comprehensive Backend Support for Local Memory Fault Tolerance Rink, Norman Alexander, Castrillon, Jeronimo 19 December 2016 (has links) (PDF) Technological advances drive hardware to ever smaller feature sizes, causing devices to become more vulnerable to transient faults. Applications can be protected against faults by adding error detection and recovery measures in software. This is commonly achieved by applying automatic program transformations. However, transformations applied to program representations at abstraction levels higher than machine instructions are fundamentally incapable of protecting against vulnerabilities that are introduced during compilation. In particular, a large proportion of a program’s memory accesses are introduced by the compiler backend. This report presents a backend that protects these accesses against faults in the memory system. It is demonstrated that the presented backend can detect all single bit flips in memory that would be missed by an error detection scheme that operates on the LLVM intermediate representation of programs. The presented compiler backend is obtained by modifying the LLVM backend for the x86 architecture. On a subset of SPEC CINT2006, the runtime overhead incurred by the backend modifications amounts to 1.50x for the 32-bit processor architecture i386 and 1.13x for the 64-bit architecture x86_64. To achieve comprehensive detection of memory faults, the modified backend implements an adjusted calling convention that leaves library function calls transparent and intact. transient hardware faults, soft errors, memory faults, error detection, fault tolerance, resilience, compiler backend, code generation, intermediate representation (IR), LLVM ddc:004 rvk:SS 5514
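Entry 50 is about detecting single bit flips in memory through software error detection. As a rough illustration of the general principle only, the sketch below keeps a shadow copy of every stored value and compares both copies on each load. The duplication scheme, the class `CheckedMemory` and its methods are assumptions made for this example; the abstract does not state which detection scheme the report's backend uses.

```python
class CheckedMemory:
    """Toy memory that detects corruption by comparing two copies."""

    def __init__(self):
        self._primary = {}
        self._shadow = {}

    def store(self, address, value):
        # Every store writes the value twice.
        self._primary[address] = value
        self._shadow[address] = value

    def load(self, address):
        # Every load checks that both copies still agree.
        value = self._primary[address]
        if value != self._shadow[address]:
            raise RuntimeError(f"memory fault detected at address {address}")
        return value

    def inject_bit_flip(self, address, bit):
        """Simulate a transient fault in the primary copy only."""
        self._primary[address] ^= 1 << bit


if __name__ == "__main__":
    memory = CheckedMemory()
    memory.store(0x10, 42)
    print(memory.load(0x10))
    memory.inject_bit_flip(0x10, 3)
    try:
        memory.load(0x10)
    except RuntimeError as error:
        print(error)
```

A flip in either copy makes the two values differ, so any single bit flip is detected at the next load; identical flips in both copies would go unnoticed, which is why such schemes target single faults.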
