11 |
Pipelined Byzantine Fault Tolerance and ApplicationsAdithya Bhat (17583018) 07 December 2023 (has links)
<p dir="ltr">Practically, Byzantine faults are not assumed in cloud applications. Byzantine fault-tolerance adds significant cryptographic, communication, throughput, and latency overheads to applications, contributing to the resistance towards its widespread adoption. Existing Byzantine-fault tolerant protocols focus on optimal latency or optimal communication while ignoring the throughput and cryptographic overheads.</p><p dir="ltr">In this thesis, we explore pipelining for Byzantine fault-tolerant applications. Pipelining tasks is a common optimization in distributed systems that involves executing tasks in stages. The idea is that instead of executing a task in an iteration as an atomic unit, we split the execution into stages and execute all stages of <i>different</i> tasks per iteration. We observe significant performance benefits if executing later stages of a task helps other tasks in earlier stages, saving effort in each stage. The length of the pipeline, i.e., the number of stages, determines the latency of an individual task. However, if the pipeline improves the execution of every stage enough, then the latency improves.</p><p dir="ltr">We primarily explore three Byzantine Fault Tolerant (BFT) applications with pipelining: (i) unique chain-based State Machine Replication protocols: <i>Apollo</i>, <i>Artemis</i>, <i>Leto</i>, and <i>Zeus</i>, and (ii) energy-efficient State Machine Replication: <i>EESMR</i>. (iii) random beacon protocols: <i>GRandPiper</i>, <i>BRandPiper</i>, and <i>OptRand</i>. We design them with a pipeline-first approach to improve the throughput, cryptographic, and communication costs at every stage of the pipeline. With respect to latency, we show (i) pipelined SMR protocols where our pipeline stages have constant cryptographic and linear communication costs allowing our protocols to outperform state-of-the-art BFT-SMR protocols in throughput. (ii) pipelined SMR protocols with techniques to make each stage of the pipeline independent, thus achieving demonstrable energy efficiency while allowing an unbounded number of non-interactive parallel proposals. (iii) reduced latencies for reconfiguration-friendly random beacons by using two pipelines: an SMR pipeline to commit and a beacon pipeline to produce random numbers and decoupling the two pipelines thereby removing the impact of the high-latency SMR pipeline on the latency of the randomness output by the system. </p>
|
12 |
Generalized Consensus for Practical Fault-ToleranceGarg, Mohit 07 September 2018 (has links)
Despite extensive research on Byzantine Fault Tolerant (BFT) systems, overheads associated with such solutions preclude widespread adoption. Past efforts such as the Cross Fault Tolerance (XFT) model address this problem by making a weaker assumption that a majority of processes are correct and communicate synchronously. Although XPaxos of Liu et al. (using the XFT model) achieves similar performance as Paxos, it does not scale with the number of faults. Also, its reliance on a single leader introduces considerable downtime in case of failures. This thesis presents Elpis, the first multi-leader XFT consensus protocol. By adopting the Generalized Consensus specification from the Crash Fault Tolerance model, we were able to devise a multi-leader protocol that exploits the commutativity property inherent in the commands ordered by the system. Elpis maps accessed objects to non-faulty processes during periods of synchrony. Subsequently, these processes order all commands which access these objects. Experimental evaluation confirms the effectiveness of this approach: Elpis achieves up to 2x speedup over XPaxos and up to 3.5x speedup over state-of-the-art Byzantine Fault-Tolerant Consensus Protocols. / Master of Science / Online services like Facebook, Twitter, Netflix and Spotify to cloud services like Google and Amazon serve millions of users which include individuals as well as organizations. They use many distributed technologies to deliver a rich experience. The distributed nature of these technologies has removed geographical barriers to accessing data, services, software, and hardware. An essential aspect of these technologies is the concept of the shared state. Distributed databases with multiple replicated data nodes are an example of this shared state. Maintaining replicated data nodes provides several advantages such as (1) availability so that in case one node goes down the data can still be accessed from other nodes, (2) quick response times, by placing data nodes closer to the user, the data can be obtained quickly, (3) scalability by enabling multiple users to access different nodes so that a single node does not cause bottlenecks. To maintain this shared state some mechanism is required to maintain consistency, that is the copies of these shared state must be identical on all the data nodes. This mechanism is called Consensus, and several such mechanisms exist in practice today which use the Crash Fault Tolerance (CFT). The CFT model implies that these mechanisms provide consistency in the presence of nodes crashing. While the state-of-the-art for security has moved from assuming a trusted environment inside a firewall to a perimeter-less and semi-trusted environment with every service living on the internet, only the application layer is required to be secured while the core is built just with an idea of crashes in mind. While there exists comprehensive research on secure Consensus mechanisms which utilize what is called the Byzantine Fault Tolerance (BFT) model, the extra costs required to implement these mechanisms and comparatively lower performance in a geographically distributed setting has impeded widespread adoption. A new model recently proposed tries to find a cross between these models that is achieving security while paying no extra costs called the Cross Fault Tolerance (XFT). This thesis presents Elpis, a consensus mechanism which uses precisely this model that will secure the shared state from its core without modifications to the existing setups while delivering high performance and lower response times. We perform a comprehensive evaluation on AWS and demonstrate that Elpis achieves 3.5x over the state-of-the-art while improving response times by as much as 50%.
|
13 |
Error isolation in distributed systemsBehrens, Diogo 25 May 2016 (has links) (PDF)
In distributed systems, if a hardware fault corrupts the state of a process, this error might propagate as a corrupt message and contaminate other processes in the system, causing severe outages. Recently, state corruptions of this nature have been observed surprisingly often in large computer populations, e.g., in large-scale data centers. Moreover, since the resilience of processors is expected to decline in the near future, the likelihood of state corruptions will increase even further.
In this work, we argue that preventing the propagation of state corruption should be a first-class requirement for large-scale fault-tolerant distributed systems. In particular, we propose developers to target error isolation, the property in which each correct process ignores any corrupt message it receives.
Typically, a process cannot decide whether a received message is corrupt or not. Therefore, we introduce hardening as a class of principled approaches to implement error isolation in distributed systems. Hardening techniques are (semi-)automatic transformations that enforce that each process appends an evidence of good behavior in the form of error codes to all messages it sends. The techniques “virtualize” state corruptions into more benign failures such as crashes and message omissions: if a faulty process fails to detect its state corruption and abort, then hardening guarantees that any corrupt message the process sends has invalid error codes. Correct processes can then inspect received messages and drop them in case they are corrupt.
With this dissertation, we contribute theoretically and practically to the state of the art in fault-tolerant distributed systems. To show that hardening is possible, we design, formalize, and prove correct different hardening techniques that enable existing crash-tolerant designs to handle state corruption with minimal developer intervention. To show that hardening is practical, we implement and evaluate these techniques, analyzing their effect on the system performance and their ability to detect state corruptions in practice.
|
14 |
Byzantine Fault Tolerance for Nondeterministic ApplicationsChen, Bo January 2008 (has links)
No description available.
|
15 |
Byzantine Fault Tolerant Collaborative EditingBABI, MAMDOUH O. 24 May 2017 (has links)
No description available.
|
16 |
Scalable error isolation for distributed systems: modeling, correctness proofs, and additional experimentsBehrens, Diogo, Serafini, Marco, Arnautov, Sergei, Junqueira, Flavio, Fetzer, Christof 01 June 2016 (has links) (PDF)
This technical report complements the paper entitled “Scalable error isolation for distributed systems” published at USENIX NSDI 15.
|
17 |
Efficient Proactive Security for Sensitive Data StorageSubbiah, Arun 24 August 2007 (has links)
Fault tolerant and secure distributed data storage systems typically require that only up to a threshold of storage nodes can ever be compromised or fail. In proactively-secure systems, this
requirement is modified to hold only in a time interval (also called epoch), resulting in increased security. An attacker or adversary
could compromise distinct sets of nodes in any two time intervals. This attack model is also called the mobile adversary model. Proactively-secure systems require all nodes to "refresh" themselves periodically to a clean state to maintain the availability, integrity, and confidentiality properties of the data storage service.
This dissertation investigates the design of a proactively-secure distributed data storage system. Data can be stored at storage servers using encoding schemes called secret sharing, or encryption-with-replication. The primary challenge is that the protocols that the servers run periodically to maintain integrity and confidentiality must scale with large amounts of stored data. Determining how much data can be proactively-secured in practical settings is an important objective of this dissertation.
The protocol for maintain the confidentiality of stored data is developed in the context of data storage using secret sharing. We propose a new technique called the GridSharing framework that uses a combination of XOR secret sharing and replication for storing data efficiently. We experimentally show that the algorithm can secure several hundred GBs of data.
We give distributed protocols run periodically by the servers for maintaining the integrity of replicated data under the mobile adversary model.
This protocol is integrated into a document repository to make it proactively-secure. The proactively-secure document repository is implemented and evaluated on the Emulab cluster (http://www.emulab.net). The experimental evaluation shows that several 100 GBs of data
can be proactively-secured.
This dissertation also includes work on fault and intrusion detection - a necessary component in any secure system. We give a novel Byzantine-fault detection algorithm for quorum systems, and experimentally evaluate its performance using simulations and by deploying it in the AgileFS distributed file system.
|
18 |
Vers des protocoles de tolérance aux fautes byzantines efficaces et robustes / Towards efficient and robust byzantine fault tolerance protocolsPerronne, Lucas 08 December 2016 (has links)
Au cours de la dernière décennie, l'informatique en nuage (Cloud Computing) suscita un important changement de paradigme dans de nombreux systèmes d'information. Ce nouveau paradigme s'illustre principalement par la délocalisation de l'infrastructure informatique hors du parc des entreprises, permettant ainsi une utilisation des ressources à la demande. La prise en charge de serveurs locaux s'est donc vue peu à peu remplacée par la location de serveurs distants, auprès de fournisseurs spécialisés tels que Google, Amazon, Microsoft. Afin d'assurer la pérennité d'un tel modèle économique, il apparaît nécessaire de fournir aux utilisateurs diverses garanties relatives à la sécurité, la disponibilité, ou encore la fiabilité des ressources mises à disposition. Ces facteurs de qualité de service (QoS pour Quality of Service) permettent aux fournisseurs et aux utilisateurs de s'accorder sur le niveau de prestation escompté. En pratique, les serveurs mis à disposition des utilisateurs doivent épisodiquement faire face à des fautes arbitraires (ou byzantines). Il s'agit par exemple de ruptures temporaires du réseau, du traitement de messages corrompus, ou encore d’arrêts inopinés. Le contexte d'informatique en nuage s'est vu néanmoins propice à l'émergence de technologies telles que la virtualisation ou la réplication de machines à états. De telles technologies permettent de pallier efficacement à l’occurrence de pannes via l'implémentation de protocoles de tolérance aux pannes.La tolérance aux fautes byzantines (BFT pour Byzantine Fault Tolerance) est un domaine de recherche implémentant les concepts de réplication de machines à états, qui vise à assurer la continuité et la fiabilité des services en présence de comportements arbitraires. Afin de répondre à cette problématique, de nombreux protocoles furent proposés. Ceux-ci se doivent d'être efficaces afin de masquer le surcoût lié à la réplication, mais également robustes afin de maintenir un niveau de performance élevé en présence de fautes. Nous constatons d'abord qu'il est délicat de relever ces deux défis à la fois: les protocoles actuels sont soit conçus pour être efficaces au détriment de leur robustesse, soit pour être robustes au détriment de leur efficacité. Cette thèse se focalise autour de cette problématique, l'objectif étant de fournir les instruments nécessaires à la conception de protocoles à la fois robustes et efficaces.Notre intérêt se porte principalement vers deux types de dénis de service liés à la gestion des requêtes. Le premier de ces dénis de service est causé par la corruption partielle d'une requête lors de son émission par un client. Le deuxième est causé par l'abandon intentionnel d'une requête lors de sa réception par un réplica. Afin de faire face efficacement à ces deux comportements byzantins, plusieurs mécanismes dédiés furent implémentés dans les protocoles de BFT robustes. En pratique, ces mécanismes engendrent d'importants surcoûts, ce qui nous permet d'introduire notre première contribution: la définition de plusieurs principes de conception génériques destinés à réduire ces surcoûts tout en assurant un niveau de robustesse équivalent.La seconde contribution de cette thèse illustre ER-PBFT, un nouveau protocole implémentant ces principes de conception sur PBFT, la référence en matière de tolérance aux fautes byzantines. Nous démontrons l'efficacité de notre nouvelle politique de robustesse, à la fois en présence de comportements byzantins mais également lors de scénarios sans faute.La troisième contribution illustre ER-COP, un nouveau protocole orienté à la fois vers l’efficacité et la robustesse, implémentant nos principes de conception sur COP, le protocole de BFT fournissant les meilleures performances à l'heure actuelle dans un environnement sans faute. Nous évaluons le surcoût engendré par l'intégration de notre politique de robustesse, et nous démontrons la capacité de ER-COP à tolérer l'occurrence de comportements byzantins. / Over the last decade, Cloud computing instigated an important switch of paradigm in numerous information systems. This new paradigm is mainly illustrated by the re-location of the whole IT infrastructures out of companies’ warehouses. The use of local servers has thus being replaced by remote ones, rented from dedicated providers such as Google, Amazon, Microsoft.In order to ensure the sustainability of this economic model, it appears necessary to provide several guarantees to users, related to the security, availability, or even reliability of the proposed resources. Such quality of service (QoS) factors allow providers and users to reach an agreement on the expected level of dependability. Practically, the proposed servers must episodically cope with arbitrary faults (also called byzantine faults), such as incorrect/corrupted messages, servers crashes, or even network failures. Nevertheless, the Cloud computing environment encouraged the emergence of technologies such as virtualization or state machine replication. These technologies allow cloud providers to efficiently face the occurrences of faults through the implementation of fault tolerance protocols.Byzantine Fault Tolerance (BFT) is a research area involving state machine replication concepts, and aiming at ensuring continuity and reliability of hosted services in presence of any kind of arbitrary behaviors. In order to handle such threat, numerous protocols were proposed. These protocols must be efficient in order to counterbalance the extra cost of replication, and robust in order to lower the impact of byzantine behaviors on the system performance. We first noticed that tackling both these concerns at the same time is difficult: current protocols are either designed to be efficient at the expense of their robustness, or robust at the expense of their efficiency. We tackle this specific problem in this thesis, our goal being to provide the required tools to design both efficient and robust BFT protocols.Our focus is mainly dedicated to two types of denial-of-service attacks involving requests management. The first one is caused by the partial corruption of a request transmitted by a client. The second one is caused by the intentional drop of a request upon receipt. In order to face efficiently both these byzantine behaviors, several mechanisms were integrated in robust BFT protocols. In practice, these mecanisms involve high overheads, and thus lead to the significant performance drop of robust protocols compared to efficien ones. This assessment allows us to introduce our first contribution: the definition of several generic design principles, applicable to numerous existing BFT protocols, and aiming at reducing these overheads while maintaining the same level of robustness.The second contribution introduces ER-PBFT, a new protocol implementing these design principles on PBFT, the reference in terms of byzantine fault tolerance. We demonstrate the efficiency of our new robustness policy, both in fault-free scenarios and in presence of byzantine behaviors.The third contribution highlights ER-COP, a new BFT protocol dedicated to both efficiency and robustness, implementing our design principles on COP, the BFT protocol providing for now the best performances in a fault-free environment. We evaluate the additional cost introduced by our robustness policy, and we demonstrate ER-COP's ability to handle byzantine behaviors.
|
19 |
Error isolation in distributed systemsBehrens, Diogo 14 January 2016 (has links)
In distributed systems, if a hardware fault corrupts the state of a process, this error might propagate as a corrupt message and contaminate other processes in the system, causing severe outages. Recently, state corruptions of this nature have been observed surprisingly often in large computer populations, e.g., in large-scale data centers. Moreover, since the resilience of processors is expected to decline in the near future, the likelihood of state corruptions will increase even further.
In this work, we argue that preventing the propagation of state corruption should be a first-class requirement for large-scale fault-tolerant distributed systems. In particular, we propose developers to target error isolation, the property in which each correct process ignores any corrupt message it receives.
Typically, a process cannot decide whether a received message is corrupt or not. Therefore, we introduce hardening as a class of principled approaches to implement error isolation in distributed systems. Hardening techniques are (semi-)automatic transformations that enforce that each process appends an evidence of good behavior in the form of error codes to all messages it sends. The techniques “virtualize” state corruptions into more benign failures such as crashes and message omissions: if a faulty process fails to detect its state corruption and abort, then hardening guarantees that any corrupt message the process sends has invalid error codes. Correct processes can then inspect received messages and drop them in case they are corrupt.
With this dissertation, we contribute theoretically and practically to the state of the art in fault-tolerant distributed systems. To show that hardening is possible, we design, formalize, and prove correct different hardening techniques that enable existing crash-tolerant designs to handle state corruption with minimal developer intervention. To show that hardening is practical, we implement and evaluate these techniques, analyzing their effect on the system performance and their ability to detect state corruptions in practice.
|
20 |
Scalable error isolation for distributed systems: modeling, correctness proofs, and additional experimentsBehrens, Diogo, Serafini, Marco, Arnautov, Sergei, Junqueira, Flavio, Fetzer, Christof 01 June 2016 (has links)
This technical report complements the paper entitled “Scalable error isolation for distributed systems” published at USENIX NSDI 15.
|
Page generated in 0.0351 seconds