Global ETD Search

1	APPLICATION AWARE FOR BYZANTINE FAULT TOLERANCE Chai, Hua 09 December 2014 (has links) No description available. Computer Engineering Byzantine fault tolerance
2	Scalable Byzantine State Machine Replication: Designs, Techniques, and Implementations Arun, Balaji 02 July 2021 (has links) State machine replication (SMR) is one of the most widely studied and used methodologies for building highly available distributed applications and services. SMR replicates a service across a set of computing hosts, and executes client operations on the replicas in an agreed-upon total order, ensuring linearizability of the replicated shared state. The problem of determining a total order reduces to one of computing consensus. State-of-the-art consensus protocols are inadequate for newer classes of applications such as Blockchains and for geographically distributed infrastructures. The widely used Crash Fault Tolerance (CFT) fault model of consensus protocols is prone to malicious and adversarial behaviors as well as non-crash faults such as software bugs. The Byzantine fault-tolerance (BFT) model and its trust-based variant, the hybrid model, permit stronger failure adversaries. However, state-of-the-art Byzantine and hybrid consensus protocols have performance limitations in geographically distributed environments: they designate a primary replica for proposing total orders, which becomes a bottleneck and yields sub-optimal latencies for faraway clients. Additionally, they do not scale to hundreds of replicas or provide consistent performance as the system size grows. To overcome these limitations and develop highly scalable SMR solutions, this dissertation presents two leaderless consensus protocols, namely ezBFT and Dester, for the Byzantine and hybrid models, respectively. These protocols enable every replica to receive and order client commands. Additionally, they exchange command dependencies to collectively order commands without relying on a primary.
Our experimental evaluations in a 7-node geographically distributed setup reveal that ezBFT improves client-side latency by as much as 40% over state-of-the-art BFT protocols including PBFT, FaB, and Zyzzyva. Dester, for the hybrid model, reduces latency by as much as 30% over ezBFT. Next, the dissertation presents a new paradigm called DQBFT for designing consensus protocols that can scale to hundreds of nodes in geographically distributed environments. Since leaderless protocols exchange command dependencies, they do not scale to hundreds of nodes. DQBFT overcomes this scalability limitation by decentralizing only the heavy task of replicating commands and centralizing the process of ordering the commands. While DQBFT can be used to enhance existing primary-based protocols, Destiny is a hybrid instantiation of the DQBFT paradigm using linear communication for better scalability than naive instantiations. Experimental evaluations in a 193-node geographically distributed setup reveal that Destiny achieves ≈3× better throughput and ≈50% better latency than state-of-the-art BFT protocols including HotStuff, SBFT, and Hybster. Lastly, the dissertation presents two techniques for designing and implementing BFT protocols with reduced development costs. The dissertation presents Bumblebee, a methodology for manually transforming CFT protocols to tolerate Byzantine faults using trusted execution environments that are increasingly available in commodity hardware. Bumblebee is based on the observation that CFT protocols are incapable of tolerating non-malicious non-crash faults, but they are nevertheless deployed in many production systems. Bumblebee provides a Generic Algorithm that can represent protocols in both CFT and hybrid fault models, thus allowing easy construction of hybrid protocols using CFT protocols as baselines. The dissertation constructs hybrid instantiations of CFT protocols including Paxos, Raft, and M2Paxos.
Experimental evaluations of the hybrid variants reveal that they perform on par with native hybrid protocols, but incur a 30% overhead over their CFT counterparts. Hybrid protocols rely on the integrity of trusted execution environments, which are increasingly subject to security exploits. To withstand exploits, the dissertation presents DuoBFT, a protocol that exposes both the BFT and hybrid fault models within a single consensus protocol. This enables consensus under both fault models within the same protocol and without additional redundancy, allowing DuoBFT to achieve the performance of hybrid protocols and the security of BFT protocols. Experimental evaluations reveal that DuoBFT achieves the best of both hybrid and BFT fault models with less than 10% overhead. / Doctor of Philosophy / Computers are ubiquitous; they perform some of the most complex and safety-critical tasks such as controlling aircraft, managing the financial markets, and maintaining sensitive medical records. The undeniable fact is that computers are faulty. They are prone to crash and can behave arbitrarily. Even the most robust computers, such as those that are sent to outer space, eventually fail. External phenomena such as power outages and network disruptions affect their operation. To make computing systems reliable, researchers and practitioners have long focused on interconnecting many individual computers and programming them to effectively be duplicates of one another. This way, when one computer in a system fails, the rest of the computers still ensure that the system as a whole is operational. Duplication requires that multiple computers effectively perform the same task. In order for multiple computers to perform the same task together, they should first agree on the task. More generally, since computing systems perform multiple tasks, they should agree on the sequence of tasks that they will individually perform and follow the agreement.
This is what is known as the State Machine Replication technique. State Machine Replication (SMR) is a powerful technique that is applicable to numerous computing applications. Blockchain systems, the technology behind cryptocurrencies such as Bitcoin and Ethereum, use the SMR technique. In the context of Blockchain, the added challenge is that some of the computers involved in SMR can be programmed by adversarial parties and could act in a way to jeopardize the integrity of the whole system. For Bitcoin and Ethereum, this could mean embezzlement of hundreds or even millions of dollars worth of cryptocurrencies. Certain SMR systems are capable of tolerating such intrusions and ensuring system integrity. Such systems are deemed to be Byzantine tolerant. This dissertation presents designs, techniques, and implementations of Byzantine State Machine Replication systems. The problems addressed in this dissertation are those that plague existing Byzantine SMR systems, making them suboptimal for newer applications such as Blockchains. First, when computers that participate in SMR are spread around the world, their performance is dependent on the communication latencies between any pair of computers. Second, the number of computers required is proportional to the number of adversarial computers that need to be tolerated. Consequently, certain SMR systems for Blockchains require hundreds of computers to tolerate heavy adversarial behavior. Many existing SMR techniques perform poorly under these scenarios. The techniques presented in this dissertation address various permutations of these challenges. Byzantine Fault Tolerance State Machine Replication
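As a rough illustration of the replica-count relationship mentioned in the abstract above, the sketch below uses the standard textbook bounds: 2f + 1 replicas to tolerate f crash faults, 3f + 1 to tolerate f Byzantine faults, and 2f + 1 for hybrid protocols that rely on trusted components. The function name and the model labels are hypothetical and are not taken from the dissertation.

```python
def min_replicas(f: int, model: str) -> int:
    """Return the textbook minimum number of replicas needed to tolerate f faults.

    The model labels ("cft", "bft", "hybrid") are illustrative names only.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    bounds = {
        "cft": 2 * f + 1,     # crash faults: a majority must stay correct
        "bft": 3 * f + 1,     # Byzantine faults: more than two thirds must stay correct
        "hybrid": 2 * f + 1,  # Byzantine faults with trusted components
    }
    if model not in bounds:
        raise ValueError(f"unknown fault model: {model}")
    return bounds[model]


if __name__ == "__main__":
    for faults in (1, 2, 64):
        print(faults, {m: min_replicas(faults, m) for m in ("cft", "bft", "hybrid")})
```

The linear growth is why tolerating many adversarial nodes quickly pushes a deployment into the hundreds of replicas.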
3	Secure Store : A Secure Distributed Storage Service Lakshmanan, Subramanian 12 August 2004 (has links) As computers become pervasive in environments that include the home and community, new applications are emerging that will create and manipulate sensitive and private information. These applications span systems ranging from personal to mobile and handheld devices. They would benefit from a data storage service that protects the integrity and confidentiality of the stored data and is highly available. Such a data repository would have to meet the needs of a variety of applications, handling data with varying security and performance requirements. Simultaneously providing both high levels of security and high levels of performance may not be possible when many nodes in the system are under attack. The agility approach to building secure distributed services advocates the principle that the overhead of providing strong security guarantees should be incurred only by those applications that require such high levels of security and only at times when it is necessary to defend against high threat levels. A storage service that is designed for a variety of applications must follow the principles of agility, offering applications a range of options to choose from for their security and performance requirements. This research presents secure store, a secure and highly available distributed store to meet the performance and security needs of a variety of applications. Secure store is designed to guarantee integrity, confidentiality and availability of stored data even in the face of a limited number of compromised servers. Secure store is designed based on the principles of agility. Secure store integrates two well known techniques, namely replication and secret-sharing, and exploits the tradeoffs that exist between security and performance to offer applications a range of options to choose from to suit their needs.
This thesis makes several contributions, including (1) illustration of the principles of agility, (2) a novel gossip-style secure dissemination protocol whose performance is comparable to the best-possible benign-case protocol in the absence of any malicious activity, (3) demonstration of the performance benefits of using weaker consistency models for data access, and (4) a technique called collective endorsement that can be used in other secure distributed applications. Consistency Secure dissemination Byzantine fault tolerance Storage security Distributed storage
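The abstract above names secret-sharing as one of the two techniques the store combines, without saying which scheme is used. As a minimal sketch of the general idea only, the code below implements simple n-of-n XOR splitting, in which every share is required to recover the secret and any smaller subset reveals nothing about it. This is an assumption chosen for brevity, not the scheme from the thesis; a real store would more likely use a threshold scheme. All function names are hypothetical.

```python
import secrets
from functools import reduce


def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_secret(secret: bytes, n: int) -> list[bytes]:
    """Split secret into n shares; all n are needed to reconstruct it."""
    if n < 2:
        raise ValueError("need at least two shares")
    # n - 1 shares are uniformly random; the last one makes the XOR equal the secret.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    last = reduce(_xor, shares, secret)
    return shares + [last]


def combine_shares(shares: list[bytes]) -> bytes:
    """Recover the secret by XOR-ing every share together."""
    return reduce(_xor, shares)


if __name__ == "__main__":
    parts = split_secret(b"private record", 3)
    print(combine_shares(parts))
```

Each share would be stored on a different server, so a single compromised server learns nothing about the data, at the cost of needing every server to be reachable for a read.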
4	Highly available storage with minimal trust Mahajan, Prince 05 July 2012 (has links) Storage services form the core of modern Internet-based services spanning commercial, entertainment, and social-networking sectors. High availability is crucial for these services as even an hour of unavailability can cost them millions of dollars in lost revenue. Unfortunately, it is difficult to build highly available storage services that provide useful correctness properties. Both benign (system crashes, power outages etc.) and Byzantine faults (memory or disk corruption, software or configuration errors etc.) plague the availability of these services. Furthermore, the goal of high availability conflicts with our desire to provide good performance and strong correctness guarantees. For example, the Consistency, Availability, and Partition-resilience (CAP) theorem states that a storage service that must be available despite network partitions cannot enforce strong consistency. Similarly, the tradeoff between latency and durability dictates that a low-latency service cannot ensure durability in the presence of data-center wide failures. This dissertation explores the theoretical and practical limits of storage services that can be safe and live despite the presence of benign and Byzantine faults. On the practical front, we use cloud storage as a deployment model to build Depot, a highly available storage service that addresses the above challenges. Depot minimizes the trust clients have to put in the third party storage provider. As a result, Depot clients can continue functioning despite benign or Byzantine faults of the cloud servers. Yet, Depot provides stronger availability, durability, and consistency properties than those provided by many of the existing cloud deployments, without incurring prohibitive performance cost.
For example, in contrast to Amazon S3’s eventual consistency, Depot provides a variation of causal consistency on each volume, while tolerating Byzantine faults. On the theoretical front, we explore the consistency-availability tradeoffs. Tradeoffs between consistency and availability have proved useful for designers in deciding how much to strengthen consistency if high availability is desired or how much to compromise availability if strong consistency is essential. We explore the limits of such tradeoffs by attempting to answer the question: What are the semantics that can be implemented without compromising availability? In this work, we investigate this question for both fail-stop and Byzantine failure models. An immediate benefit of answering this question is that we can compare and contrast the consistency provided by Depot with that achievable by an optimal implementation. More crucially, this result complements the CAP theorem. While the CAP theorem defines a set of properties that cannot be achieved, this work identifies the limits of properties that can be achieved. / text Cloud storage Byzantine fault tolerance Fork-Join-Causal (FJC) consistency Fork consistency
5	UpRight fault tolerance Clement, Allen Grogan 13 November 2012 (has links) Experiences with computer systems indicate an inconvenient truth: computers fail and they fail in interesting ways. Although using redundancy to protect against fail-stop failures is common practice, non-fail-stop computer and network failures occur for a variety of reasons including power outage, disk or memory corruption, NIC malfunction, user error, operating system and application bugs or misconfiguration, and many others. The impact of these failures can be dramatic, ranging from service unavailability to stranding airplane passengers on the runway to companies closing. While high-stakes embedded systems have embraced Byzantine fault tolerant techniques, general purpose computing continues to rely on techniques that are fundamentally crash tolerant. In a general purpose environment, the current best practices response to non-fail-stop failures can charitably be described as pragmatic: identify a root cause and add checksums to prevent that error from happening again in the future. Pragmatic responses have proven effective for patching holes and protecting against faults once they have occurred; unfortunately the initial damage has already been done, and it is difficult to say if the patches made to address previous faults will protect against future failures. We posit that an end-to-end solution based on Byzantine fault tolerant (BFT) state machine replication is an efficient and deployable alternative to current ad hoc approaches favored in general purpose computing. The replicated state machine approach ensures that multiple copies of the same deterministic application execute requests in the same order and provides end-to-end assurance that independent transient failures will not lead to unavailability or incorrect responses. 
An efficient and effective end-to-end solution covers faults that have already been observed as well as failures that have not yet occurred, and it provides structural confidence that developers won't have to track down yet another failure caused by some unpredicted memory, disk, or network behavior. While the promise of end-to-end failure protection is intriguing, significant technical and practical challenges currently prevent adoption in general purpose computing environments. On the technical side, it is important that end-to-end solutions maintain the performance characteristics of deployed systems: if end-to-end solutions dramatically increase computing requirements, dramatically reduce throughput, or dramatically increase latency during normal operation then end-to-end techniques are a non-starter. On the practical side, it is important that end-to-end approaches be both comprehensible and easy to incorporate: if the cost of end-to-end solutions is rewriting an application or trusting intricate and arcane protocols, then end-to-end solutions will not be adopted. In this thesis we show that BFT state machine replication can be used in deployed systems. Reaching this goal requires us to address both the technical and practical challenges previously mentioned. We revisit disparate research results from the last decade and tweak, refine, and revise the core ideas to fit together into a coherent whole. Addressing the practical concerns requires us to simplify the process of incorporating BFT techniques into legacy applications. / text Fault tolerance Dependability Byzantine fault tolerance UpRight fault tolerance Replicated state machine
6	Securing multi-robot systems with inter-robot observations and accusations Wardega, Kacper Tomasz 24 May 2023 (has links) In various industries, such as manufacturing, logistics, agriculture, defense, search and rescue, and transportation, Multi-robot systems (MRSs) are increasingly gaining popularity. These systems involve multiple robots working together towards a shared objective, either autonomously or under human supervision. However, as MRSs operate in uncertain or even adversarial environments, and the sensors and actuators of each robot may be error-prone, they are susceptible to faults and security threats unique to MRSs. Classical techniques from distributed systems cannot detect or mitigate these threats. In this dissertation, novel techniques are proposed to enhance the security and fault-tolerance of MRSs through inter-robot observations and accusations. A fundamental security property is proposed for MRSs, which ensures that forbidden deviations from a desired multi-robot motion plan by the system supervisor are detected. Relying solely on self-reported motion information from the robots for monitoring deviations can leave the system vulnerable to attacks from a single compromised robot. The concept of co-observations is introduced, which are additional data reported to the supervisor to supplement the self-reported motion information. Co-observation-based detection is formalized as a method of identifying deviations from the expected motion plan based on discrepancies in the sequence of co-observations reported. An optimal deviation-detecting motion planning problem is formulated that achieves all the original application objectives while ensuring that all forbidden plan-deviation attacks trigger co-observation-based detection by the supervisor. A secure motion planner based on constraint solving is proposed as a proof-of-concept to implement the deviation-detecting security property. 
The security and resilience of MRSs against plan deviation attacks are further improved by limiting the information available to attackers. An efficient algorithm is proposed that verifies the inability of an attacker to stealthily perform forbidden plan deviation attacks with a given motion plan and announcement scheme. Such announcement schemes are referred to as horizon-limiting. An optimal horizon-limiting planning problem is formulated that maximizes planning lookahead while maintaining the announcement scheme as horizon-limiting. Co-observations and horizon-limiting announcements are shown to be efficient and scalable in protecting MRSs, including systems with hundreds of robots, as evidenced by a case study in a warehouse setting. Lastly, the Decentralized Blocklist Protocol (DBP), a method for designing Byzantine-resilient decentralized MRSs, is introduced. DBP is based on inter-robot accusations and allows cooperative robots to identify misbehavior through co-observations and share this information through the network. The method is adaptive to the number of faulty robots and is widely applicable to various decentralized MRS applications. It also permits fast information propagation, requires fewer cooperative observers of application-specific variables, and reduces the worst-case connectivity requirement, making it more scalable than existing methods. Empirical results demonstrate the scalability and effectiveness of DBP in cooperative target tracking, time synchronization, and localization case studies with hundreds of robots. The techniques proposed in this dissertation enhance the security and fault-tolerance of MRSs operating in uncertain and adversarial environments, aiding in the development of secure MRSs for emerging applications. Computer engineering Byzantine fault tolerance Multi-agent systems Multi-robot systems
7	Pipelined Byzantine Fault Tolerance and Applications Adithya Bhat (17583018) 07 December 2023 (has links) Practically, Byzantine faults are not assumed in cloud applications. Byzantine fault-tolerance adds significant cryptographic, communication, throughput, and latency overheads to applications, contributing to the resistance towards its widespread adoption. Existing Byzantine-fault tolerant protocols focus on optimal latency or optimal communication while ignoring the throughput and cryptographic overheads. In this thesis, we explore pipelining for Byzantine fault-tolerant applications. Pipelining tasks is a common optimization in distributed systems that involves executing tasks in stages. The idea is that instead of executing a task in an iteration as an atomic unit, we split the execution into stages and execute all stages of different tasks per iteration. We observe significant performance benefits if executing later stages of a task helps other tasks in earlier stages, saving effort in each stage. The length of the pipeline, i.e., the number of stages, determines the latency of an individual task. However, if the pipeline improves the execution of every stage enough, then the latency improves. We primarily explore three Byzantine Fault Tolerant (BFT) applications with pipelining: (i) unique chain-based State Machine Replication protocols: Apollo, Artemis, Leto, and Zeus, (ii) energy-efficient State Machine Replication: EESMR, and (iii) random beacon protocols: GRandPiper, BRandPiper, and OptRand. We design them with a pipeline-first approach to improve the throughput, cryptographic, and communication costs at every stage of the pipeline.
With respect to latency, we show (i) pipelined SMR protocols where our pipeline stages have constant cryptographic and linear communication costs, allowing our protocols to outperform state-of-the-art BFT-SMR protocols in throughput, (ii) pipelined SMR protocols with techniques to make each stage of the pipeline independent, thus achieving demonstrable energy efficiency while allowing an unbounded number of non-interactive parallel proposals, and (iii) reduced latencies for reconfiguration-friendly random beacons by using two pipelines, an SMR pipeline to commit and a beacon pipeline to produce random numbers, and decoupling the two pipelines, thereby removing the impact of the high-latency SMR pipeline on the latency of the randomness output by the system. Cryptography Distributed systems and algorithms Performance evaluation State Machine Replication Random Beacon Byzantine fault tolerance (BFT)
8	Generalized Consensus for Practical Fault-Tolerance Garg, Mohit 07 September 2018 (has links) Despite extensive research on Byzantine Fault Tolerant (BFT) systems, overheads associated with such solutions preclude widespread adoption. Past efforts such as the Cross Fault Tolerance (XFT) model address this problem by making a weaker assumption that a majority of processes are correct and communicate synchronously. Although XPaxos of Liu et al. (using the XFT model) achieves performance similar to Paxos, it does not scale with the number of faults. Also, its reliance on a single leader introduces considerable downtime in case of failures. This thesis presents Elpis, the first multi-leader XFT consensus protocol. By adopting the Generalized Consensus specification from the Crash Fault Tolerance model, we were able to devise a multi-leader protocol that exploits the commutativity property inherent in the commands ordered by the system. Elpis maps accessed objects to non-faulty processes during periods of synchrony. Subsequently, these processes order all commands which access these objects. Experimental evaluation confirms the effectiveness of this approach: Elpis achieves up to 2x speedup over XPaxos and up to 3.5x speedup over state-of-the-art Byzantine Fault-Tolerant Consensus Protocols. / Master of Science / Online services like Facebook, Twitter, Netflix and Spotify, as well as cloud services like Google and Amazon, serve millions of users, which include individuals as well as organizations. They use many distributed technologies to deliver a rich experience. The distributed nature of these technologies has removed geographical barriers to accessing data, services, software, and hardware. An essential aspect of these technologies is the concept of the shared state. Distributed databases with multiple replicated data nodes are an example of this shared state.
Maintaining replicated data nodes provides several advantages, such as (1) availability, so that in case one node goes down the data can still be accessed from other nodes, (2) quick response times, since placing data nodes closer to the user allows the data to be obtained quickly, and (3) scalability, by enabling multiple users to access different nodes so that a single node does not cause bottlenecks. To maintain this shared state, some mechanism is required to maintain consistency, that is, the copies of this shared state must be identical on all the data nodes. This mechanism is called Consensus, and several such mechanisms exist in practice today which use the Crash Fault Tolerance (CFT) model. The CFT model implies that these mechanisms provide consistency in the presence of nodes crashing. While the state-of-the-art for security has moved from assuming a trusted environment inside a firewall to a perimeter-less and semi-trusted environment with every service living on the internet, only the application layer is required to be secured while the core is built just with an idea of crashes in mind. While there exists comprehensive research on secure Consensus mechanisms which utilize what is called the Byzantine Fault Tolerance (BFT) model, the extra costs required to implement these mechanisms and comparatively lower performance in a geographically distributed setting have impeded widespread adoption. A recently proposed model, called Cross Fault Tolerance (XFT), tries to find a cross between these models, that is, achieving security while paying no extra costs. This thesis presents Elpis, a consensus mechanism which uses precisely this model and which will secure the shared state from its core without modifications to the existing setups while delivering high performance and lower response times. We perform a comprehensive evaluation on AWS and demonstrate that Elpis achieves a 3.5x speedup over the state-of-the-art while improving response times by as much as 50%.
Distributed Systems Fault Tolerance Byzantine Fault Tolerance State Machine Replication Multi-Leader Consensus Blockchain
9	Error isolation in distributed systems Behrens, Diogo 25 May 2016 (has links) (PDF) In distributed systems, if a hardware fault corrupts the state of a process, this error might propagate as a corrupt message and contaminate other processes in the system, causing severe outages. Recently, state corruptions of this nature have been observed surprisingly often in large computer populations, e.g., in large-scale data centers. Moreover, since the resilience of processors is expected to decline in the near future, the likelihood of state corruptions will increase even further. In this work, we argue that preventing the propagation of state corruption should be a first-class requirement for large-scale fault-tolerant distributed systems. In particular, we propose that developers target error isolation, the property in which each correct process ignores any corrupt message it receives. Typically, a process cannot decide whether a received message is corrupt or not. Therefore, we introduce hardening as a class of principled approaches to implement error isolation in distributed systems. Hardening techniques are (semi-)automatic transformations that enforce that each process appends evidence of good behavior in the form of error codes to all messages it sends. The techniques “virtualize” state corruptions into more benign failures such as crashes and message omissions: if a faulty process fails to detect its state corruption and abort, then hardening guarantees that any corrupt message the process sends has invalid error codes. Correct processes can then inspect received messages and drop them in case they are corrupt. With this dissertation, we contribute theoretically and practically to the state of the art in fault-tolerant distributed systems. To show that hardening is possible, we design, formalize, and prove correct different hardening techniques that enable existing crash-tolerant designs to handle state corruption with minimal developer intervention.
To show that hardening is practical, we implement and evaluate these techniques, analyzing their effect on the system performance and their ability to detect state corruptions in practice. hardware errors arbitrary state corruption data corruption error isolation distributed systems Byzantine fault tolerance ddc:004 rvk:ST 200
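The receive-side check described in the abstract above, where a correct process inspects the error code on each message and drops the message if the code is invalid, can be sketched roughly as follows. This is a deliberately simplified illustration that assumes a plain CRC-32 over the payload as the error code. The hardening techniques in the dissertation also cover corruption of a process's own state during computation, which this sketch does not attempt. The function names `seal` and `open_or_drop` are hypothetical.

```python
import zlib
from typing import Optional

CODE_LEN = 4  # a CRC-32 checksum occupies four bytes


def seal(payload: bytes) -> bytes:
    """Append a CRC-32 error code to an outgoing payload."""
    return payload + zlib.crc32(payload).to_bytes(CODE_LEN, "big")


def open_or_drop(message: bytes) -> Optional[bytes]:
    """Return the payload if its error code is valid, otherwise None (drop)."""
    if len(message) < CODE_LEN:
        return None
    payload, code = message[:-CODE_LEN], message[-CODE_LEN:]
    if zlib.crc32(payload).to_bytes(CODE_LEN, "big") != code:
        return None  # corrupt message: treat it like a message omission
    return payload


if __name__ == "__main__":
    msg = seal(b"balance=100")
    flipped = bytes([msg[0] ^ 0x01]) + msg[1:]  # simulate a single bit flip
    print(open_or_drop(msg), open_or_drop(flipped))
```

Dropping the message rather than raising an error mirrors the idea of turning a corruption into a more benign omission failure, which crash-tolerant protocols already handle.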
10	Byzantine Fault Tolerance for Nondeterministic Applications Chen, Bo January 2008 (has links) No description available. Computer Science Byzantine fault tolerance replica nondeterminism security replica consistency replication intrusion tolerance performance online poker game
