1

Scalable Byzantine State Machine Replication: Designs, Techniques, and Implementations

Arun, Balaji 02 July 2021 (has links)
State machine replication (SMR) is one of the most widely studied and used methodologies for building highly available distributed applications and services. SMR replicates a service across a set of computing hosts and executes client operations on the replicas in an agreed-upon total order, ensuring linearizability of the replicated shared state. The problem of determining a total order reduces to one of computing consensus. State-of-the-art consensus protocols are inadequate for newer classes of applications such as Blockchains and for geographically distributed infrastructures. The widely used Crash Fault Tolerance (CFT) fault model of consensus protocols is prone to malicious and adversarial behaviors as well as non-crash faults such as software bugs. The Byzantine fault-tolerance (BFT) model and its trust-based variant, the hybrid model, permit stronger failure adversaries. However, state-of-the-art Byzantine and hybrid consensus protocols have performance limitations in geographically distributed environments: they designate a primary replica for proposing total orders, which becomes a bottleneck and yields sub-optimal latencies for faraway clients. Additionally, they neither scale to hundreds of replicas nor provide consistent performance as the system size grows. To overcome these limitations and develop highly scalable SMR solutions, this dissertation presents two leaderless consensus protocols, namely ezBFT and Dester, for the Byzantine and hybrid models, respectively. These protocols enable every replica to receive and order client commands. Additionally, they exchange command dependencies to collectively order commands without relying on a primary. Our experimental evaluations in a 7-node geographically distributed setup reveal that ezBFT improves client-side latency by as much as 40% over state-of-the-art BFT protocols including PBFT, FaB, and Zyzzyva. Dester, for the hybrid model, reduces latency by as much as 30% over ezBFT. Next, the dissertation presents a new paradigm called DQBFT for designing consensus protocols that can scale to hundreds of nodes in geographically distributed environments. Because leaderless protocols exchange command dependencies, they do not scale to hundreds of nodes. DQBFT overcomes this scalability limitation by decentralizing only the heavy task of replicating commands while centralizing the process of ordering them. While DQBFT can be used to enhance existing primary-based protocols, Destiny is a hybrid instantiation of the DQBFT paradigm that uses linear communication for better scalability than naive instantiations. Experimental evaluations in a 193-node geographically distributed setup reveal that Destiny achieves ≈3× better throughput and ≈50% better latency than state-of-the-art BFT protocols including Hotstuff, SBFT, and Hybster. Lastly, the dissertation presents two techniques for designing and implementing BFT protocols with reduced development costs. The dissertation presents Bumblebee, a methodology for manually transforming CFT protocols to tolerate Byzantine faults using trusted execution environments that are increasingly available in commodity hardware. Bumblebee is based on the observation that CFT protocols are incapable of tolerating non-malicious non-crash faults, yet they are nevertheless deployed in many production systems. Bumblebee provides a Generic Algorithm that can represent protocols in both the CFT and hybrid fault models, thus allowing easy construction of hybrid protocols using CFT protocols as baselines.
The dissertation constructs hybrid instantiations of CFT protocols including Paxos, Raft, and M2Paxos. Experimental evaluations of the hybrid variants reveal that they perform on par with native hybrid protocols, but incur a 30% overhead over their CFT counterparts. Hybrid protocols rely on the integrity of trusted execution environments, which are increasingly subject to security exploits. To withstand exploits, the dissertation presents DuoBFT, a protocol that exposes both the BFT and hybrid fault models within a single consensus protocol. This enables consensus under both fault models within the same protocol and without additional redundancy, allowing DuoBFT to achieve the performance of hybrid protocols and the security of BFT protocols. Experimental evaluations reveal that DuoBFT achieves the best of both the hybrid and BFT fault models with less than 10% overhead. / Doctor of Philosophy / Computers are ubiquitous; they perform some of the most complex and safety-critical tasks, such as controlling aircraft, managing financial markets, and maintaining sensitive medical records. The undeniable fact is that computers are faulty. They are prone to crashes and can behave arbitrarily. Even the most robust computers, such as those sent to outer space, eventually fail. External phenomena such as power outages and network disruptions affect their operation. To make computing systems reliable, researchers and practitioners have long focused on interconnecting many individual computers and programming them to effectively be duplicates of one another. This way, when one computer in a system fails, the rest of the computers still ensure that the system as a whole is operational. Duplication requires that multiple computers effectively perform the same task. In order for multiple computers to perform the same task together, they should first agree on the task. More generally, since computing systems perform multiple tasks, they should agree on the sequence of tasks that they will individually perform and follow that agreement. This is what is known as the State Machine Replication technique. State Machine Replication (SMR) is a powerful technique that is applicable to numerous computing applications. Blockchain systems, the technology behind cryptocurrencies such as Bitcoin and Ethereum, use the SMR technique. In the context of Blockchain, the added challenge is that some of the computers involved in SMR can be programmed by adversarial parties and could act in ways that jeopardize the integrity of the whole system. For Bitcoin and Ethereum, this could mean the embezzlement of hundreds or even millions of dollars' worth of cryptocurrencies. Certain SMR systems are capable of tolerating such intrusions and ensuring system integrity. Such systems are deemed Byzantine tolerant. This dissertation presents designs, techniques, and implementations of Byzantine State Machine Replication systems. The problems addressed in this dissertation are those that plague existing Byzantine SMR systems, making them suboptimal for newer applications such as Blockchains. First, when the computers that participate in SMR are spread around the world, their performance depends on the communication latencies between any pair of computers. Second, the number of computers required is proportional to the number of adversarial computers that need to be tolerated. Consequently, certain SMR systems for Blockchains require hundreds of computers to tolerate heavy adversarial behavior.
Many existing SMR techniques perform poorly under these scenarios. The techniques presented in this dissertation address various permutations of these challenges.
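To make the SMR execution model concrete: if deterministic replicas apply the same commands in the same total order, their states cannot diverge, which is what yields linearizability. Below is a minimal sketch of that execution loop (names are illustrative; the consensus layer that produces the total order is abstracted away):

```python
# Minimal sketch of SMR execution: replicas that apply the same commands in
# the same total order remain identical. The consensus layer that produces
# `ordered_log` is abstracted away; names here are illustrative.

class KVStateMachine:
    """A deterministic state machine: identical inputs yield identical state."""
    def __init__(self):
        self.store = {}

    def apply(self, cmd):
        op, key, *val = cmd
        if op == "put":
            self.store[key] = val[0]
            return "ok"
        if op == "get":
            return self.store.get(key)
        raise ValueError(f"unknown op: {op}")

def run_replica(sm, ordered_log):
    """Apply commands in the agreed-upon total order."""
    return [sm.apply(cmd) for cmd in ordered_log]

# Two replicas fed the same totally ordered log end in the same state and
# return the same results -- the invariant every protocol on this page,
# leader-based or leaderless, exists to provide.
log = [("put", "x", 1), ("put", "x", 2), ("get", "x")]
r1, r2 = KVStateMachine(), KVStateMachine()
assert run_replica(r1, log) == run_replica(r2, log)
assert r1.store == r2.store
```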
2

Replication of Concurrent Applications in a Shared Memory Multikernel

Wen, Yuzhong 19 July 2016 (has links)
State Machine Replication (SMR) has become the de facto methodology for building replication-based fault-tolerant systems. Current SMR systems usually involve multiple machines, each acting as a replica of the others. However, having multiple machines adds infrastructure cost, in both hardware and power consumption. For tolerating non-critical CPU and memory faults that do not crash the entire machine, there is no need for extra machines; intra-machine replication is a good fit for this scenario. However, current intra-machine replication approaches do not provide strong isolation among the replicas, which allows faults to propagate from one replica to another. In order to provide an intra-machine replication technique with strong isolation, this thesis presents an SMR system on a multi-kernel OS. We implemented a replication system that is capable of replicating concurrent applications on different kernel instances of a multi-kernel OS. Modern concurrent applications can be deployed on our system with minimal code modification. Additionally, our system provides two different replication modes that allow the user to switch freely according to the application type. Through the evaluation of multiple real-world applications, we show that those applications can be easily deployed on our system with 0 to 60 lines of changes to the source code. From the performance perspective, our system introduces only 0.23% to 63.39% overhead compared to non-replicated execution. / Master of Science
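The hard part of replicating a concurrent application is that lock interleavings are nondeterministic, so two replicas running the same threads can diverge. A common technique for this class of system is to record the lock-acquisition order on one replica and force the other to replay it; the sketch below illustrates that idea with hypothetical names, not necessarily this thesis's actual interface:

```python
import threading

# Illustrative sketch: record the primary's lock-acquisition order, then
# force the replica to grant locks in the same order, reproducing the same
# thread interleaving. Names and interfaces are hypothetical.

class RecordingLock:
    """Primary side: a lock that logs (lock_id, thread_id) acquisitions."""
    def __init__(self, lock_id, log, log_guard):
        self.lock_id, self.inner = lock_id, threading.Lock()
        self.log, self.log_guard = log, log_guard   # shared by all locks

    def acquire(self, thread_id):
        self.inner.acquire()
        with self.log_guard:                        # serialize log appends
            self.log.append((self.lock_id, thread_id))

    def release(self):
        self.inner.release()

class ReplayingLock:
    """Replica side: grants the lock only to the thread that acquired it
    next in the recorded schedule."""
    def __init__(self, lock_id, schedule, cond):
        self.lock_id, self.inner = lock_id, threading.Lock()
        self.schedule = schedule        # shared list of (lock_id, thread_id)
        self.cond = cond                # one Condition shared by all locks

    def acquire(self, thread_id):
        with self.cond:
            while not self.schedule or self.schedule[0] != (self.lock_id, thread_id):
                self.cond.wait()
            self.schedule.pop(0)
            self.cond.notify_all()      # wake whichever thread is next
        self.inner.acquire()            # still enforce mutual exclusion

    def release(self):
        self.inner.release()
```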
3

A Low-latency Consensus Algorithm for Geographically Distributed Systems

Arun, Balaji 15 May 2017 (has links)
This thesis presents Caesar, a novel multi-leader Generalized Consensus protocol for geographically replicated systems. Caesar is able to achieve near-perfect availability, provide high performance (low latency and high throughput) compared to the existing state of the art, and tolerate replica failures. Recently, a number of state-of-the-art consensus protocols that implement the Generalized Consensus definition have been proposed. However, the major limitation of these existing approaches is significant performance degradation when the application workload produces conflicting requests. Caesar's main goal is to overcome this limitation by changing the way a fast decision is taken: its ordering protocol does not reject a fast decision for a client request if a quorum of nodes reply with different dependency sets for that request. It only switches to a slow decision if there is no chance to agree on the proposed order for that request. Caesar achieves this using a combination of wait conditions and logical timestamps. The effectiveness of Caesar is demonstrated through an evaluation study performed on Amazon's EC2 infrastructure using 5 geo-replicated sites. Caesar outperforms other multi-leader competitors (e.g., EPaxos) by as much as 1.7x in the presence of 30% conflicting requests, and single-leader ones (e.g., Multi-Paxos) by as much as 3.5x. The protocol is also resistant to heavy client loads, unlike existing protocols. / Master of Science / Today, there exists a plethora of online services (e.g., Facebook, Google) that serve millions of users daily. Usually, each of these services has multiple subcomponents that work cohesively to deliver a rich user experience. One vital component that is prevalent in these services is the one that maintains the shared state. One example of a shared-state component is a database, which enables operations on structured data. Such shared states are replicated across multiple server nodes, and even across multiple data centers, to guarantee availability, i.e., if a node fails, other nodes can still serve requests on the shared state; low latency, i.e., placing a copy of the shared state in a datacenter closer to the users reduces the time required to serve them; and scalability, i.e., the bottleneck of a single server node being unable to serve millions of concurrent requests can be alleviated by having multiple nodes serve users at the same time. These replicated shared states need to be kept consistent, i.e., every copy of the shared state must be the same on all the replicated nodes, and maintaining this consistency requires that each of the replicating nodes communicate with the others and reach an agreement on the order in which operations on the shared data should be applied. In that regard, this thesis proposes Caesar, a consensus protocol with the aforementioned guarantees that eases the deployment of services that contain a shared state. It addresses the problem of performance degradation in existing approaches when the same part of the shared state is accessed by multiple users connected to different server nodes. The effectiveness of Caesar is demonstrated through an evaluation study performed by deploying the protocol on five of Amazon's data centers around the world. Caesar outperforms the existing state of the art by as much as 3.5x. Caesar is also resistant to heavy client loads, unlike existing protocols.
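At a high level, Caesar's fast path hinges on agreeing on a command's proposed timestamp rather than on identical dependency sets. A much-simplified sketch of that decision rule follows (quorum sizes and the message flow are abbreviated and illustrative; this is not the full protocol):

```python
from dataclasses import dataclass

# Much-simplified sketch of a Caesar-style fast/slow decision: a command
# commits on the fast path if enough replicas accept its proposed timestamp,
# even when they report different dependency sets. Recovery and the exact
# quorum arithmetic are omitted.

@dataclass
class Reply:
    timestamp_ok: bool   # did the replica accept the proposed timestamp?
    deps: frozenset      # conflicting commands this replica has observed

def decide(replies, n):
    fast_quorum = n - (n // 4)   # illustrative size, not Caesar's exact bound
    accepted = [r for r in replies if r.timestamp_ok]
    if len(accepted) >= fast_quorum:
        # Fast decision: union the reported dependencies. Differing dependency
        # sets do NOT force a retry, unlike earlier generalized-consensus protocols.
        return ("fast-commit", frozenset().union(*(r.deps for r in accepted)))
    # Otherwise fall back to a slow (classic, Paxos-like) round that fixes
    # the timestamp and dependencies before committing.
    return ("slow-path", None)

replies = [Reply(True, frozenset({"a"})), Reply(True, frozenset({"b"})),
           Reply(True, frozenset()), Reply(True, frozenset({"a"})),
           Reply(False, frozenset())]
print(decide(replies, n=5))   # ('fast-commit', frozenset({'a', 'b'}))
```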
4

Scalable state machine replication / Replicação escalável de máquina de estados

Bezerra, Carlos Eduardo Benevides January 2016 (has links)
Redundancy provides fault tolerance. A service can run on multiple servers that replicate each other, in order to provide service availability even in the case of crashes. A way to implement such a replicated service is by using techniques like state machine replication (SMR). SMR provides fault tolerance while being linearizable, that is, clients cannot distinguish the behaviour of the replicated system from that of a single-site, unreplicated one. However, having a fully replicated, linearizable system comes at a cost, namely, scalability: by scalability we mean that adding servers increases the maximum system throughput, at least for some workloads. Even with a careful setup and using optimizations that avoid unnecessary redundant actions being taken by servers, at some point the throughput of a system replicated with SMR cannot be increased by adding servers; in fact, adding replicas may even degrade performance. A way to achieve scalability is by partitioning the service state and then allowing partitions to work independently.
Having a partitioned, yet linearizable and reasonably performant service is not trivial, and this is the topic of research addressed here. To allow systems to scale, while at the same time ensuring linearizability, we propose and implement the following ideas: (i) Scalable State Machine Replication (S-SMR), (ii) Optimistic Atomic Multicast (Opt-amcast), and (iii) Fast S-SMR (Fast-SSMR). S-SMR is an execution model that allows the throughput of the system to scale linearly with the number of servers without sacrificing consistency. To provide faster responses for commands, we developed Opt-amcast, which allows messages to be delivered twice: one delivery guarantees atomic order (conservative delivery), while the other is fast but does not always guarantee atomic order (optimistic delivery). The implementation of Opt-amcast that we propose is called Ridge, a protocol that combines low latency with high throughput. Fast-SSMR is an extension of S-SMR that uses the optimistic delivery of Opt-amcast: while a command is being atomically ordered, some precomputation can be done based on its fast, optimistically ordered delivery, improving response time.
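The two deliveries of Opt-amcast are exactly what Fast-SSMR exploits: a replica starts speculative work on the optimistic delivery and commits (or discards) it once the conservative, atomically ordered delivery arrives. A schematic sketch of that pattern, with a toy state machine (interfaces are hypothetical, not Ridge's actual API):

```python
# Schematic sketch of the Opt-amcast double-delivery pattern behind
# Fast-SSMR: speculate on the fast optimistic delivery, then confirm or
# redo the work when the conservative (atomic-order) delivery arrives.
# Interfaces are hypothetical.

class Counter:
    """Toy state machine: precomputation records the state version it read."""
    def __init__(self):
        self.value, self.version = 0, 0

    def precompute(self, cmd):
        return {"read_version": self.version, "result": self.value + cmd}

    def speculation_valid(self, spec):
        return spec["read_version"] == self.version   # state unchanged since

    def commit(self, spec):
        self.value = spec["result"]
        self.version += 1

class OptimisticExecutor:
    def __init__(self, sm):
        self.sm = sm
        self.speculative = {}                  # msg_id -> precomputed result

    def on_optimistic_deliver(self, msg_id, cmd):
        # Fast but tentative: the final atomic order may differ.
        self.speculative[msg_id] = self.sm.precompute(cmd)

    def on_conservative_deliver(self, msg_id, cmd):
        # The atomic order is now fixed; reuse the speculation if still valid.
        spec = self.speculative.pop(msg_id, None)
        if spec is not None and self.sm.speculation_valid(spec):
            self.sm.commit(spec)                         # response time saved
        else:
            self.sm.commit(self.sm.precompute(cmd))      # redo from scratch

ex = OptimisticExecutor(Counter())
ex.on_optimistic_deliver("m1", 5)     # speculate while ordering is in flight
ex.on_conservative_deliver("m1", 5)   # order confirmed: the work is reused
assert ex.sm.value == 5
```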
5

Pipelined Byzantine Fault Tolerance and Applications

Adithya Bhat (17583018) 07 December 2023 (has links)
<p dir="ltr">Practically, Byzantine faults are not assumed in cloud applications. Byzantine fault-tolerance adds significant cryptographic, communication, throughput, and latency overheads to applications, contributing to the resistance towards its widespread adoption. Existing Byzantine-fault tolerant protocols focus on optimal latency or optimal communication while ignoring the throughput and cryptographic overheads.</p><p dir="ltr">In this thesis, we explore pipelining for Byzantine fault-tolerant applications. Pipelining tasks is a common optimization in distributed systems that involves executing tasks in stages. The idea is that instead of executing a task in an iteration as an atomic unit, we split the execution into stages and execute all stages of <i>different</i> tasks per iteration. We observe significant performance benefits if executing later stages of a task helps other tasks in earlier stages, saving effort in each stage. The length of the pipeline, i.e., the number of stages, determines the latency of an individual task. However, if the pipeline improves the execution of every stage enough, then the latency improves.</p><p dir="ltr">We primarily explore three Byzantine Fault Tolerant (BFT) applications with pipelining: (i) unique chain-based State Machine Replication protocols: <i>Apollo</i>, <i>Artemis</i>, <i>Leto</i>, and <i>Zeus</i>, and (ii) energy-efficient State Machine Replication: <i>EESMR</i>. (iii) random beacon protocols: <i>GRandPiper</i>, <i>BRandPiper</i>, and <i>OptRand</i>. We design them with a pipeline-first approach to improve the throughput, cryptographic, and communication costs at every stage of the pipeline. With respect to latency, we show (i) pipelined SMR protocols where our pipeline stages have constant cryptographic and linear communication costs allowing our protocols to outperform state-of-the-art BFT-SMR protocols in throughput. (ii) pipelined SMR protocols with techniques to make each stage of the pipeline independent, thus achieving demonstrable energy efficiency while allowing an unbounded number of non-interactive parallel proposals. (iii) reduced latencies for reconfiguration-friendly random beacons by using two pipelines: an SMR pipeline to commit and a beacon pipeline to produce random numbers and decoupling the two pipelines thereby removing the impact of the high-latency SMR pipeline on the latency of the randomness output by the system. </p>
6

A Scalable Leader Based Consensus Algorithm

Gulati, Ishaan 10 August 2023 (has links)
Present-day commonly used systems like Cassandra, Spanner, and CockroachDB require high availability and strict consistency guarantees. High availability is attained through redundancy. In the field of computing, redundancy is attained through state machine replication. Protocols like Raft, Multi-Paxos, ZAB, or other variants of Paxos are commonly used to achieve state machine replication. These protocols choose one of the processes, from the multiple processes running on various machines in a distributed setting, as the leader. The leader is responsible for client interactions, replicating client operations on all the followers, and maintaining a consistent view across the system. In these protocols, the leader is more loaded than the other nodes, or followers, in the system, making the leader a significant scalability bottleneck for multi-datacenter and edge deployments. Hardware and network heterogeneity further degrade the overall commit throughput and latency under majority agreement. This work aims to reduce the load on the leader by using reduced, dynamic, latency-aware flexible quorums while maintaining strict correctness guarantees like linearizability. In this thesis, we present FDRaft, which implements dynamic reduced-size commit quorums to lessen the leader's load and improve throughput and latency. The commit quorums are computed based on an exponentially weighted moving average of the followers' time to respond to the leader, accounting for heterogeneity in hardware and network. The reduced commit quorum requires a bigger election quorum, but elections rarely happen, and a single leader can serve for significant durations. We evaluate this protocol using a key-value store built on FDRaft and Raft and compare multi-datacenter and edge deployments. The evaluation shows 2x improved throughput and around 55% improved latency over Raft during normal operations, and a 45% improvement over Raft with vanilla flexible quorums under failure conditions. / M.S. / In our day-to-day life, we rely heavily on different internet applications, be it Instagram for sharing pictures, Amazon for our shopping, DoorDash for our food orders, Spotify for listening to music, or Uber for traveling. These applications share many commonalities, like the scale at which they operate, maintaining strict latency guarantees, high availability to serve the users, and using databases to maintain shared state. The data is replicated across multiple servers to provide fault tolerance against failures. The replication across multiple servers is achieved through state-machine replication: multiple servers start with the same initial state and perform operations in the same order to reach the same final state. This process of replication in computing is achieved through a consensus algorithm. Consensus means agreement, and consensus algorithms are used to reach an agreement on a particular value. Raft, Multi-Paxos, and other variants of Paxos are the commonly used consensus algorithms to achieve agreement on a particular value in a distributed setting. In these algorithms, one of the servers is chosen as the leader, responsible for client interactions and for replicating and maintaining the same state across all the servers, even when faced with server and network failures. Every time the leader receives a client operation, it starts the consensus process by forwarding the client request to all the servers and committing the client request after receiving agreement from the majority.
As the leader does most of the work, it is more loaded than the other servers and becomes a significant scalability bottleneck. The leader bottleneck becomes more evident in multi-datacenter and edge deployments. Hardware and network heterogeneity also severely affect the overall commit throughput and latency under majority agreement. In this thesis, we reduce the load on the leader by building a smaller dynamic commit quorum with latency-aware server selection, based on an exponentially weighted moving average of the followers' response time to the leader's requests, without compromising safety and liveness properties. Our design also achieves higher throughput and lower commit latency. We evaluate this protocol against multiple workloads and failure conditions and find that it outperforms Raft by 2x in throughput and by around 55% in latency during normal operations. It also shows a 45% improvement in throughput and latency over Raft with vanilla flexible quorums under failure conditions.
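The follower-selection rule can be sketched directly: maintain an exponentially weighted moving average (EWMA) of each follower's response time and build the commit quorum from the currently fastest followers. The parameter names below are illustrative, and the real protocol must also enlarge the election quorum so that quorums keep intersecting safely:

```python
# Sketch of latency-aware commit-quorum selection via an EWMA of follower
# response times. Illustrative only; safety additionally requires a larger
# election quorum, which is omitted here.

class QuorumSelector:
    def __init__(self, followers, alpha=0.2):
        self.alpha = alpha                         # EWMA smoothing factor
        self.ewma = {f: 0.0 for f in followers}    # smoothed latency (ms)

    def record_response(self, follower, latency_ms):
        prev = self.ewma[follower]
        # Standard EWMA update: weight the new sample by alpha.
        self.ewma[follower] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def commit_quorum(self, quorum_size):
        # The leader counts toward the quorum, so pick the quorum_size - 1
        # followers with the lowest smoothed latency.
        fastest = sorted(self.ewma, key=self.ewma.get)
        return fastest[:quorum_size - 1]

sel = QuorumSelector(["f1", "f2", "f3", "f4"])
for f, ms in [("f1", 10), ("f2", 80), ("f3", 25), ("f4", 300)]:
    sel.record_response(f, ms)
print(sel.commit_quorum(quorum_size=3))   # ['f1', 'f3']
```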
7

Generalized Consensus for Practical Fault-Tolerance

Garg, Mohit 07 September 2018 (has links)
Despite extensive research on Byzantine Fault Tolerant (BFT) systems, overheads associated with such solutions preclude widespread adoption. Past efforts such as the Cross Fault Tolerance (XFT) model address this problem by making the weaker assumption that a majority of processes are correct and communicate synchronously. Although XPaxos of Liu et al. (using the XFT model) achieves performance similar to Paxos, it does not scale with the number of faults. Also, its reliance on a single leader introduces considerable downtime in case of failures. This thesis presents Elpis, the first multi-leader XFT consensus protocol. By adopting the Generalized Consensus specification from the Crash Fault Tolerance model, we were able to devise a multi-leader protocol that exploits the commutativity property inherent in the commands ordered by the system. Elpis maps accessed objects to non-faulty processes during periods of synchrony. Subsequently, these processes order all commands that access these objects. Experimental evaluation confirms the effectiveness of this approach: Elpis achieves up to 2x speedup over XPaxos and up to 3.5x speedup over state-of-the-art Byzantine Fault-Tolerant consensus protocols. / Master of Science / Online services like Facebook, Twitter, Netflix, and Spotify, as well as cloud services like Google and Amazon, serve millions of users daily, including individuals as well as organizations. They use many distributed technologies to deliver a rich experience. The distributed nature of these technologies has removed geographical barriers to accessing data, services, software, and hardware. An essential aspect of these technologies is the concept of the shared state. Distributed databases with multiple replicated data nodes are an example of this shared state. Maintaining replicated data nodes provides several advantages, such as (1) availability, so that in case one node goes down the data can still be accessed from other nodes; (2) quick response times, since placing data nodes closer to the user reduces the time required to serve the data; and (3) scalability, since the bottleneck of a single server node being unable to serve millions of concurrent requests can be alleviated by having multiple nodes serve users at the same time. To maintain this shared state, some mechanism is required to maintain consistency, that is, the copies of the shared state must be identical on all the data nodes. This mechanism is called consensus, and several such mechanisms in practical use today rely on the Crash Fault Tolerance (CFT) model. The CFT model implies that these mechanisms provide consistency in the presence of nodes crashing. While the state of the art in security has moved from assuming a trusted environment inside a firewall to a perimeter-less and semi-trusted environment with every service living on the internet, typically only the application layer is secured, while the core is built with only crashes in mind. While there exists comprehensive research on secure consensus mechanisms that utilize what is called the Byzantine Fault Tolerance (BFT) model, the extra costs required to implement these mechanisms and their comparatively lower performance in a geographically distributed setting have impeded widespread adoption. A recently proposed model, called Cross Fault Tolerance (XFT), tries to find a cross between these models: achieving security while paying no extra cost.
This thesis presents Elpis, a consensus mechanism that uses precisely this model to secure the shared state at its core, without modifications to existing setups, while delivering high performance and low response times. We perform a comprehensive evaluation on AWS and demonstrate that Elpis achieves a 3.5x speedup over the state of the art while improving response times by as much as 50%.
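Generalized Consensus exploits commutativity: two commands need a relative order only if they conflict, i.e., they access a common object and at least one of them writes it. A minimal sketch of that conflict test, plus an illustrative object-to-owner mapping of the kind Elpis maintains during synchronous periods (the mapping rule here is hypothetical; leader election and recovery are omitted):

```python
import zlib
from dataclasses import dataclass

# Minimal sketch of the commutativity check behind Generalized Consensus,
# plus an illustrative mapping of objects to owner processes. Hypothetical
# details; fault detection and ownership handoff are omitted.

@dataclass(frozen=True)
class Command:
    read_set: frozenset
    write_set: frozenset

def conflict(a, b):
    """Commands must be ordered iff they touch a common object and at least
    one of them writes it; otherwise they commute."""
    return bool(a.write_set & (b.read_set | b.write_set) or
                b.write_set & a.read_set)

def owner(obj, processes):
    """Map each object to an owner process with a stable hash; the owner
    then orders all commands that access the object."""
    return processes[zlib.crc32(obj.encode()) % len(processes)]

inc_x = Command(frozenset({"x"}), frozenset({"x"}))
read_y = Command(frozenset({"y"}), frozenset())
assert conflict(inc_x, inc_x)        # two writes to x must be ordered
assert not conflict(inc_x, read_y)   # disjoint objects commute: no order needed
```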
8

Vers des protocoles de tolérance aux fautes byzantines efficaces et robustes / Towards efficient and robust byzantine fault tolerance protocols

Perronne, Lucas 08 December 2016 (has links)
Over the last decade, Cloud computing instigated an important paradigm shift in numerous information systems. This new paradigm is mainly illustrated by the relocation of entire IT infrastructures out of companies' warehouses. The use of local servers has thus been replaced by servers rented from dedicated providers such as Google, Amazon, and Microsoft. In order to ensure the sustainability of this economic model, it appears necessary to provide users with several guarantees related to the security, availability, and reliability of the proposed resources. Such quality of service (QoS) factors allow providers and users to reach an agreement on the expected level of dependability. Practically, the proposed servers must episodically cope with arbitrary faults (also called Byzantine faults), such as incorrect or corrupted messages, server crashes, or network failures. Nevertheless, the Cloud computing environment encouraged the emergence of technologies such as virtualization and state machine replication, which allow cloud providers to efficiently face the occurrence of faults through the implementation of fault tolerance protocols.
Byzantine Fault Tolerance (BFT) is a research area involving state machine replication concepts, aiming at ensuring continuity and reliability of hosted services in the presence of any kind of arbitrary behavior. In order to handle such threats, numerous protocols were proposed. These protocols must be efficient in order to counterbalance the extra cost of replication, and robust in order to lower the impact of Byzantine behaviors on system performance. We first noticed that tackling both these concerns at the same time is difficult: current protocols are either designed to be efficient at the expense of their robustness, or robust at the expense of their efficiency. We tackle this specific problem in this thesis, our goal being to provide the tools required to design BFT protocols that are both efficient and robust.
Our focus is mainly dedicated to two types of denial-of-service attacks involving request management. The first one is caused by the partial corruption of a request transmitted by a client. The second one is caused by the intentional drop of a request upon receipt. In order to face both these Byzantine behaviors efficiently, several mechanisms were integrated into robust BFT protocols. In practice, these mechanisms involve high overheads, and thus lead to a significant performance drop of robust protocols compared to efficient ones. This assessment allows us to introduce our first contribution: the definition of several generic design principles, applicable to numerous existing BFT protocols, aiming at reducing these overheads while maintaining the same level of robustness.
The second contribution introduces ER-PBFT, a new protocol implementing these design principles on PBFT, the reference in terms of Byzantine fault tolerance. We demonstrate the efficiency of our new robustness policy, both in fault-free scenarios and in the presence of Byzantine behaviors.
The third contribution highlights ER-COP, a new BFT protocol dedicated to both efficiency and robustness, implementing our design principles on COP, the BFT protocol currently providing the best performance in a fault-free environment. We evaluate the additional cost introduced by our robustness policy, and we demonstrate ER-COP's ability to handle Byzantine behaviors.
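Both denial-of-service behaviors above have well-known counters in robust BFT protocols: authenticate client requests on receipt so a partially corrupted request is rejected cheaply, and have replicas forward requests to the primary under a timer so that an intentional drop eventually triggers a view change. A hedged sketch of these two mechanisms (message formats and names are hypothetical, not PBFT's or COP's exact ones):

```python
import hmac, hashlib, threading

# Hedged sketch of two standard robustness mechanisms:
# (1) verify the client's request authenticator on receipt, so a partially
#     corrupted request is dropped before it poisons the ordering phase;
# (2) forward each request to the primary and start a timer, so a primary
#     that intentionally drops requests is eventually suspected.

def verify_request(payload: bytes, mac: bytes, client_key: bytes) -> bool:
    expected = hmac.new(client_key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, mac)   # constant-time comparison

class ReplicaRequestGuard:
    def __init__(self, forward_to_primary, trigger_view_change, timeout=2.0):
        self.forward = forward_to_primary
        self.view_change = trigger_view_change
        self.timeout = timeout
        self.pending = {}                       # request id -> running timer

    def on_client_request(self, req_id, payload, mac, client_key):
        if not verify_request(payload, mac, client_key):
            return                              # corrupted request: reject early
        self.forward(req_id, payload)           # primary cannot claim ignorance
        timer = threading.Timer(self.timeout, self._expired, args=(req_id,))
        self.pending[req_id] = timer
        timer.start()

    def on_ordered(self, req_id):
        timer = self.pending.pop(req_id, None)
        if timer:
            timer.cancel()                      # primary ordered it in time

    def _expired(self, req_id):
        if self.pending.pop(req_id, None):
            self.view_change()                  # suspected intentional drop
```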
