Spelling suggestions: "subject:"replicated state cachine"" "subject:"replicated state amachine""
1 |
UpRight fault toleranceClement, Allen Grogan 13 November 2012 (has links)
Experiences with computer systems indicate an inconvenient truth: computers fail and they fail in interesting ways. Although using redundancy to protect against fail-stop failures is common practice, non-fail-stop computer and network failures occur for a variety of reasons including power outage, disk or memory corruption, NIC malfunction, user error, operating system and application bugs or misconfiguration, and many others. The impact of these failures can be dramatic, ranging from service unavailability to stranding airplane passengers on the runway to companies closing. While high-stakes embedded systems have embraced Byzantine fault tolerant techniques, general purpose computing continues to rely on techniques that are fundamentally crash tolerant. In a general purpose environment, the current best practices response to non-fail-stop failures can charitably be described as pragmatic: identify a root cause and add checksums to prevent that error from happening again in the future. Pragmatic responses have proven effective for patching holes and protecting against faults once they have occurred; unfortunately the initial damage has already been done, and it is difficult to say if the patches made to address previous faults will protect against future failures. We posit that an end-to-end solution based on Byzantine fault tolerant (BFT) state machine replication is an efficient and deployable alternative to current ad hoc approaches favored in general purpose computing. The replicated state machine approach ensures that multiple copies of the same deterministic application execute requests in the same order and provides end-to-end assurance that independent transient failures will not lead to unavailability or incorrect responses. An efficient and effective end-to-end solution covers faults that have already been observed as well as failures that have not yet occurred, and it provides structural confidence that developers won't have to track down yet another failure caused by some unpredicted memory, disk, or network behavior. While the promise of end-to-end failure protection is intriguing, significant technical and practical challenges currently prevent adoption in general purpose computing environments. On the technical side, it is important that end-to-end solutions maintain the performance characteristics of deployed systems: if end-to-end solutions dramatically increase computing requirements, dramatically reduce throughput, or dramatically increase latency during normal operation then end-to-end techniques are a non-starter. On the practical side, it is important that end-to-end approaches be both comprehensible and easy to incorporate: if the cost of end-to-end solutions is rewriting an application or trusting intricate and arcane protocols, then end-to-end solutions will not be adopted. In this thesis we show that BFT state machine replication can and be used in deployed systems. Reaching this goal requires us to address both the technical and practical challenges previously mentioned. We revisiting disparate research results from the last decade and tweak, refine, and revise the core ideas to fit together into a coherent whole. Addressing the practical concerns requires us to simplify the process of incorporating BFT techniques into legacy applications. / text
|
2 |
Distributed Consensus: Performance Comparison of Paxos and Raft / Distribuerad Konsensus: Prestandajämförelse mellan Paxos och RaftNg, Harald January 2020 (has links)
With the growth of the internet, distributed systems have become increasingly important in order to provide more available and scalable applications. Con- sensus is a fundamental problem in distributed systems where multiple pro- cesses have to agree on the same proposed value in the presence of partial failures. Distributed consensus allows for building various applications such as lock services, configuration manager services or distributed databases.Two well-known consensus algorithms for building distributed logs are Multi-Paxos and Raft. Multi-Paxos was published almost three decades before Raft and gained a lot of popularity. However, critics of Multi-Paxos consider it difficult to understand. Raft was therefore published with the motivation of being an easily understood consensus algorithm. The Raft algorithm shares similar characteristics with a practical version of Multi-Paxos called Leader- based Sequence Paxos. However, the algorithms differ in important aspects such as leader election and reconfiguration.Existing work mainly compares Multi-Paxos and Raft in theory, but there is a lack of performance comparisons in practice. Hence, prototypes of Leader- based Sequence Paxos and Raft have been designed and implemented in this thesis. The prototypes were implemented using the Rust programming lan- guage and the message-passing framework Kompact and then benchmarked in real-world scenarios to compare the performance of Leader-based Sequence Paxos and Raft.The results show that Leader-based Sequence Paxos and Raft have simi- lar performance in geographically distributed deployments. However, the un- predictable leader election in Raft could greatly affect the performance if the elected leader is in an undesired location. In our experiments, the location of the Raft leader affected the average throughput by up to 35%. Furthermore, the results indicate that implementation details could have a significant impact on performance even in the parts where the algorithms are similar. By batch- ing messages more efficiently, Leader-based Sequence Paxos achieved up to 17% higher average throughput than Raft. / Med tillväxten av internet har distribuerade system blivit allt mer viktiga för att bygga mer tillgängliga och skalbara applikationer. Konsensus är ett funda- mentalt problem i distribuerade system där flera processer ska komma överens om samma föreslagna värde, samtidigt som partiella fel kan ske. Distribuerad konsensus kan appliceras till olika användningsomården som låstjänster, kon- figurationshanterare och distribuerade databaser.Två välkända konsensusalgoritmer för att bygga distribuerade loggar är Multi-Paxos och Raft. Multi-Paxos publicerades nästintill tre årtionden före Raft och blev populär. Men kritiker av Multi-Paxos anser att algoritmen är svår att förstå. Av denna anledning publicerades Raft med motivationen att vara en konsensusalgoritm som är enkel att förstå. Raft delar likheter med Leader- based Sequence Paxos, en praktisk version av Multi-Paxos. Dock skiljer sig algoritmerna i viktiga aspekter som leaderval och rekonfigurering.Befintliga arbeten jämför i huvudsak Multi-Paxos och Raft i teorin, men det saknas jämförelse av prestandan i praktiken. Av denna anledning har pro- totyper av Leader-based Sequence Paxos och Raft blivit designade och imple- menterade i denna avhandling. Dessa prototyper implementerades i program- meringsspråket Rust och message-passing ramverket Kompact, som sedan tes- tades i verkliga situationer för att jämföra Leader-based Sequence Paxos och Raft.Resultaten visar att Leader-based Sequence Paxos och Raft har liknande prestanda i geografiskt distribuerade sammanhang. Dock kan det oförutsäga- bara ledarvalet i Raft påverka prestandan avsevärt ifall den valde ledaren befin- ner sig på en oönskad plats. I våra experiment påverkade Raft ledarens plats den genomsnittliga kapaciteten med upp till 35%. Resultaten visar även att implementationsdetaljer kan ha en signifikant effekt på prestandan även i de delar där algoritmerna är liknande. Genom att sammanfoga meddelanden mer effektivt uppnådde Leader-based Sequence Paxos 17% högre genomsnittlig kapacitet än Raft.
|
Page generated in 0.0535 seconds