Return to search

Consul: A communication substrate for fault-tolerant distributed programs.

As human dependence on computing technology increases, so does the need for computer system dependability. This dissertation introduces Consul, a communication substrate designed to help improve system dependability by providing a platform for building fault-tolerant, distributed systems based on the replicated state machine approach. The key issues in this approach--ensuring replica consistency and reintegrating recovering replicas--are addressed in Consul by providing abstractions called fault-tolerant services. These include a broadcast service to deliver messages to a collection of processes reliably and in some consistent order, a membership service to maintain a consistent system-wide view of which processes are functioning and which have failed, and a recovery service to recover a failed process. Fault-tolerant services are implemented in Consul by a unified collection of protocols that provide support for managing communication, redundancy, failures, and recovery in a distributed system. At the heart of Consul is Psync, a protocol that provides for multicast communication based on a context graph that explicitly records the partial (or causal) order of messages. This graph also serves as the basis for novel algorithms used in the ordering, membership, and recovery protocols. The ordering protocol combines the semantics of the operations encoded in messages with the partial order provided by Psync to increase the concurrency of the application. Similarly, the membership protocol exploits the partial ordering to allow different processes to conclude that a failure has occurred at different times relative to the sequence of messages received, thereby reducing the amount of synchronization required. The recovery protocol combines checkpointing with the replay of messages stored in the context graph to recover the state of a failed process. Moreover, this collection of protocols is implemented in a highly-configurable manner, thus allowing a system builder to easily tailor an instance of Consul from this collection of building-block protocols. Consul is built in the x-Kernel and executes standalone on a collection of Sun 3 work-stations. Initial testing and performance studies have been done using two applications: a replicated directory and a distributed wordgame. These studies show that the semantic based order is more efficient than a total order in many situations, and that the overhead imposed by the checkpointing, membership, and recovery protocols is insignificant.

Identiferoai:union.ndltd.org:arizona.edu/oai:arizona.openrepository.com:10150/185824
Date January 1992
CreatorsMishra, Shivakant.
ContributorsSchlichting, Richard D., Peterson, Larry L., Snodgrass, Richard T., Gay, David, Leonard, John
PublisherThe University of Arizona.
Source SetsUniversity of Arizona
LanguageEnglish
Detected LanguageEnglish
Typetext, Dissertation-Reproduction (electronic)
RightsCopyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.

Page generated in 0.0022 seconds