
A Novel Low-Overhead Recovery Approach For Distributed Systems

In this work we have addressed the complex problem of recovery from concurrent failures in a distributed computing environment. We have proposed a new checkpointing and recovery approach that enables each process to restart from its most recent checkpoint and therefore guarantees the least amount of recomputation after recovery. The proposed approach deals effectively with orphan and lost messages. We have introduced two new ideas. First, the value of the common checkpointing interval is chosen such that only the messages sent in the recent checkpoints of the processes need to be logged. Second, the lost messages are always determined a priori by the initiator process, in parallel with the normal distributed computation, so this step does not delay recovery in any way.

Identifier: oai:union.ndltd.org:siu.edu/oai:opensiuc.lib.siu.edu:theses-1148
Date: 01 December 2009
Creators: Kosaraju, Sundeepthi
Publisher: OpenSIUC
Source Sets: Southern Illinois University Carbondale
Detected Language: English
Type: text
Format: application/pdf
Source: Theses
