A Novel Low-Overhead Recovery Approach For Distributed Systems
Kosaraju, Sundeepthi, 01 December 2009
In this work we have addressed the complex problem of recovery from concurrent failures in a distributed computing environment. We have proposed a new checkpointing and recovery approach that enables each process to restart from its most recent checkpoint and therefore guarantees the least amount of recomputation after recovery. The proposed approach deals effectively with orphan and lost messages. We have introduced two new ideas. First, the value of the common checkpointing interval is chosen so that only the messages sent in the recent checkpoints of the processes need to be logged. Second, the lost messages are always determined a priori by the initiator process, in parallel with the normal distributed computation, so this step does not delay recovery in any way.
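
To make the terms "orphan" and "lost" message concrete, the following is a minimal sketch using the standard textbook definitions. It is not the algorithm proposed in the thesis. The `Checkpoint` layout, the `classify_messages` function, and the process and message names are all hypothetical and chosen only for illustration. A message is treated as lost when its send is recorded in the sender's checkpoint but its receipt is not recorded in the receiver's checkpoint, and as orphan in the opposite case.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical record of what one process had logged at its latest checkpoint."""
    sent: set = field(default_factory=set)      # ids of messages sent before the checkpoint
    received: set = field(default_factory=set)  # ids of messages received before the checkpoint


def classify_messages(checkpoints, messages):
    """Split messages into lost and orphan sets.

    checkpoints: dict mapping process name -> Checkpoint
    messages:    dict mapping message id -> (sender, receiver)
    """
    lost, orphan = set(), set()
    for msg_id, (sender, receiver) in messages.items():
        send_recorded = msg_id in checkpoints[sender].sent
        recv_recorded = msg_id in checkpoints[receiver].received
        if send_recorded and not recv_recorded:
            # The sender remembers sending it, the receiver never saw it.
            lost.add(msg_id)
        elif recv_recorded and not send_recorded:
            # The receiver remembers it, but the sender's state predates the send.
            orphan.add(msg_id)
    return lost, orphan


# Illustrative data only: two processes and two messages.
checkpoints = {
    "P1": Checkpoint(sent={"m1"}, received={"m2"}),
    "P2": Checkpoint(),
}
messages = {"m1": ("P1", "P2"), "m2": ("P2", "P1")}

lost, orphan = classify_messages(checkpoints, messages)
print("lost:", sorted(lost))
print("orphan:", sorted(orphan))
```

In this example `m1` is lost, because P1 recorded the send but P2 did not record the receipt, and `m2` is orphan, because P1 recorded receiving a message that P2's checkpoint does not record sending. A recovery scheme would replay lost messages from a log and must prevent orphans from making the restored state inconsistent.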