
Fusion-based Hadoop MapReduce job for fault tolerance in distributed systems

The standard recovery mechanism for a failed task in Hadoop is to execute the task again; after a configured number of retries, the task is marked as failed. With a significant amount of data and complicated Map and Reduce functions, recovering corrupted or unfinished data from a failed job can be more efficient than re-executing the same job. This paper extends [1] by applying the fusion-based technique of [7][8] to Hadoop MapReduce task execution to enhance its fault tolerance. Multiple data sets are run through Hadoop MapReduce, with and without fusion, under various pre-defined failure scenarios for comparison. As the complexity of the Map and Reduce functions relative to the Recover function increases, fusion becomes more efficient, and users can tolerate faults while incurring less than ten percent of extra execution time.
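
In stock Hadoop, the retry count referred to above is configurable (mapreduce.map.maxattempts and mapreduce.reduce.maxattempts in Hadoop 2.x). To make the fusion idea concrete, the following Java sketch shows its simplest instance: one XOR-fused backup block maintained over equal-sized task outputs, from which any single lost output can be rebuilt without re-executing the task that produced it. This is a minimal illustration only; the class and method names are hypothetical, and the thesis's actual fused-data construction and Recover function follow [7][8] and are not reproduced here.

import java.util.Arrays;

// Illustrative sketch (not from the thesis): a fusion-style backup for
// task outputs. One XOR "fused" block is kept alongside n data blocks;
// any single lost block can be rebuilt from the survivors instead of
// re-running the Map or Reduce task that produced it.
public class FusedBackupSketch {

    // XOR-fold all blocks into one fused backup block (assumes equal lengths).
    static byte[] fuse(byte[][] blocks) {
        byte[] fused = new byte[blocks[0].length];
        for (byte[] block : blocks) {
            for (int i = 0; i < fused.length; i++) {
                fused[i] ^= block[i];
            }
        }
        return fused;
    }

    // Recover the single missing block by XOR-ing the fused block
    // with all surviving blocks.
    static byte[] recover(byte[] fused, byte[][] survivingBlocks) {
        byte[] lost = fused.clone();
        for (byte[] block : survivingBlocks) {
            for (int i = 0; i < lost.length; i++) {
                lost[i] ^= block[i];
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // Pretend these are equal-sized intermediate task outputs.
        byte[][] taskOutputs = {
            "part-00000".getBytes(),
            "part-00001".getBytes(),
            "part-00002".getBytes()
        };
        byte[] fused = fuse(taskOutputs);  // maintained during normal execution

        // Simulate losing task 1's output; rebuild it from the fused
        // backup and the survivors instead of re-executing the task.
        byte[][] survivors = { taskOutputs[0], taskOutputs[2] };
        byte[] recovered = recover(fused, survivors);
        System.out.println(Arrays.equals(recovered, taskOutputs[1]));  // prints: true
    }
}

The trade-off this sketch captures is the one the abstract measures: maintaining the fused block costs a little extra work during normal execution, but recovery replaces a full task re-execution with a cheap XOR pass, which pays off as the Map and Reduce functions grow expensive relative to the Recover function.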

Identifier: oai:union.ndltd.org:UTEXAS/oai:repositories.lib.utexas.edu:2152/22605
Date: 09 December 2013
Creators: Ho, Iat-Kei
Source Sets: University of Texas
Language: en_US
Detected Language: English
Format: application/pdf
