
Reducing Communication Overhead and Computation Costs in a Cloud Network by Early Combination of Partial Results

This thesis describes a method of reducing communication overhead within the MapReduce infrastructure of a cloud computing environment. MapReduce is a framework for parallelizing the processing of massive data sets stored across a distributed computer network. One of the benefits of MapReduce is that the computation is usually performed on a computer (node) that holds the data file. Not only does this approach achieve parallelism, but it also benefits from a characteristic common to many applications: the answer derived from a computation is often smaller than the input file.
Our new method also benefits from this feature. We delay the transmission of individual answers out of a given node so that these answers can first be combined locally. This combination has two advantages. First, it further reduces the amount of data that must ultimately be transmitted. Second, it allows additional computation across files (such as a merge-sort).
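
As an illustration of the two advantages named above, the following is a minimal Python sketch, not the thesis's implementation. It assumes a word-count job in which each file on a node yields a partial result, and it shows both the reduction in records to transmit and a merge of already-sorted per-file outputs. The function names (`map_word_count`, `combine_counts`, `combine_sorted`) and the sample data are hypothetical.

```python
from collections import Counter
import heapq


def map_word_count(text):
    """Map step for one file (hypothetical job): count the words in its text."""
    return Counter(text.split())


def combine_counts(partials):
    """Combine the partial counts of several files held on the same node."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    return total


def combine_sorted(runs):
    """Merge per-file outputs that are each already sorted (merge-sort style)."""
    return list(heapq.merge(*runs))


# Three files stored on one node (made-up sample data).
files_on_node = ["a b a", "b c", "a c c"]
partials = [map_word_count(text) for text in files_on_node]

# Without local combination, every (word, count) pair of every file is sent.
records_before = sum(len(p) for p in partials)

# With local combination, each distinct word is sent once per node.
combined = combine_counts(partials)
records_after = len(combined)

print(records_before, records_after)
print(combine_sorted([[1, 4, 7], [2, 5], [3, 6]]))
```

In this toy input the node would send three records instead of six. The actual saving depends on how much the keys of the local files overlap.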
There is a limit to the benefit of delaying transmission, however, because the reducer stage of MapReduce cannot begin its work until the nodes transmit their answers. We therefore consider a mechanism to allow the user to adjust the amount of delay before data transmission out of each node.
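
The trade-off described above can be sketched as a buffer with a user-adjustable flush threshold. This is only a sketch under stated assumptions: the abstract does not say how the delay is measured, so the code assumes it is expressed as a number of pending partial results (`max_pending`). The class `DelayedEmitter`, the helper `run`, and the `send` callback are hypothetical names, not part of MapReduce or of the thesis.

```python
from collections import Counter


class DelayedEmitter:
    """Buffers partial results on a node and sends them in combined batches.

    Assumption: the delay is a count of pending partial results. The thesis
    may measure it differently (for example, as elapsed time).
    """

    def __init__(self, max_pending, send):
        if max_pending < 1:
            raise ValueError("max_pending must be at least 1")
        self.max_pending = max_pending
        self.send = send  # callback standing in for transmission to the reducer
        self._buffer = Counter()
        self._pending = 0

    def add(self, partial):
        """Combine one partial result locally; flush when the limit is reached."""
        self._buffer.update(partial)
        self._pending += 1
        if self._pending >= self.max_pending:
            self.flush()

    def flush(self):
        """Transmit whatever has been combined so far, if anything."""
        if self._pending:
            self.send(dict(self._buffer))
            self._buffer = Counter()
            self._pending = 0


def run(max_pending, partials):
    """Feed all partial results through an emitter and return the messages sent."""
    sent = []
    emitter = DelayedEmitter(max_pending, sent.append)
    for partial in partials:
        emitter.add(partial)
    emitter.flush()  # send any remainder at the end of the map phase
    return sent


partials = [{"a": 2, "b": 1}, {"b": 1, "c": 1}, {"a": 1, "c": 2}, {"a": 1}]
for limit in (1, 2, 4):
    print(limit, run(limit, partials))
```

With `max_pending=1` every partial result is sent immediately, so the reducer can start earliest but receives the most data. Larger values send fewer, more compact messages at the cost of a later start for the reducer.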

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0822111-220155
Date: 22 August 2011
Creators: Huang, Jun-neng
Contributors: Chung-nan Lee, Steve W. Haga, CHUN-HUNG RICHARD LIN
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0822111-220155
Rights: user_define, Copyright information available at source archive
