Return to search

Sharing the love : a generic socket API for Hadoop Mapreduce

Hadoop is a popular software framework written in Java that performs data-intensive distributed computations on a cluster. It includes Hadoop MapReduce and the Hadoop Distributed File System (HDFS). HDFS has known scalability limitations due to its single NameNode which holds the entire file system namespace in RAM on one computer. Therefore, the NameNode can only store limited amounts of file names depending on the RAM capacity. The solution to furthering scalability is distributing the namespace similar to how file is data divided into chunks and stored across cluster nodes. Hadoop has an abstract file system API which is extended to integrate HDFS, but has also been extended for integrating file systems S3, CloudStore, Ceph and PVFS. File systems Ceph and PVFS already distribute the namespace, while others such as Lustre are making the conversion. Google previously announced in 2009 they have been implementing a Google File System distributed namespace to achieve greater scalability. The Generic Hadoop API is created from Hadoop's abstract file system API. It speaks a simple communication protocol that can integrate any file system which supports TCP sockets. By providing a file system agnostic API, future work with other file systems might provide ways for surpassing Hadoop 's current scalability limitations. Furthermore, the new API eliminates the need for customizing Hadoop's Java implementation, and instead moves the implementation to the file system itself. Thus, developers wishing to integrate their new file system with Hadoop are not responsible for understanding details ofHadoop's internal operation. The API is tested on a homogeneous, four-node cluster with OrangeFS. Initial OrangeFS I/0 throughputs compared to HDFS are 67% ofHDFS' write throughput and 74% percent of HDFS' read throughput. But, compared with an alternate method of integrating with OrangeFS (a POSIX kernel interface), write and read throughput is increased by 23% and 7%, respectively

Identiferoai:union.ndltd.org:pacific.edu/oai:scholarlycommons.pacific.edu:uop_etds-1771
Date01 January 2011
CreatorsYee, Adam J.
PublisherScholarly Commons
Source SetsUniversity of the Pacific
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceUniversity of the Pacific Theses and Dissertations

Page generated in 0.0021 seconds