
Evaluation and benchmarking of Tachyon as a memory-centric distributed storage system for Apache Hadoop

Hadoop was developed as an open-source software framework that initially leveraged the MapReduce programming model to analyse and process large datasets efficiently. At the core of Hadoop is the Hadoop Distributed File System (HDFS), which serves as the default storage layer across the cluster. Hadoop can also be used with other storage systems, with or without HDFS, such as Amazon S3, Windows Azure Storage Blobs, GlusterFS, and Tachyon. This thesis focuses on Tachyon, a distributed file system that claims to enable reliable data sharing at memory speed across cluster computing frameworks. We benchmark and evaluate the performance of HDFS with and without Tachyon. To do so, we use TestDFSIO as a benchmark to simulate different MapReduce workloads, along with an in-production Spark job from Spotify. Tachyon's different write types are also tested and evaluated. To see how cloud solutions compare, we perform the same evaluations of Tachyon over Google Cloud Storage.

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-189571
Date: January 2016
Creators: Kerkinos, Ioannis
Publisher: KTH, Skolan för informations- och kommunikationsteknik (ICT)
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
Relation: TRITA-ICT-EX ; 2016:12
