Hadoop was developed as an open-source software framework that leveraged initially the MapReduce programming model and therefore was able to efficiently analyse and process large datasets. At the core of Hadoop is the Hadoop distributed file system or HDFS, which is used as the default storage across the cluster. Hadoop can also be used with other types of storage, with or without HDFS, such as Amazon S3, Windows Azure Storage Blobs, GlusterFS, Tachyon etc. This thesis focuses on Tachyon, a distributed file system that claims to enable reliable data sharing at memory speed across cluster computing frameworks. We benchmark and evaluate HDFS with and without Tachyon in regards to performance. To do so we used TestDFSIO as a benchmark to simulate different MapReduce workloads and an in-production Spark job from Spotify. Tachyon's different writetypes were also put to the test and evaluated. To see how cloud solutions compare, we perform the same evaluations of Tachyon over Google Cloud Storage.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-189571 |
Date | January 2016 |
Creators | Kerkinos, Ioannis |
Publisher | KTH, Skolan för informations- och kommunikationsteknik (ICT) |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | TRITA-ICT-EX ; 2016:12 |
Page generated in 0.0073 seconds