Analyzing Spark Performance on Spot Instances

Amazon Spot Instances provide inexpensive capacity for high-performance computing. By bidding on spare Amazon Elastic Compute Cloud (Amazon EC2) instances, users can obtain discounts of up to 90%. In exchange for the low cost, spot instances reduce the reliability of the computing environment: the provider can revoke them abruptly as supply and demand shift, and higher-priority customers are served first.
To achieve high performance on instances with compromised reliability, Spark is used to run jobs. In this thesis, a wide set of Spark experiments is conducted to study its performance on spot instances. Without stateful replication, Spark suffers from cascading rollback and is forced to regenerate these states for ad hoc practices repeatedly. This downside leads to a discussion of the trade-off between compatible but slow checkpointing and regeneration on rollback, and inspires us to apply multiple fault tolerance schemes. Spark is shown to finish a job only under a proper revocation rate. To validate and evaluate our work, a prototype and a simulator are designed and implemented. Based on real historical price records, we study how various checkpoint write frequencies and bid levels affect performance. In a case study, experiments show that the presented techniques can lead to ~20% shorter completion time and ~25% lower costs than cases without such techniques. Compared with running jobs on full-price instances, the absolute saving in costs can be ~70%.

cloud computing infrastructure

Computer and Systems Architecture

Identifier	oai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:masters_theses_2-1623
Date	27 October 2017
Creators	Tian, Jiannan
Publisher	ScholarWorks@UMass Amherst
Source Sets	University of Massachusetts, Amherst
Detected Language	English
Type	text
Format	application/pdf
Source	Masters Theses
