Return to search

A Distributed Approach to EpiFast using Apache Spark

EpiFast is a parallel algorithm for large-scale epidemic simulations, based on an interpretation of the stochastic disease propagation in a contact network. The original EpiFast implementation is based on a master-slave computation model with a focus on distributed memory using message-passing-interface (MPI). However, it suffers from few shortcomings with respect to scale of networks being studied. This thesis addresses these shortcomings and provides two different implementations: Spark-EpiFast based on the Apache Spark big data processing engine and Charm-EpiFast based on the Charm++ parallel programming framework. The study focuses on exploiting features of both systems that we believe could potentially benefit in terms of performance and scalability. We present models of EpiFast specific to each system and relate algorithm specifics to several optimization techniques. We also provide a detailed analysis of these optimizations through a range of experiments that consider scale of networks and environment settings we used. Our analysis shows that the Spark-based version is more efficient than the Charm++ and MPI-based counterparts. To the best of our knowledge, ours is one of the preliminary efforts of using Apache Spark for epidemic simulations. We believe that our proposed model could act as a reference for similar large-scale epidemiological simulations exploring non-MPI or MapReduce-like approaches. / Master of Science

Identiferoai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/55272
Date04 August 2015
CreatorsKannan, Vijayasarathy
ContributorsComputer Science, Marathe, Madhav Vishnu, Marathe, Achla, Vullikanti, Anil Kumar S., Chen, Jiangzhuo
PublisherVirginia Tech
Source SetsVirginia Tech Theses and Dissertation
Detected LanguageEnglish
TypeThesis
FormatETD, application/pdf
RightsIn Copyright, http://rightsstatements.org/vocab/InC/1.0/

Page generated in 0.0054 seconds