

Assessing Apache Spark Streaming with Scientific Data

Processing real-world data requires the ability to analyze it in real time. Data processing engines like Hadoop fall short when results are needed on the fly. Apache Spark's streaming library is becoming an increasingly popular choice because it can stream and analyze large volumes of data. To showcase and assess Spark's capabilities, various metrics were designed and evaluated using data collected from the USGODAE data catalog. The latency of streaming in Apache Spark was measured and analyzed against the number of nodes in the cluster. Scalability was monitored by adding and removing nodes in the middle of a streaming job. Fault tolerance was verified by stopping nodes in the middle of a job and confirming that the job was rescheduled and completed on other nodes. A full-stack application was designed to automate data collection, data processing, and visualization of the results. The Google Maps API was used to visualize results by color-coding the world map with values from various analytics.
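
The abstract does not say how analytic values were turned into map colors. As an illustration only, the color-coding step could be sketched as a linear mapping from a value range to a blue-to-red scale. The function name `value_to_hex_color`, the blue-to-red scale, and the clamping behavior are assumptions made for this sketch, not details taken from the thesis.

```python
def value_to_hex_color(value, vmin, vmax):
    """Map a value in [vmin, vmax] to a hex color from blue (low) to red (high).

    Hypothetical helper: the thesis does not specify its color scale.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Clamp so out-of-range readings still receive a valid color.
    clamped = min(max(value, vmin), vmax)
    t = (clamped - vmin) / (vmax - vmin)  # 0.0 at vmin, 1.0 at vmax
    red = round(255 * t)
    blue = round(255 * (1 - t))
    return f"#{red:02x}00{blue:02x}"


# Example: color the lowest and highest readings of an assumed 0-30 range.
low_color = value_to_hex_color(0.0, 0.0, 30.0)    # "#0000ff"
high_color = value_to_hex_color(30.0, 0.0, 30.0)  # "#ff0000"
```

A hex string of this form can be passed as a fill color to most mapping libraries; how the thesis application actually supplied colors to the Google Maps API is not stated in this record.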

Other Computer Sciences

Identifier	oai:union.ndltd.org:uno.edu/oai:scholarworks.uno.edu:td-3668
Date	06 August 2018
Creators	Dahal, Janak
Publisher	ScholarWorks@UNO
Source Sets	University of New Orleans
Detected Language	English
Type	text
Format	application/pdf
Source	University of New Orleans Theses and Dissertations

