The rapidly growing volume of electrophysiological signals has been generated for clinical research in neurological disorders. European Data Format (EDF) is a standard format for storing electrophysiological signals. However, the bottleneck of existing signal analysis tools for handling large-scale datasets is the sequential way of loading large EDF files before performing an analysis. To overcome this, we develop Hadoop-EDF, a distributed signal processing tool to load EDF data in a parallel manner using Hadoop MapReduce. Hadoop-EDF uses a robust data partition algorithm making EDF data parallel processable. We evaluate Hadoop-EDF’s scalability and performance by leveraging two datasets from the National Sleep Research Resource and running experiments on Amazon Web Service clusters. The performance of Hadoop-EDF on a 20-node cluster improves 27 times and 47 times than sequential processing of 200 small-size files and 200 large-size files, respectively. The results demonstrate that Hadoop-EDF is more suitable and effective in processing large EDF files.
Identifer | oai:union.ndltd.org:uky.edu/oai:uknowledge.uky.edu:cs_etds-1094 |
Date | 01 January 2019 |
Creators | Wu, Yuanyuan |
Publisher | UKnowledge |
Source Sets | University of Kentucky |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Theses and Dissertations--Computer Science |
Page generated in 0.0018 seconds