
A Scalable Architecture for Simplifying Full-Range Scientific Data Analysis

According to a recent exascale roadmap report, analysis will be the limiting factor in gaining insight from exascale data. Analysis problems that must operate on the full range of a dataset are among the most difficult. Some of the primary challenges in this regard come from disk access, data management, and programmability of analysis tasks on exascale architectures. In this dissertation, I have provided an architectural approach that simplifies and scales data analysis on supercomputing architectures while masking parallel intricacies from the user. My architecture has three primary general contributions: 1) a novel design pattern and implementation for reading multi-file and variable datasets, 2) the integration of querying and sorting as a way to simplify data-parallel analysis tasks, and 3) a new parallel programming model and system for efficiently scaling domain-traversal tasks.
The design of my architecture has allowed studies in several application areas that were not previously possible. Some of these include large-scale satellite data and ocean flow analysis. The major driving example is the assessment of internal-model variability of flow behavior in the GEOS-5 atmospheric modeling dataset. This application issued over 40 million particle traces for model comparison (the largest parallel flow tracing experiment to date), and my system was able to scale execution up to 65,536 processes on an IBM BlueGene/P system.

Identifier: oai:union.ndltd.org:UTENN/oai:trace.tennessee.edu:utk_graddiss-2366
Date: 01 December 2011
Creators: Kendall, Wesley James
Publisher: Trace: Tennessee Research and Creative Exchange
Source Sets: University of Tennessee Libraries
Detected Language: English
Type: text
Format: application/pdf
Source: Doctoral Dissertations
