<p> Distributed file systems make it possible to store and process arbitrarily large datasets with capacity that scales linearly with hardware resources. However, the latency incurred by small queries against large datasets becomes untenable in a production environment. By storing data on both a distributed file system and a traditional relational database, this product will provide low-latency data service to users while maintaining a complete archive.</p><p> The software stack will use the Apache Hadoop Distributed File System (HDFS) for distributed storage. Apache Hive will be used to query the distributed file system. A MySQL database backend will provide the traditional database service. A J2EE web application will serve as the user interface. </p><p> The data service that can provide the requested data with the lowest latency will be selected by evaluating the query.</p>
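The abstract does not say how a query is evaluated to choose a data service. As an illustration only, the sketch below assumes a hypothetical rule: recent data is mirrored in MySQL, so any query whose date range starts on or after a cutoff goes to MySQL, and anything older goes to Hive over HDFS. The class name `QueryRouter`, the `Backend` enum, and the cutoff rule are all assumptions and are not taken from the thesis.

```java
import java.time.LocalDate;

public class QueryRouter {
    // The two data services named in the abstract.
    enum Backend { MYSQL, HIVE }

    // Hypothetical rule (an assumption, not the thesis's method):
    // rows dated on or after hotCutoff are assumed to be mirrored in MySQL,
    // so a query starting at or after the cutoff can be served with low latency.
    // Queries reaching further back need the complete archive behind Hive.
    static Backend route(LocalDate queryStart, LocalDate hotCutoff) {
        return queryStart.isBefore(hotCutoff) ? Backend.HIVE : Backend.MYSQL;
    }

    public static void main(String[] args) {
        LocalDate cutoff = LocalDate.of(2016, 1, 1);
        // Recent range: served from the relational database.
        System.out.println(route(LocalDate.of(2016, 6, 1), cutoff));
        // Historical range: served from the distributed archive.
        System.out.println(route(LocalDate.of(2010, 6, 1), cutoff));
    }
}
```

A real evaluator would likely weigh more than a date range, for example the tables touched or the expected result size, but the shape stays the same: inspect the query, then dispatch to one of the two services.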
Identifier | oai:union.ndltd.org:PROQUEST/oai:pqdtoai.proquest.com:10195971 |
Date | 23 December 2016 |
Creators | Williams, Michael |
Publisher | California State University, Long Beach |
Source Sets | ProQuest.com |
Language | English |
Detected Language | English |
Type | thesis |