Data analytics used to depend on specialized, high-end software and hardware platforms. Recent years, however, have brought forth the data-flow programming model, e.g., MapReduce, and with it a flurry of sturdy, scalable open-source software solutions for analyzing data. In essence, the commoditization of software frameworks for data analytics is well underway.
Yet, to date, data analytics frameworks are still regarded as standalone, dedicated components. Deploying these frameworks requires companies to purchase hardware to meet storage and network resource demands, and requires system administrators to manage data across multiple storage systems.
This dissertation explores the low-cost integration of data analytics frameworks within existing, shared infrastructures. Our thesis is that smart software is the key enabler for the holistic commoditization of data analytics. We focus on two instances of smart software that help realize the low-cost integration objective. For efficient storage integration, we build MixApart, a scalable data analytics framework that removes the dependency on dedicated storage for analytics; with MixApart, a single, consolidated storage back-end manages data and services all types of workloads, thereby lowering hardware costs and simplifying data management. We evaluate MixApart at scale with micro-benchmarks and production workload traces, and show that it delivers performance comparable to or better than that of an analytics framework with dedicated storage. For effective sharing of the networking infrastructure, we implement OX, a virtual machine management framework that, through intelligent VM placement, allows latency-sensitive web applications to share the data center network with data analytics; OX further protects all applications from hardware failures. Together, the two solutions allow existing storage and networking infrastructures to be reused when deploying analytics frameworks, and substantiate our thesis that smart software upgrades can enable the end-to-end commoditization of analytics.
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OTU.1807/35909 |
Date | 09 August 2013 |
Creators | Mihailescu, Madalin |
Contributors | Amza, Cristiana |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | en_ca |
Detected Language | English |
Type | Thesis |