Virtualized resource management in high performance fabric clusters

Providing performance and isolation guarantees for applications running in virtualized
datacenter environments requires continuous management of the underlying physical
resources. For communication- and I/O-intensive applications running on such platforms,
the management methods must adequately deal with the shared use of the high-performance
fabrics these applications require. In particular, new classes of latency-sensitive and
data-intensive workloads running in virtualized environments rely on emerging fabrics
like 40+Gbps Ethernet and InfiniBand/RoCE with support for RDMA, VMM-bypass and
hardware-level virtualization (SR-IOV). However, the benefits of these technology
advances are offset by several management constraints: (i) the hypervisor cannot
monitor the VMs’ usage of these fabrics, which can affect the platform’s ability to provide
isolation and performance guarantees, (ii) the hypervisor cannot provide fine-grained
I/O provisioning or perform management decisions for VMs, which reduces the degree of
consolidation that the platforms can support, and (iii) without such support it
is harder to integrate these fabrics into emerging cloud computing platforms and
datacenter fabric management solutions. These constraints are particularly challenging for
workloads that span multiple VMs and use physical resources distributed across multiple
server nodes and the interconnection fabric.

This thesis addresses the problem of realizing a flexible, dynamic resource management
system for virtualized platforms with high performance fabrics. We make the following key
contributions:

(i) A lightweight monitoring tool, IBMon, integrated with the hypervisor to monitor VMs’
use of RDMA-enabled virtualized interconnects, using memory introspection techniques.

(ii) The design and construction of a resource management system that leverages IBMon
to provide performance guarantees to latency-sensitive applications. This system is built
on microeconomic principles of supply and demand and can be deployed on a per-node
(Resource Exchange) or a multi-node (Distributed Resource Exchange) basis. Fine-grained
resource allocations can be enforced through several mechanisms, including CPU capping
or fabric-level congestion control.

(iii) Sphinx, a fabric management solution that leverages Resource Exchange to orchestrate
the network and provide latency proportionality for consolidated workloads, based on
user/application-specified policies.

(iv) Implementation and experimental evaluation using InfiniBand clusters virtualized with
the Xen or KVM hypervisor, managed via the OpenFloodlight SDN controller, and using
representative data-intensive and latency-sensitive benchmarks.

Identifier: oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/54241
Date: 07 January 2016
Creators: Ranadive, Adit Uday
Contributors: Gavrilovska, Ada
Publisher: Georgia Institute of Technology
Source Sets: Georgia Tech Electronic Thesis and Dissertation Archive
Language: en_US
Detected Language: English
Type: Dissertation
Format: application/pdf
