Distributed Monitoring and Resource Management for Large Cloud Environments

Over the last decade, the number, size and complexity of large-scale networked systems have been growing fast, and this trend is expected to accelerate. The best-known example of a large-scale networked system is probably the Internet, while large datacenters for cloud services are among the most recent. In such environments, a key challenge is to develop scalable and adaptive technologies for management functions. This thesis addresses the challenge by engineering several protocols for distributed monitoring and resource management that are suitable for large-scale networked systems. First, we present G-GAP, a gossip-based protocol we developed for continuous monitoring of aggregates computed from device variables. We prove that this protocol is robust to node failures and validate through simulations that, under certain conditions, its estimation accuracy does not change as the size of the monitored system increases. Second, we present TCA-GAP, a tree-based protocol, and TG-GAP, a gossip-based protocol, both for monitoring threshold crossings of aggregates. For both protocols, we prove correctness properties and show, again through simulations, that they are efficient: their overhead is at least two orders of magnitude smaller than that of a naïve approach when the monitored aggregate is sufficiently far from the threshold. Third, we present a gossip-based protocol for resource management in cloud environments. The protocol allocates CPU and memory resources to sites hosted by the cloud. We prove that the resource allocation computed by the protocol converges exponentially fast to an optimal allocation when sufficient memory is available. Through simulations, we show that the quality of the resource allocation approaches that of an ideal system when the total memory demand falls significantly below the memory capacity of the entire system.
In addition, we validate that the quality of the allocation does not change as the number of hosted sites and machines increases, provided both are scaled proportionally. Finally, we compare two approaches to engineering protocols for distributed management (tree-based and gossip-based) for the case of real-time monitoring. Our simulation results indicate that, regardless of the system size and the failure rates in the monitored system, gossip protocols incur a significantly larger overhead than tree-based protocols to achieve the same monitoring quality (e.g., estimation accuracy or detection delay).
QC 20101124
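
The abstract does not describe how G-GAP or TG-GAP exchange state, so the following is only a generic illustration of gossip-based aggregate monitoring, not the thesis's protocols. It is a minimal sketch of push-sum averaging (a standard gossip scheme), assuming synchronous rounds, a fully connected system, uniform random peer selection and no node failures (G-GAP's failure handling is not modeled). The function names `push_sum_average` and `threshold_crossed` are hypothetical, and the threshold check is the naïve per-node comparison, not TCA-GAP's or TG-GAP's mechanism.

```python
import random


def push_sum_average(values, rounds=50, seed=0):
    """Hypothetical helper: estimate the mean of `values` via push-sum gossip.

    Each node i holds a pair (s_i, w_i), initialised to (x_i, 1). In every
    synchronous round a node keeps half of its pair and pushes the other
    half to one uniformly chosen peer. The totals of s and w are conserved,
    so each local ratio s_i / w_i converges to the global average.
    """
    rng = random.Random(seed)
    n = len(values)
    s = [float(v) for v in values]
    w = [1.0] * n
    for _ in range(rounds):
        new_s = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            half_s, half_w = s[i] / 2, w[i] / 2
            peer = rng.randrange(n)  # may pick itself; the totals are still conserved
            new_s[i] += half_s
            new_w[i] += half_w
            new_s[peer] += half_s
            new_w[peer] += half_w
        s, w = new_s, new_w
    # w_i stays positive because every node keeps half of its own weight.
    return [si / wi for si, wi in zip(s, w)]


def threshold_crossed(estimates, threshold):
    """Hypothetical helper: per-node flag, True where the local estimate exceeds the threshold."""
    return [e > threshold for e in estimates]
```

With `values = [10, 20, 30, 40]`, every node's estimate approaches the global mean of 25, so `threshold_crossed(..., 20)` flags every node. Note that this naïve scheme keeps gossiping every round even when the aggregate is far from the threshold. According to the abstract, TCA-GAP and TG-GAP cut overhead by at least two orders of magnitude in exactly that regime.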

decentralized management

engineering protocols

distributed monitoring

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-26207
Date	January 2010
Creators	Wuhib, Fetahi Zebenigus
Publisher	KTH, Kommunikationsnät (Communication Networks), Stockholm : KTH
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Doctoral thesis, comprehensive summary, info:eu-repo/semantics/doctoralThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	Trita-EE, 1653-5146 ; 2010:051