Application-aware resource management for datacenters / Applikationsmedveten resurshantering för datacenter

High Performance Computing (HPC) and Cloud Computing datacenters are extensively used to steer and solve complex problems in science, engineering, and business, such as calculating correlations and making predictions. Even a single datacenter server exposes thousands of hardware and software metrics, or Key Performance Indicators (KPIs), that individually and in aggregate can give insight into the performance, robustness, and efficiency of the datacenter and the provisioned applications. At the datacenter level, the number of KPIs is even higher. The fast-growing interest in datacenter management from both the public and industry, together with the rapid expansion in scale and complexity of datacenter resources and the services provided on them, has made monitoring, profiling, controlling, and provisioning compute resources dynamically at runtime a challenging and complex task. Commonly, correlations of application KPIs, like response time and throughput, with resource capacities show that the runtime systems (e.g., containers or virtual machines) used to provision these applications do not utilize available resources efficiently. This reduces datacenter efficiency, which in turn results in higher operational costs and longer waiting times for results. The goal of this thesis is to develop tools and autonomic techniques for improving datacenter operations, management, and utilization, while improving application performance and/or minimizing impacts on it. To this end, we make use of application resource descriptors to create a library that dynamically adjusts the amount of resources used, enabling elasticity for scientific workflows in HPC datacenters. For mission-critical applications, high availability is of great concern since these services must be kept running even in the event of system failures.
By modeling and correlating specific resource counters, like CPU, memory, and network utilization, with the number of runtime synchronizations, we present adaptive mechanisms that dynamically select which fault-tolerance mechanism to use. Likewise, for scientific applications we propose a hybrid, extensible architecture for dual-level scheduling of data-intensive jobs in HPC infrastructures, which simplifies operations, eases the on-boarding of new types of applications, and achieves greater job throughput with higher overall datacenter efficiency.

Resource Management

High Performance Computing

Cloud Computing

Computer Systems

Datorsystem

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-155620
Date	January 2018
Creators	Souza, Abel Pinto Coelho de
Publisher	Umeå universitet, Institutionen för datavetenskap, Umeå : Department of computing science, Umeå university
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Licentiate thesis, comprehensive summary, info:eu-repo/semantics/masterThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	Report / UMINF, 0348-0542 ; 18.14
