
Performance-aware site-wide data center power management

The top high-performance computing (HPC) data centers have recently entered the era of exascale computing, with a single facility requiring up to tens of megawatts to meet its users’ computing needs. Such massive power capacity at a single site brings power management challenges. Poorly managed power can drive unnecessarily high demand for costly energy or cause the system to underperform. An HPC data center may serve many types of users and workloads with non-trivial power requirements, which makes it difficult to select a one-size-fits-all policy. However, this high power capacity also gives data centers the opportunity to play a key role in enabling greater adoption of renewable energy across the power grid. Data centers can adjust their demand through software power management policies, helping smart grids balance the naturally time-varying supply of green energy.

This thesis claims that multi-tiered power management methods are essential for data centers to implement site-wide power management policies that accurately respond to changing power constraints at a higher, cluster-level tier while reacting to application-specific performance impacts at a lower, job-level tier. Through investigations of site, cluster, job, and server characteristics, we demonstrate that a feedback-driven, multi-tiered power management approach meets power management objectives more effectively than siloed solutions. We design a cluster power management policy that distributes power across jobs using knowledge of job power-performance properties, demonstrating up to a 7% reduction in the system time dedicated to jobs and up to 11% energy savings compared to a policy without job awareness. We provide a power management framework that enables accurate, dynamic cluster power control while adapting to incomplete or inaccurate prior knowledge of job power and performance properties. We add a site-wide power model to a cluster power management policy that offers regulation services in a smart grid, showing 1.3x cost savings compared to a policy that is unaware of site-wide power consumption. We introduce a job power management policy that integrates job performance awareness with knowledge of hardware power-performance trade-offs, demonstrating up to 40% lower energy consumption and 17% shorter execution time on an imbalanced, compute-bound benchmark compared to a policy without frequency throttling.

2024-02-29T00:00:00Z
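The abstract does not say how the cluster-tier policy splits a power budget among jobs. Purely as a hypothetical illustration of what "job-aware" power distribution can mean, the sketch below greedily hands out a cluster power cap in fixed increments, always to the job whose next increment has the largest modeled benefit. This is the standard greedy approach for a separable concave objective. It is not the thesis's policy. The `Job` fields, the `sensitivity` weight, the logarithmic power-performance model, and the 10 W step are all assumptions introduced here. The sketch also assumes job names are unique.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str           # must be unique (used as a dict key)
    min_watts: float    # power floor the job must always receive (> 0)
    max_watts: float    # power beyond which the job gains nothing
    sensitivity: float  # hypothetical weight: how strongly performance tracks power


def marginal_gain(job: Job, watts: float) -> float:
    # Assumed concave model: performance ~ sensitivity * ln(watts), so the
    # derivative (benefit of one more watt) is sensitivity / watts.
    return job.sensitivity / watts


def distribute_power(jobs: list[Job], cap_watts: float,
                     step: float = 10.0) -> dict[str, float]:
    """Greedily hand out a cluster power cap in `step`-watt increments,
    always to the job whose next increment has the largest modeled gain."""
    if any(j.min_watts <= 0 or j.max_watts < j.min_watts for j in jobs):
        raise ValueError("each job needs 0 < min_watts <= max_watts")
    alloc = {j.name: j.min_watts for j in jobs}
    budget = cap_watts - sum(alloc.values())
    if budget < 0:
        raise ValueError("cluster cap is below the sum of job power floors")

    # heapq is a min-heap, so gains are negated to pop the largest first.
    heap = [(-marginal_gain(j, j.min_watts), i)
            for i, j in enumerate(jobs) if j.min_watts < j.max_watts]
    heapq.heapify(heap)

    while heap and budget > 0:
        _, i = heapq.heappop(heap)
        job = jobs[i]
        grant = min(step, budget, job.max_watts - alloc[job.name])
        alloc[job.name] += grant
        budget -= grant
        if alloc[job.name] < job.max_watts:  # job can still absorb power
            heapq.heappush(heap, (-marginal_gain(job, alloc[job.name]), i))
    return alloc


if __name__ == "__main__":
    jobs = [
        Job("compute_bound", min_watts=200, max_watts=400, sensitivity=1.0),
        Job("memory_bound", min_watts=200, max_watts=400, sensitivity=0.2),
    ]
    print(distribute_power(jobs, cap_watts=700))
    # {'compute_bound': 400.0, 'memory_bound': 300.0}
```

With these made-up numbers, the power-sensitive job is raised to its ceiling before the less sensitive job receives any power above its floor. A job-unaware policy would instead have to split the 300 W of headroom without this information, for example evenly.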

Identifier: oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/46654
Date: 30 August 2023
Creators: Wilson, Daniel C.
Contributors: Coskun, Ayse K.
Source Sets: Boston University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation
