The remarkable surge of artificial intelligence has placed a new emphasis on energy optimization in high-performance computing (HPC) systems. Energy consumption has become a critical consideration and a limiting factor for the utility of machine learning applications, with each generation of models consuming significantly more energy than the last. Understanding how energy consumption is distributed across system components is crucial for improving the efficiency of the systems that drive machine learning applications. This understanding allows us to identify optimal system configurations for machines dedicated to machine learning and to minimize the substantial energy costs associated with training and deploying machine learning models.
In this thesis, we present our findings from measuring the power consumption of individual system components, including the CPU, GPU, and disk drive, using the PowerPack framework and on-chip power estimates for the FT and MLPerf Inference benchmarks. The direct power measurements taken by the PowerPack framework offer a level of accuracy, granularity, and synchronization across multiple system components that is not feasible with on-chip power estimates alone. PowerPack provides fine-grained application- and component-level profiling without the overhead cost of simulation.
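The thesis does not reproduce its measurement scripts here, but the sketch below shows one minimal way to sample the kind of on-chip estimate NVIDIA-SMI reports alongside an external trace: it polls nvidia-smi's power.draw query (a real nvidia-smi option) at a fixed interval and writes timestamped samples to a CSV file. The function names, output file name, and sampling parameters are illustrative assumptions, not the thesis's actual tooling.

    import csv
    import subprocess
    import time

    def sample_gpu_power_w():
        # Query NVIDIA-SMI's on-chip power estimate (watts) for GPU 0.
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=power.draw",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True)
        return float(out.stdout.strip().splitlines()[0])

    def log_power(path, interval_s=0.1, duration_s=10.0):
        # Record timestamped samples so they can later be aligned
        # (synchronized) with external PowerPack-style measurements.
        t_end = time.time() + duration_s
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["timestamp_s", "gpu_power_w"])
            while time.time() < t_end:
                writer.writerow([time.time(), sample_gpu_power_w()])
                time.sleep(interval_s)

    if __name__ == "__main__":
        log_power("gpu_power_trace.csv")

Timestamping every sample is what makes a later point-by-point comparison against a physically measured trace possible.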
The PowerPack framework consists of software tools, including the NI-DAQmx library, and hardware components, such as the NI cDAQ-9172 CompactDAQ Chassis and three NI Analog Input Modules, designed to take direct physical measurements of direct current (DC) power with component-level granularity. PowerPack's physical power measurements were used to assess the accuracy of two on-chip power estimators: AMD uProf, a power and performance analysis tool designed specifically for AMD processors, and NVIDIA-SMI, a software power analysis tool developed for NVIDIA GPUs. The AC power draw of our system under test was measured with a HOBO Plug Load Data Logger to determine how energy consumption is distributed among the CPU, GPU, and disk drive.
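As an illustration of the software side of such a setup, the sketch below uses the nidaqmx Python bindings for the NI-DAQmx library to sample one analog input channel on a cDAQ chassis and convert the reading to DC power. The channel name, sense-resistor value, and rail voltage are assumptions made for illustration; the actual PowerPack wiring and calibration are described in the thesis itself.

    import nidaqmx  # Python bindings for NI-DAQmx (pip install nidaqmx)

    # Assumed wiring: the channel reads the voltage drop across a known
    # sense resistor inserted in a DC supply line, so
    #     P = V_rail * (V_sense / R_sense).
    R_SENSE_OHM = 0.01   # assumed sense-resistor value
    V_RAIL = 12.0        # assumed +12 V rail feeding the component

    def read_dc_power_w(channel="cDAQ1Mod1/ai0", rate_hz=1000, n=1000):
        # Acquire n samples at rate_hz from one analog-input channel
        # on the CompactDAQ chassis, then average them.
        with nidaqmx.Task() as task:
            task.ai_channels.add_ai_voltage_chan(channel)
            task.timing.cfg_samp_clk_timing(rate_hz, samps_per_chan=n)
            v_sense = task.read(number_of_samples_per_channel=n)
        v_mean = sum(v_sense) / len(v_sense)
        return V_RAIL * (v_mean / R_SENSE_OHM)

    print(f"DC power: {read_dc_power_w():.2f} W")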
Our findings reveal the power consumption characteristics of the six major functions of the MLPerf Inference benchmark: ONNX Runtime model prediction, system check, post-processing ONNX model output, image retrieval, backend initialization, and loading annotations using the COCO API. Defining the power consumption characteristics of each function with component-level granularity provides valuable insight for function execution scheduling and for identifying bottlenecks in the machine learning inference phase (a sketch of this per-function attribution follows the general audience abstract below).

Master of Science

The popularity of services that use machine learning, such as OpenAI's ChatGPT and DALL-E, Amazon Alexa, and content curation on social media, has made minimizing the cost of using machine learning incredibly important for businesses across all industries. Machine learning is a powerful tool that enables businesses to optimize production and improve their customer experience, but developing and running these services can be tremendously expensive. Training and running machine learning models consume large amounts of energy, which is a major contributor to their cost.
The work presented in this thesis aims to identify how much energy is consumed by individual computer components while running machine learning applications. Understanding how energy consumption is divided among a computer's components is a crucial step toward reducing the energy required to run these applications, thereby making them more affordable.
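To make the per-function characterization described in the technical abstract concrete, here is a minimal sketch of one way to attribute energy to individual benchmark functions: record each function's execution window, then integrate a synchronized power trace over that window. The helper names and the commented usage lines (run_inference, gpu_power_trace.csv) are hypothetical placeholders, not code from the thesis.

    import time
    import numpy as np

    def timed(label, fn, *args, windows=None, **kwargs):
        # Run fn and record its execution window so its energy can be
        # integrated from a synchronized power trace afterward.
        t0 = time.time()
        result = fn(*args, **kwargs)
        windows.append((label, t0, time.time()))
        return result

    def energy_joules(trace_t, trace_w, t0, t1):
        # Integrate power (W) over [t0, t1] with the trapezoidal rule.
        mask = (trace_t >= t0) & (trace_t <= t1)
        t, w = trace_t[mask], trace_w[mask]
        return float(np.sum(0.5 * (w[1:] + w[:-1]) * np.diff(t)))

    # Hypothetical usage, assuming a power trace logged as sketched above:
    # windows = []
    # timed("onnx_prediction", run_inference, batch, windows=windows)
    # trace = np.loadtxt("gpu_power_trace.csv", delimiter=",", skiprows=1)
    # for label, t0, t1 in windows:
    #     print(label, energy_joules(trace[:, 0], trace[:, 1], t0, t1))

Summing the per-function energies and dividing each by the total yields the kind of energy consumption distribution the abstract describes.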
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/119294
Date | 04 June 2024
Creators | Smokowski, Cesar James
Contributors | Computer Science & Applications, Cameron, Kirk W., Ellis, Margaret O'Neil, Back, Godmar Volker
Publisher | Virginia Tech
Source Sets | Virginia Tech Theses and Dissertations
Language | English
Detected Language | English
Type | Thesis
Format | ETD, application/pdf
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/