  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Mechanisms for coordinated power management with application to cooperative distributed systems

Nathuji, Ripal January 2008
Thesis (Ph.D.)--Electrical and Computer Engineering, Georgia Institute of Technology, 2008. / Committee Chair: Schwan, Karsten; Committee Co-Chair: Yalamanchili, Sudha; Committee Member: Lee, Hsien-Hsin Sean; Committee Member: Loh, Gabriel; Committee Member: Madisetti, Vijay; Committee Member: Owen, Henry

Performance and power modeling of GPU systems with dynamic voltage and frequency scaling

Wang, Qiang 13 August 2020
Heterogeneous systems that combine general-purpose and special-purpose processors are increasingly common, driven by the ever-increasing demand for computing capacity. Their huge energy consumption raises new environmental concerns and challenges, so energy efficiency is now a key factor for system designers and consumers alongside performance. In particular, contemporary graphics processing units (GPUs) support dynamic voltage and frequency scaling (DVFS) to balance computational performance and energy consumption. However, real hardware still lacks accurate and straightforward performance and power estimation for a given GPU kernel under different frequency settings, and such estimation is essential for determining the best frequency configuration for energy saving. In this thesis, we investigate how to improve the energy efficiency of GPU systems by accurately modeling the effects of GPU DVFS on the target GPU kernel. We also propose efficient algorithms to solve the communication contention problem in scheduling multiple distributed deep learning (DDL) jobs on GPU clusters. Our studies are as follows. First, we present EPPMiner, a benchmark suite for evaluating the performance, power, and energy of different heterogeneous systems. EPPMiner consists of 16 benchmark programs that cover a broad range of application domains and vary widely in how intensively they use the processors. We have implemented a prototype of EPPMiner that supports OpenMP, CUDA, and OpenCL, and demonstrated its use in three case studies. The case studies show that GPUs provide much better energy efficiency than other types of computing systems, and in particular illustrate the effectiveness of GPU DVFS in improving the energy efficiency of GPU applications. Second, we present a fine-grained analytical model that estimates the execution time of GPU kernels under both core and memory frequency scaling.
Cycle-level simulators are too slow to apply to real hardware. By contrast, our model needs only one-off micro-benchmarks to extract a set of hardware parameters and kernel performance counters, without any source code analysis. Our experimental results show that the proposed performance model captures kernel performance scaling behavior under different frequency settings with decent accuracy. Third, we design a cross-benchmarking suite that simulates kernels with a wide range of instruction distributions. The synthetic kernels generated by this suite can be used for model pre-training or as supplementary training samples. We then build machine learning models to predict the execution time and runtime power of a GPU kernel under different voltage and frequency settings. We validated the approach on three modern GPUs with wide frequency scaling ranges, using a collection of 24 real application kernels, and the model trained only with our cross-benchmarking suite achieves considerably accurate results. Finally, we establish a new DDL job scheduling framework that organizes DDL jobs as directed acyclic graphs (DAGs) and considers communication contention between nodes. We then propose an efficient job placement algorithm, Least-Workload-First- (LWF-), which balances GPU utilization and consolidates the GPUs allocated to each job. To schedule the communication tasks, we propose Ada-SRSF, which addresses the communication contention issue in DDL job scheduling. Our simulation results show that LWF- achieves up to a 1.59x improvement over classical first-fit algorithms. More importantly, Ada-SRSF reduces the average job completion time by up to 36.7% compared with solutions that either avoid all communication contention or accept all of it.
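The placement step can be illustrated with a generic greedy heuristic. This is an assumption and not the thesis's LWF- algorithm, because the abstract does not give that algorithm's exact rule or its parameter. When a single node can host the whole job, the heuristic places the job on that node's least-loaded GPUs, which keeps the job's GPUs consolidated. Otherwise it falls back to the least-loaded GPUs across the cluster. `Node`, `place_job`, and the abstract "load" units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    gpu_load: list[float]  # hypothetical per-GPU workload currently assigned


def place_job(nodes: list[Node], num_gpus: int, job_load: float) -> list[tuple[str, int]]:
    """Greedy least-loaded placement (illustrative sketch, not LWF- itself)."""
    best = None  # (total load of chosen GPUs, node, chosen GPU indices)
    # Prefer a single node: consolidating a job's GPUs avoids cross-node traffic.
    for node in nodes:
        if len(node.gpu_load) >= num_gpus:
            idx = sorted(range(len(node.gpu_load)), key=lambda i: node.gpu_load[i])[:num_gpus]
            cost = sum(node.gpu_load[i] for i in idx)
            if best is None or cost < best[0]:
                best = (cost, node, idx)

    if best is not None:
        _, node, idx = best
        chosen = [(node, i) for i in idx]
    else:
        # No node is big enough: take the least-loaded GPUs cluster-wide.
        all_gpus = [(n, i) for n in nodes for i in range(len(n.gpu_load))]
        if len(all_gpus) < num_gpus:
            raise ValueError("not enough GPUs in the cluster")
        chosen = sorted(all_gpus, key=lambda p: p[0].gpu_load[p[1]])[:num_gpus]

    # Record the new job's workload on every GPU it was given.
    for node, i in chosen:
        node.gpu_load[i] += job_load
    return [(node.name, i) for node, i in chosen]
```

With `nodes = [Node("n0", [3.0, 1.0, 0.0, 2.0]), Node("n1", [0.25, 0.5])]`, a 2-GPU job lands on both GPUs of `n1`, whose combined load is lowest. A real scheduler would also model link bandwidth and each job's DAG, which this sketch ignores.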

Energy conservation techniques for GPU computing

Mei, Xinxin 29 August 2016
Emerging general-purpose computing on graphics processing units (GPGPU) has tremendously sped up a great variety of commercial and scientific applications, and GPUs have become prevalent accelerators in current high-performance clusters. Although the computational capacity per watt of GPUs is much higher than that of CPUs, hybrid GPU clusters still consume enormous power, so conserving energy on such clusters is of critical significance. In this thesis, we pursue energy-conserving computing on GPU-accelerated servers. Our studies are as follows. First, we dissect the GPU memory hierarchy, because most GPU applications suffer from the GPU memory bottleneck. We find that conventional CPU cache models cannot be applied to modern GPU caches, and that the microbenchmarks used to study conventional CPU caches are invalid for GPUs. We propose GPU-specific microbenchmarks to examine GPU memory structures and properties. Our benchmark results verify that the design goal of the GPU has shifted from pure computational performance to better energy efficiency. Second, we investigate the impact of dynamic voltage and frequency scaling (DVFS), a successful energy management technique for CPUs, on GPU platforms. Our experimental results suggest that GPU DVFS is still promising for conserving energy, but the patterns for saving energy differ strongly from those of the CPU. Moreover, the effect of GPU DVFS depends on individual application characteristics. Third, we derive GPU DVFS power and performance models from our experimental results and use them to find the optimal GPU voltage and frequency setting that minimizes the energy consumption of a single GPU task. We then study the problem of scheduling multiple tasks on a hybrid CPU-GPU cluster to minimize total energy consumption using GPU DVFS.
We design an effective offline scheduling algorithm that reduces energy consumption significantly. Finally, we combine GPU DVFS with dynamic resource sleep (DRS), another energy management technique, to conserve further energy in online task scheduling on hybrid clusters. Although idle energy consumption increases significantly compared with the offline problem, our online scheduling algorithm still achieves more than 30% energy conservation with appropriate runtime GPU DVFS readjustments.
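The single-task step (pick the voltage/frequency setting that minimizes energy) can be sketched as a search over candidate settings using E = P × T. The models here are simplified assumptions, not the thesis's derived models. Execution time is a frequency-insensitive memory part plus compute cycles divided by core frequency. Power is static power plus CMOS-style dynamic power k·V²·f. `best_dvfs_setting`, `Choice`, and all parameter values are hypothetical.

```python
from typing import NamedTuple


class Choice(NamedTuple):
    energy_j: float
    volts: float
    freq_hz: float
    time_s: float
    power_w: float


def best_dvfs_setting(settings, work_cycles, mem_time_s, p_static_w, k_dyn):
    """Return the (voltage, frequency) pair with the lowest predicted energy E = P * T."""
    best = None
    for volts, freq_hz in settings:
        # Assumed performance model: only the compute part scales with core frequency.
        time_s = mem_time_s + work_cycles / freq_hz
        # Assumed power model: static power plus dynamic power k * V^2 * f.
        power_w = p_static_w + k_dyn * volts ** 2 * freq_hz
        energy_j = power_w * time_s
        if best is None or energy_j < best.energy_j:
            best = Choice(energy_j, volts, freq_hz, time_s, power_w)
    return best


settings = [(0.85, 1.0e9), (0.95, 1.4e9), (1.10, 1.8e9)]
choice = best_dvfs_setting(settings, work_cycles=1.8e9, mem_time_s=0.5,
                           p_static_w=50.0, k_dyn=5e-8)
```

With these made-up numbers, the lowest frequency minimizes energy (about 198 J). Raising `p_static_w` to 150.0 shifts the optimum to the 1.4 GHz setting, because running longer then costs more static energy. This sensitivity is consistent with the abstract's observation that the effect of GPU DVFS depends on the application and platform.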

Investigation of Immersion Cooled ARM-Based Computer Clusters for Low-Cost, High-Performance Computing

Mohammed, Awaizulla Shareef 08 1900
This study aimed to investigate the performance of ARM-based computer clusters using a two-phase immersion cooling approach, and to demonstrate its potential benefits over air-based natural and forced convection cooling. The ARM-based clusters were built from Raspberry Pi models 2 and 3, which are commodity single-board computers. The immersion cooling mode used two dielectric liquids, HFE-7000 and HFE-7100. The experiments ran the Sysbench and High-Performance Linpack (HPL) benchmarks, individually and in combination, to quantify the key parameters: device junction temperature, frequency, execution time, computing performance, and energy consumption. Results indicated that device core temperature directly affects computing performance and energy consumption. In the reference natural convection cooling mode, as the temperature rose, the cluster began to decrease its operating frequency to protect the cores from damage. This reduced computing performance and increased execution time, which in turn increased energy consumption. In more extreme cases, cluster performance dropped by a factor of 4, while energy consumption increased by 220%. This study therefore demonstrated that two-phase immersion cooling, with its near-isothermal, high heat transfer capability, would enable fast, energy-efficient, and reliable operation, particularly benefiting high-performance computing applications where conventional air-based cooling methods would fail.
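A rough energy balance makes the extreme-case figures easier to interpret. The following assumptions are for illustration only: both figures describe the same throttled run; "dropped by a factor of 4" means the same fixed workload took about four times as long; and "increased by 220%" means energy rose to 3.2 times the reference. The subscripts "thr" (throttled) and "ref" (reference) are labels introduced here. Since energy is average power times time, E = \bar{P} T:

```latex
\frac{\bar{P}_{\text{thr}}}{\bar{P}_{\text{ref}}}
  = \frac{E_{\text{thr}} / E_{\text{ref}}}{T_{\text{thr}} / T_{\text{ref}}}
  = \frac{3.2}{4}
  = 0.8
```

Under these assumptions, the throttled cluster drew roughly 20% less average power. However, running about four times longer more than offset that saving, which is why throttling raised total energy.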

A Compressed Data Collection System For Use In Wireless Sensor Networks

Erratt, Newlyn S. 06 March 2013
Indiana University-Purdue University Indianapolis (IUPUI) / One of the most common goals of a wireless sensor network is to collect sensor data. The goal of this thesis is to provide an easy-to-use and energy-efficient system for deploying data collection sensor networks. Deploying a wireless sensor network to collect sensor data poses numerous challenges. Among them are reducing energy consumption and the fact that users interested in collecting data may not be familiar with software design. This thesis presents a complete system, composed of the Compressed Data-stream Protocol and a general gateway for data collection in wireless sensor networks, that aims to provide an easy-to-use, energy-efficient, and complete system for data collection in sensor networks. The Compressed Data-stream Protocol is a transport layer compression protocol whose primary goal, in this work, is to reduce energy consumption. The radio is an expensive consumer of energy in wireless sensor network nodes. In simulations, the Compressed Data-stream Protocol has been shown to reduce the energy used for transmission and reception by around 26%. The general gateway is designed to make customization simple without requiring extensive knowledge of sensor networks and software development. This, along with the modular nature of the Compressed Data-stream Protocol, enables the creation of an easy-to-deploy and easy-to-configure sensor network for data collection. Findings show that the individual components work well and that the system as a whole performs without errors. This system, whose components will eventually be released as open source, provides a platform for researchers interested purely in the gathered data to deploy a sensor network without being restricted to specific hardware vendors.
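The abstract does not say how the Compressed Data-stream Protocol actually compresses data. The following hypothetical sketch is not CDP, and its function names are made up. It illustrates why compressing a sensor stream can save radio energy: fewer bytes go over the air. The sketch uses a common generic technique: delta-encode successive integer readings, zigzag-map the signed deltas, and pack them as variable-length integers (7 payload bits per byte, high bit set when more bytes follow).

```python
def zigzag(n: int) -> int:
    # Map signed to unsigned so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3.
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def unzigzag(z: int) -> int:
    # Inverse of zigzag: even values are non-negative, odd values are negative.
    return z >> 1 if z & 1 == 0 else -((z + 1) >> 1)


def encode(readings: list[int]) -> bytes:
    """Delta + zigzag + varint encoding of integer sensor readings."""
    out = bytearray()
    prev = 0
    for r in readings:
        z = zigzag(r - prev)  # slowly changing signals give small deltas
        prev = r
        while True:
            byte = z & 0x7F
            z >>= 7
            if z:
                out.append(byte | 0x80)  # continuation bit: more bytes follow
            else:
                out.append(byte)
                break
    return bytes(out)


def decode(data: bytes) -> list[int]:
    readings, prev, z, shift = [], 0, 0, 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += unzigzag(z)  # undo the delta
            readings.append(prev)
            z, shift = 0, 0
    return readings
```

For slowly changing readings such as `[512, 515, 514, 514, 520]`, this packs five values into 6 bytes instead of 10 bytes as raw 16-bit integers. How much radio energy that saves depends on packet headers and radio behavior, which the sketch ignores.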
