Global ETD Search

1	Modeling the power consumption of computing systems and applications through machine learning techniques / Modélisation de la consommation énergétique des systèmes informatiques et ses applications grâce à des techniques d'apprentissage automatique Fontoura Cupertino, Leandro 17 July 2015 (has links) Au cours des dernières années, le nombre de systèmes informatiques n'a pas cesser d'augmenter. Les centres de données sont peu à peu devenus des équipements hautement demandés et font partie des plus consommateurs en énergie. L'utilisation des centres de données se partage entre le calcul intensif et les services web, aussi appelés informatique en nuage. La rapidité de calcul est primordiale pour le calcul intensif, mais pour les autres services ce paramètre peut varier selon les accords signés sur la qualité de service. Certains centres de données sont dits hybrides car ils combinent plusieurs types de services. Toutes ces infrastructures sont extrêmement énergivores. Dans ce présent manuscrit nous étudions les modèles de consommation énergétiques des systèmes informatiques. De tels modèles permettent une meilleure compréhension des serveurs informatiques et de leur façon de consommer l'énergie. Ils représentent donc un premier pas vers une meilleure gestion de ces systèmes, que ce soit pour faire des économies d'énergie ou pour facturer l'électricité à la charge des utilisateurs finaux. Les politiques de gestion et de contrôle de l'énergie comportent de nombreuses limites. En effet, la plupart des algorithmes d'ordonnancement sensibles à l'énergie utilisent des modèles de consommation restreints qui renferment un certain nombre de problèmes ouverts. De précédents travaux dans le domaine suggèrent d'utiliser les informations de contrôle fournies par le système informatique lui-même pour surveiller la consommation énergétique des applications. Néanmoins, ces modèles sont soit trop dépendants du type d'application, soit manquent de précision. Ce manuscrit présente des techniques permettant d'améliorer la précision des modèles de puissance en abordant des problèmes à plusieurs niveaux: depuis l'acquisition des mesures de puissance jusqu'à la définition d'une charge de travail générique permettant de créer un modèle lui aussi générique, c'est-à-dire qui pourra être utilisé pour des charges de travail hétérogènes. Pour atteindre un tel but, nous proposons d'utiliser des techniques d'apprentissage automatique.Les modèles d'apprentissage automatique sont facilement adaptables à l'architecture et sont le cœur de cette recherche. Ces travaux évaluent l'utilisation des réseaux de neurones artificiels et la régression linéaire comme technique d'apprentissage automatique pour faire de la modélisation statistique non linéaire. De tels modèles sont créés par une approche orientée données afin de pouvoir adapter les paramètres en fonction des informations collectées pendant l'exécution de charges de travail synthétiques. L'utilisation des techniques d'apprentissage automatique a pour but d'atteindre des estimateurs de très haute précision à la fois au niveau application et au niveau système. La méthodologie proposée est indépendante de l'architecture cible et peut facilement être reproductible quel que soit l'environnement. Les résultats montrent que l'utilisation de réseaux de neurones artificiels permet de créer des estimations très précises. Cependant, en raison de contraintes de modélisation, cette technique n'est pas applicable au niveau processus. Pour ce dernier, des modèles prédéfinis doivent être calibrés afin d'atteindre de bons résultats. / The number of computing systems is continuously increasing during the last years. The popularity of data centers turned them into one of the most power demanding facilities. The use of data centers is divided into high performance computing (HPC) and Internet services, or Clouds. Computing speed is crucial in HPC environments, while on Cloud systems it may vary according to their service-level agreements. Some data centers even propose hybrid environments, all of them are energy hungry. The present work is a study on power models for computing systems. These models allow a better understanding of the energy consumption of computers, and can be used as a first step towards better monitoring and management policies of such systems either to enhance their energy savings, or to account the energy to charge end-users. Energy management and control policies are subject to many limitations. Most energy-aware scheduling algorithms use restricted power models which have a number of open problems. Previous works in power modeling of computing systems proposed the use of system information to monitor the power consumption of applications. However, these models are either too specific for a given kind of application, or they lack of accuracy. This report presents techniques to enhance the accuracy of power models by tackling the issues since the measurements acquisition until the definition of a generic workload to enable the creation of a generic model, i.e. a model that can be used for heterogeneous workloads. To achieve such models, the use of machine learning techniques is proposed. Machine learning models are architecture adaptive and are used as the core of this research. More specifically, this work evaluates the use of artificial neural networks (ANN) and linear regression (LR) as machine learning techniques to perform non-linear statistical modeling.Such models are created through a data-driven approach, enabling adaptation of their parameters based on the information collected while running synthetic workloads. The use of machine learning techniques intends to achieve high accuracy application- and system-level estimators. The proposed methodology is architecture independent and can be easily reproduced in new environments.The results show that the use of artificial neural networks enables the creation of high accurate estimators. However, it cannot be applied at the process-level due to modeling constraints. For such case, predefined models can be calibrated to achieve fair results.% The use of process-level models enables the estimation of virtual machines' power consumption that can be used for Cloud provisioning. Power aware computing Machine learning Statistical modeling Application power models
2	Fast demand response with datacenter loads: a green dimension of big data McClurg, Josiah 01 August 2017 (has links) Demand response is one of the critical technologies necessary for allowing large-scale penetration of intermittent renewable energy sources in the electric grid. Data centers are especially attractive candidates for providing flexible, real-time demand response services to the grid because they are capable of fast power ramp-rates, large dynamic range, and finely-controllable power consumption. This thesis makes a contribution toward implementing load shaping with server clusters through a detailed experimental investigation of three broadly-applicable datacenter workload scenarios. We experimentally demonstrate the eminent feasibility of datacenter demand response with a distributed video transcoding application and a simple distributed power controller. We also show that while some software power capping interfaces performed better than others, all the interfaces we investigated had the high dynamic range and low power variance required to achieve high quality power tracking. Our next investigation presents an empirical performance evaluation of algorithms that replace arithmetic operations with low-level bit operations for power-aware Big Data processing. Specifically, we compare two different data structures in terms of execution time and power efficiency: (a) a baseline design using arrays, and (b) a design using bit-slice indexing (BSI) and distributed BSI arithmetic. Across three different datasets and three popular queries, we show that the bit-slicing queries consistently outperform the array algorithm in both power efficiency and execution time. In the context of datacenter power shaping, this performance optimization enables additional power flexibility -- achieving the same or greater performance than the baseline approach, even under power constraints. The investigation of read-optimized index queries leads up to an experimental investigation of the tradeoffs among power constraint, query freshness, and update aggregation size in a dynamic big data environment. We compare several update strategies, presenting a bitmap update optimization that allows improved performance over both a baseline approach and an existing state-of-the-art update strategy. Performing this investigation in the context of load shaping, we show that read-only range queries can be served without performance impact under power cap, and index updates can be tuned to provide a flexible base load. This thesis concludes with a brief discussion of control implementation and summary of our findings. big data analytics cluster computing demand response load shaping power aware computing Electrical and Computer Engineering
3	Power, Performance and Energy Models and Systems for Emergent Architectures Song, Shuaiwen 10 April 2013 (has links) Massive parallelism combined with complex memory hierarchies and heterogeneity in high-performance computing (HPC) systems form a barrier to efficient application and architecture design. The performance achievements of the past must continue over the next decade to address the needs of scientific simulations. However, building an exascale system by 2022 that uses less than 20 megawatts will require significant innovations in power and performance efficiency. A key limitation of past approaches is a lack of power-performance policies allowing users to quantitatively bound the effects of power management on the performance of their applications and systems. Existing controllers and predictors use policies fixed by a knowledgeable user to opportunistically save energy and minimize performance impact. While the qualitative effects are often good and the aggressiveness of a controller can be tuned to try to save more or less energy, the quantitative effects of tuning and setting opportunistic policies on performance and power are unknown. In other words, the controller will save energy and minimize performance loss in many cases but we have little understanding of the quantitative effects of controller tuning. This makes setting power-performance policies a manual trial and error process for domain experts and a black art for practitioners. To improve upon past approaches to high-performance power management, we need to quantitatively understand the effects of power and performance at scale. In this work, I have developed theories and techniques to quantitatively understand the relationship between power and performance for high performance systems at scale. For instance, our system-level, iso-energy-efficiency model analyzes, evaluates and predicts the performance and energy use of data intensive parallel applications on multi-core systems. This model allows users to study the effects of machine and application dependent characteristics on system energy efficiency. Furthermore, this model helps users isolate root causes of energy or performance inefficiencies and develop strategies for scaling systems to maintain or improve efficiency. I have also developed methodologies which can be extended and applied to model modern heterogeneous architectures such as GPU-based clusters to improve their efficiency at scale. / Ph. D. High performance computing power-aware computing runtime system
4	Scalable and Energy Efficient Execution Methods for Multicore Systems Li, Dong 16 February 2011 (has links) Multicore architectures impose great pressure on resource management. The exploration spaces available for resource management increase explosively, especially for large-scale high end computing systems. The availability of abundant parallelism causes scalability concerns at all levels. Multicore architectures also impose pressure on power management. Growth in the number of cores causes continuous growth in power. In this dissertation, we introduce methods and techniques to enable scalable and energy efficient execution of parallel applications on multicore architectures. We study strategies and methodologies that combine DCT and DVFS for the hybrid MPI/OpenMP programming model. Our algorithms yield substantial energy saving (8.74% on average and up to 13.8%) with either negligible performance loss or performance gain (up to 7.5%). To save additional energy for high-end computing systems, we propose a power-aware MPI task aggregation framework. The framework predicts the performance effect of task aggregation in both computation and communication phases and its impact in terms of execution time and energy of MPI programs. Our framework provides accurate predictions that lead to substantial energy saving through aggregation (64.87% on average and up to 70.03%) with tolerable performance loss (under 5%). As we aggregate multiple MPI tasks within the same node, we have the scalability concern of memory registration for high performance networking. We propose a new memory registration/deregistration strategy to reduce registered memory on multicore architectures with helper threads. We investigate design polices and performance implications of the helper thread approach. Our method efficiently reduces registered memory (23.62% on average and up to 49.39%) and avoids memory registration/deregistration costs for reused communication memory. Our system enables the execution of application input sets that could not run to the completion with the memory registration limitation. / Ph. D. Performance Modeling and Analysis Multicore Processors Power-Aware Computing Concurrency Throttling High-Performance Computing
5	Improving the Efficiency of Parallel Applications on Multithreaded and Multicore Systems Curtis-Maury, Matthew 15 April 2008 (has links) The scalability of parallel applications executing on multithreaded and multicore multiprocessors is often quite limited due to large degrees of contention over shared resources on these systems. In fact, negative scalability frequently occurs such that a non-negligable performance loss is observed through the use of more processors and cores. In this dissertation, we present a prediction model for identifying efficient operating points of concurrency in multithreaded scientific applications in terms of both performance as a primary objective and power secondarily. We also present a runtime system that uses live analysis of hardware event rates through the prediction model to optimize applications dynamically. We discuss a dynamic, phase-aware performance prediction model (DPAPP), which combines statistical learning techniques, including multivariate linear regression and artificial neural networks, with runtime analysis of data collected from hardware event counters to locate optimal operating points of concurrency. We find that the scalability model achieves accuracy approaching 95%, sufficiently accurate to identify improved concurrency levels and thread placements from within real parallel scientific applications. Using DPAPP, we develop a prediction-driven runtime optimization scheme, called ACTOR, which throttles concurrency so that power consumption can be reduced and performance can be set at the knee of the scalability curve of each parallel execution phase in an application. ACTOR successfully identifies and exploits program phases where limited scalability results in a performance loss through the use of more processing elements, providing simultaneous reductions in execution time by 5%-18% and power consumption by 0%-11% across a variety of parallel applications and architectures. Further, we extend DPAPP and ACTOR to include support for runtime adaptation of DVFS, allowing for the synergistic exploitation of concurrency throttling and DVFS from within a single, autonomically-acting library, providing improved energy-efficiency compared to either approach in isolation. / Ph. D. power-aware computing high-performance computing performance prediction multicore processors runtime adaptation concurrency throttling
6	Prediction Models for Multi-dimensional Power-Performance Optimization on Many Cores Shah, Ankur Savailal 28 May 2008 (has links) Power has become a primary concern for HPC systems. Dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT) are two software tools (or knobs) for reducing the dynamic power consumption of HPC systems. To date, few works have considered the synergistic integration of DVFS and DCT in performance-constrained systems, and, to the best of our knowledge, no prior research has developed application-aware simultaneous DVFS and DCT controllers in real systems and parallel programming frameworks. We present a multi-dimensional, online performance prediction framework, which we deploy to address the problem of simultaneous runtime optimization of DVFS, DCT, and thread placement on multi-core systems. We present results from an implementation of the prediction framework in a runtime system linked to the Intel OpenMP runtime environment and running on a real dual-processor quad-core system as well as a dual-processor dual-core system. We show that the prediction framework derives near-optimal settings of the three power-aware program adaptation knobs that we consider. Our overall runtime optimization framework achieves significant reductions in energy (12.27% mean) and ED² (29.6% mean), through simultaneous power savings (3.9% mean) and performance improvements (10.3% mean). Our prediction and adaptation framework outperforms earlier solutions that adapt only DVFS or DCT, as well as one that sequentially applies DCT then DVFS. Further, our results indicate that prediction-based schemes for runtime adaptation compare favorably and typically improve upon heuristic search-based approaches in both performance and energy savings. / Master of Science concurrency throttling power-aware computing runtime adaptation performance prediction high-performance computing Multicore processors
7	Power Saving Analysis and Experiments for Large Scale Global Optimization Cao, Zhenwei 03 August 2009 (has links) Green computing, an emerging field of research that seeks to reduce excess power consumption in high performance computing (HPC), is gaining popularity among researchers. Research in this field often relies on simulation or only uses a small cluster, typically 8 or 16 nodes, because of the lack of hardware support. In contrast, System G at Virginia Tech is a 2592 processor supercomputer equipped with power aware components suitable for large scale green computing research. DIRECT is a deterministic global optimization algorithm, implemented in the mathematical software package VTDIRECT95. This thesis explores the potential energy savings for the parallel implementation of DIRECT, called pVTdirect, when used with a large scale computational biology application, parameter estimation for a budding yeast cell cycle model, on System G. Two power aware approaches for pVTdirect are developed and compared against the CPUSPEED power saving system tool. The results show that knowledge of the parallel workload of the underlying application is beneficial for power management. / Master of Science VTDIRECT95 power aware computing high performance computing DVFS large scale global optimization budding yeast problem
8	Energy-aware Thread and Data Management in Heterogeneous Multi-Core, Multi-Memory Systems Su, Chun-Yi 03 February 2015 (has links) By 2004, microprocessor design focused on multicore scaling"increasing the number of cores per die in each generation "as the primary strategy for improving performance. These multicore processors typically equip multiple memory subsystems to improve data throughput. In addition, these systems employ heterogeneous processors such as GPUs and heterogeneous memories like non-volatile memory to improve performance, capacity, and energy efficiency. With the increasing volume of hardware resources and system complexity caused by heterogeneity, future systems will require intelligent ways to manage hardware resources. Early research to improve performance and energy efficiency on heterogeneous, multi-core, multi-memory systems focused on tuning a single primitive or at best a few primitives in the systems. The key limitation of past efforts is their lack of a holistic approach to resource management that balances the tradeoff between performance and energy consumption. In addition, the shift from simple, homogeneous systems to these heterogeneous, multicore, multi-memory systems requires in-depth understanding of efficient resource management for scalable execution, including new models that capture the interchange between performance and energy, smarter resource management strategies, and novel low-level performance/energy tuning primitives and runtime systems. Tuning an application to control available resources efficiently has become a daunting challenge; managing resources in automation is still a dark art since the tradeoffs among programming, energy, and performance remain insufficiently understood. In this dissertation, I have developed theories, models, and resource management techniques to enable energy-efficient execution of parallel applications through thread and data management in these heterogeneous multi-core, multi-memory systems. I study the effect of dynamic concurrent throttling on the performance and energy of multi-core, non-uniform memory access (NUMA) systems. I use critical path analysis to quantify memory contention in the NUMA memory system and determine thread mappings. In addition, I implement a runtime system that combines concurrent throttling and a novel thread mapping algorithm to manage thread resources and improve energy efficient execution in multi-core, NUMA systems. In addition, I propose an analytical model based on the queuing method that captures important factors in multi-core, multi-memory systems to quantify the tradeoff between performance and energy. The model considers the effect of these factors in a holistic fashion that provides a general view of performance and energy consumption in contemporary systems. Finally, I focus on resource management of future heterogeneous memory systems, which may combine two heterogeneous memories to scale out memory capacity while maintaining reasonable power use. I present a new memory controller design that combines the best aspects of two baseline heterogeneous page management policies to migrate data between two heterogeneous memories so as to optimize performance and energy. / Ph. D. Thread Management Multi-core Processors Performance Modeling and Analysis Power-Aware Computing Heterogeneous Memory Data Management
9	Topics in Power and Performance Optimization of Embedded Systems January 2011 (has links) abstract: The ubiquity of embedded computational systems has exploded in recent years impacting everything from hand-held computers and automotive driver assistance to battlefield command and control and autonomous systems. Typical embedded computing systems are characterized by highly resource constrained operating environments. In particular, limited energy resources constrain performance in embedded systems often reliant on independent fuel or battery supplies. Ultimately, mitigating energy consumption without sacrificing performance in these systems is paramount. In this work power/performance optimization emphasizing prevailing data centric applications including video and signal processing is addressed for energy constrained embedded systems. Frameworks are presented which exchange quality of service (QoS) for reduced power consumption enabling power aware energy management. Power aware systems provide users with tools for precisely managing available energy resources in light of user priorities, extending availability when QoS can be sacrificed. Specifically, power aware management tools for next generation bistable electrophoretic displays and the state of the art H.264 video codec are introduced. The multiprocessor system on chip (MPSoC) paradigm is examined in the context of next generation many-core hand-held computing devices. MPSoC architectures promise to breach the power/performance wall prohibiting advancement of complex high performance single core architectures. Several many-core distributed memory MPSoC architectures are commercially available, while the tools necessary to effectively tap their enormous potential remain largely open for discovery. Adaptable scalability in many-core systems is addressed through a scalable high performance multicore H.264 video decoder implemented on the representative Cell Broadband Engine (CBE) architecture. The resulting agile performance scalable system enables efficient adaptive power optimization via decoding-rate driven sleep and voltage/frequency state management. The significant problem of mapping applications onto these architectures is additionally addressed from the perspective of instruction mapping for limited distributed memory architectures with a code overlay generator implemented on the CBE. Finally runtime scheduling and mapping of scalable applications in multitasking environments is addressed through the introduction of a lightweight work partitioning framework targeting streaming applications with low latency and near optimal throughput demonstrated on the CBE. / Dissertation/Thesis / Ph.D. Computer Science 2011 Computer Science Computer Engineering Embedded Systems Low Power Multicore Power Management Multicore Scalability Multicore Scheduling Power Aware Computing
10	I/O Aware Power Shifting Savoie, Lee, Lowenthal, David K., Supinski, Bronis R. de, Islam, Tanzima, Mohror, Kathryn, Rountree, Barry, Schulz, Martin 05 1900 (has links) Power limits on future high-performance computing (HPC) systems will constrain applications. However, HPC applications do not consume constant power over their lifetimes. Thus, applications assigned a fixed power bound may be forced to slow down during high-power computation phases, but may not consume their full power allocation during low-power I/O phases. This paper explores algorithms that leverage application semantics-phase frequency, duration and power needs-to shift unused power from applications in I/O phases to applications in computation phases, thus improving system-wide performance. We design novel techniques that include explicit staggering of applications to improve power shifting. Compared to executing without power shifting, our algorithms can improve average performance by up to 8% or improve performance of a single, high-priority application by up to 32%. parallel processing power aware computing I/O aware power shifting fixed power bound high-performance computing phase frequency Algorithm design and analysis Clustering algorithms Data visualization Delays Heuristic algorithms Schedules HPC Performance Power

Search results