Global ETD Search

1	Energy-Efficient Databases Using Sweet Spot Frequencies Lehner, Wolfgang, Götz, Sebastian, Ilsche, Thomas, Cardoso, Jorge, Spillner, Josef, Kissinger, Thomas, Aßmann, Uwe, Nagel, Wolfgang E., Schill, Alexander 12 January 2023 (has links) Database management systems (DBMS) are typically tuned for high performance and scalability. Nevertheless, carbon footprint and energy efficiency are also becoming increasing concerns. Unfortunately, existing studies mainly present theoretical contributions but fall short on proposing practical techniques. These could be used by administrators or query optimizers to increase the energy efficiency of the DBMS. Thus, this paper explores the effect of so-called sweet spots, which are energy-efficient CPU frequencies, on the energy required to execute queries. From our findings, we derive the Sweet Spot Technique, which relies on identifying energy-efficient sweet spots and the optimal number of threads that minimizes energy consumption for a query or an entire database workload. The technique is simple and has a practical implementation leading to energy savings of up to 50% compared to using the nominal frequency and maximum number of threads. databases, energy-efficiency, DVFS Datenbanken, Energieeffizienz, DVFS
2	Energy efficient scheduling of parallel real-time tasks on heterogeneous multicore systems / Minimisation de la consommation d'énergie pour des taches temps-réels parallèles sur des architectures multicoeurs hétérogènes Zahaf, Houssam-Eddine 02 November 2016 (has links) Les systèmes cyber-physiques (CPS) et d’Internet des objets génèrent un volume et une variété des données sans précédant. Le temps que ces données parcourent le réseau dans son chemin vers le cloud, la possibilité de réagir à un événement critique pourrait être tardive. Pour résoudre ce problème, les traitements de données nécessitant une réponse rapide sont faits à proximité d’où les données sont collectées. Ainsi, seuls les résultats du pré-traitement sont envoyées au cloud et la réaction pourrai être déclenché suffisamment rapide pour préserver l’intégrité du système. Ce modèle de calcul est connu comme Fog Computing. Un large spectre d’applications de CPS ont des contraintes temporelle et peuvent être facilement parallélisées en distribuant les calculs sur différents sous-ensembles de données en même temps. Ceci peut permettre d’obtenir un temps de réponse plus court et un temps de creux plus large. Ainsi, on peut réduire la fréquence du processeur et/ou éteindre des parties du processeur afin de réduire la consommation d’énergie. Dans cette thèse, nous nous concentrons sur le problème d'ordonnancement d’un ensemble de taches temps-réels parallèles sur des architectures multi-coeurs dans l’objectif de réduire la consommation d’énergie en respectant toutes les contraintes temporelles. Nous proposons ainsi plusieurs modèles de tâches et des testes d'ordonnançabilité pour résoudre le problème d’allocation des threads aux processeurs. Nous proposons aussi des méthodes qui permettent de sélectionner les fréquences et les états des processeurs. Les modèles proposés peuvent être implantés comme des directives dans la même logique que OpenMP. / Cyber physical systems (CPS) and Internet of Objects (IoT) are generating an unprecedented volume and variety of data that needs to be collected and stored on the cloud before being processed. By the time the data makes its way to the cloud for analysis, the opportunity to trigger a reply might be late. One approach to solve this problem is to analyze the most time-sensitive data at the network edge, close to where it is generated. Thus, only the pre-processed results are sent to the cloud. This computation model is know as Fog Computing or Edge computing. Critical CPS applications using the fog computing model may have real-time constraints because results must be delivered in a pre-determined time window. Furthermore, in many relevant applications of CPS, the processing can be parallelized by applying the same processing on different sub-sets of data at the same time by the mean parallel programming techniques. This allow to achieve a shorter response time, and then, a larger slack time, which can be used to reduce energy consumption. In this thesis we focus on the problem of scheduling a set of parallel tasks on multicore processors, with the goal of reducing the energy consumption while all deadlines are met. We propose several realistic task models on architectures with identical and heterogeneous cores, and we develop algorithms for allocating threads to processors, select the core frequencies, and perform schedulability analysis. The proposed task models can be realized by using OpenMP-like APIs. DVFS- DPM 004.33
3	VLSI Implementation of Low Power Reconfigurable MIMO Detector Dash, Rajballav 14 March 2013 (has links) Multiple Input Multiple Output (MIMO) systems are a key technology for next generation high speed wireless communication standards like 802.11n, WiMax etc. MIMO enables spatial multiplexing to increase channel bandwidth which requires the use of multiple antennas in the receiver and transmitter side. The increase in bandwidth comes at the cost of high silicon complexity of MIMO detectors which result, due to the intricate algorithms required for the separation of these spatially multiplexed streams. Previous implementations of MIMO detector have mainly dealt with the issue of complexity reduction, latency minimization and throughput enhancement. Although, these detectors have successfully mapped algorithms to relatively simpler circuits but still, latency and throughput of these systems need further improvements to meet standard requirements. Additionally, most of these implementations don’t deal with the requirements of reconfigurability of the detector to multiple modulation schemes and different antennae configurations. This necessary requirement provides another dimension to the implementation of MIMO detector and adds to the implementation complexity. This thesis focuses on the efficient VLSI implementation of the MIMO detector with an emphasis on performance and re-configurability to different modulation schemes. MIMO decoding in our detector is based on the fixed sphere decoding algorithm which has been simplified for an effective VLSI implementation without considerably degrading the near optimal bit error rate performance. The regularity of the architecture makes it suitable for a highly parallel and pipelined implementation. The decoder has intrinsic traits for dynamic re-configurability to different modulation and encoding schemes. This detector architecture can be easily tuned for high/low performance requirements with slight degradation/improvement in Bit Error Rate (BER) depending on needs of the overlying application. Additionally, various architectural optimizations like pipelining, parallel processing, hardware scheduling, dynamic voltage and frequency scaling have been explored to improve the performance, energy requirements and re-configurability of the design. VLSI MIMO IEEE 802.11n Sphere Decoding DVFS
4	Energy Efficient Scheduling for Real-Time Systems Gupta, Nikhil 2011 December 1900 (has links) The goal of this dissertation is to extend the state of the art in real-time scheduling algorithms to achieve energy efficiency. Currently, Pfair scheduling is one of the few scheduling frameworks which can optimally schedule a periodic real-time taskset on a multiprocessor platform. Despite the theoretical optimality, there exist large concerns about efficiency and applicability of Pfair scheduling in practical situations. This dissertation studies and proposes solutions to such efficiency and applicability concerns. This dissertation also explores temperature aware energy management in the domain of real-time scheduling. The thesis of this dissertation is: the implementation efficiency of Pfair scheduling algorithms can be improved. Further, temperature awareness of a real-time system can be improved while considering variation of task execution times to reduce energy consumption. This thesis is established through research in a number of directions. First, we explore the applicability of Dynamic Voltage and Frequency Scaling (DVFS) feature in the underlying platform, within Pfair scheduled systems. We propose techniques to reduce energy consumption in Pfair scheduling by using DVFS. Next, we explore the problem of quantum size selection in Pfair scheduled system so that runtime overheads are minimized. We also propose a hardware design for a central Pfair scheduler core in a multiprocessor system to minimized the overheads and energy consumption of Pfair scheduling. Finally, we propose a temperature aware energy management scheme for tasks with varying execution times. Multiprocessor Real-Time Scheduling Pfair DVFS
5	Modélisation, prédiction et optimisation de la consommation énergétique d'applications MPI à l'aide de SimGrid / Modeling, Prediction and Optimization of Energy Consumption of MPI Applications using SimGrid Heinrich, Franz 21 May 2019 (has links) Les changements technologiques dans la communauté du calcul hauteperformance (HPC) sont importants, en particulier dans le secteurdu parallélisme massif avec plusieurs milliers de cœurs de calcul sur unGPU unique ou accélérateur, et aussi des nouveaux réseaux complexes.La consommation d’énergie de ces machines continuera de croître dans les années à venir,faisant de l’énergie l’un des principaux facteurs de coût.Cela explique pourquoi même la métrique classique"flop / s", généralement utilisé pour évaluer les applications HPC etles machines, est progressivement remplacé par une métrique centré surl’énergie en "flop / watt".Une approche pour prédire la consommation d'énergie se fait parsimulation, cependant, une prédiction précise de la performance estcruciale pour estimer l’énergie. Dans cette thèse, nouscontribuons à la prédiction de performance et d'énergie des architectures HPC.Nous proposons un modèle énergétique qui a été implémenté dans unsimulateur open source, sg. Nous validons ce modèle avec soin eten le comparant systématiquement avec des expériences réelles.Nous utilisons cette contribution pour évaluer les projetsexistants et nous proposons de nouveaux governors DVFS spécialementconçus pour le contexte HPC. / The High-Performance Computing (HPC) community is currently undergoingdisruptive technology changes in almost all fields, including a switch towardsmassive parallelism with several thousand compute cores on a single GPU oraccelerator and new, complex networks. Powering a massively parallel machinebecomesThe energy consumption of these machines will continue to grow in the future,making energy one of the principal cost factors of machine ownership. This explainswhy even the classic metric "flop/s", generally used to evaluate HPC applicationsand machines, is widely regarded as to be replaced by an energy-centric metric"flop/watt".One approach to predict energy consumption is through simulation, however, a pre-cise performance prediction is crucial to estimate the energy faithfully. In this thesis,we contribute to the performance and energy prediction of HPC architectures. Wepropose an energy model which we have implemented in the open source SimGridsimulator. We validate this model by carefully and systematically comparing itwith real experiments. We leverage this contribution to both evaluate existingand propose new DVFS governors that are part*icularly designed to suit the HPCcontext. Performance Dvfs Hpc Prédiction Energie Simulation Performance Dvfs Hpc Energy Simulation Prédiction 004
6	Efficient Execution Paradigms for Parallel Heterogeneous Architectures Koukos, Konstantinos January 2016 (has links) This thesis proposes novel, efficient execution-paradigms for parallel heterogeneous architectures. The end of Dennard scaling is threatening the effectiveness of DVFS in future nodes; therefore, new execution paradigms are required to exploit the non-linear relationship between performance and energy efficiency of memory-bound application-regions. To attack this problem, we propose the decoupled access-execute (DAE) paradigm. DAE transforms regions of interest (at program-level) in two coarse-grain phases: the access-phase and the execute-phase, which we can independently DVFS. The access-phase is intended to prefetch the data in the cache, and is therefore expected to be predominantly memory-bound, while the execute-phase runs immediately after the access-phase (that has warmed-up the cache) and is therefore expected to be compute-bound. DAE, achieves good energy savings (on average 25% lower EDP) without performance degradation, as opposed to other DVFS techniques. Furthermore, DAE increases the memory level parallelism (MLP) of memory-bound regions, which results in performance improvements of memory-bound applications. To automatically transform application-regions to DAE, we propose compiler techniques to automatically generate and incorporate the access-phase(s) in the application. Our work targets affine, non-affine, and even complex, general-purpose codes. Furthermore, we explore the benefits of software multi-versioning to optimize DAE in dynamic environments, and handle codes with statically unknown access-phase overheads. In general, applications automatically-transformed to DAE by our compiler, maintain (or even exceed in some cases) the good performance and energy efficiency of manually-optimized DAE codes. Finally, to ease the programming environment of heterogeneous systems (with integrated GPUs), we propose a novel system-architecture that provides unified virtual memory with low overhead. The underlying insight behind our work is that existing data-parallel programming models are a good fit for relaxed memory consistency models (e.g., the heterogeneous race-free model). This allows us to simplify the coherency protocol between the CPU – GPU, as well as the GPU memory management unit. On average, we achieve 45% speedup and 45% lower EDP over the corresponding SC implementation. Decoupled Execution Performance Energy DVFS Compiler Optimizations Heterogeneous Coherence
7	Improving Energy-Efficiency of Multicores using First-Order Modeling Spiliopoulos, Vasileios January 2016 (has links) In the recent decades, power consumption has evolved to one of the most critical resources in a computer system. In the form of electricity bill in data centers, battery life in mobile devices, or thermal constraints in desktops and laptops, power consumption imposes several limitations in today’s processors and improving power and energy efficiency is one of the most urgent research topics of Computer Architecture. Dynamic Voltage and Frequency Scaling (DVFS) and Cache Resizing are among the most popular energy saving techniques. Previous work, however, has focused on developing heuristics and trial-and-error methods that yield acceptable savings, but fail to provide insight and understanding of how these techniques affect power and performance of a computer system. In contrast, this Thesis proposes the use of first-order modeling to improve the energy efficiency of computer systems. A first-order model needs to be (i) accurate enough to efficiently drive DVFS and Cache Resizing decisions, and (ii) simple enough to eliminate the overhead of collecting the required inputs to the model. We show that such models can be constructed and successfully applied in modern systems. For DVFS, we propose to scale frequency down to exploit applications’ memory slack, i.e., periods that the processor spends waiting for data to be fetched from the main memory. In such cases, the processor frequency can be scaled down to save energy without inordinate performance penalty. Our DVFS models can detect slack and predict the impact of DVFS in both power and performance with great accuracy. Cache Resizing, on the other hand, relies on the fact that many applications do not benefit from the vast amount of cache that modern processors are equipped with. In such cases, the cache can be resized to save static energy consumption at limited performance cost. Since both techniques are related with the memory behavior of applications, we propose a unified model to manage the two techniques in tandem and maximize energy efficiency through synergistic DVFS and Cache Resizing. Finally, our experience with DVFS in real systems motivated us to contribute to the integration of DVFS into the gem5 simulator. Unlike other simulators that ignore the role of OS in DVFS, we extend the gem5 simulator by developing the hardware and software components that allow existing Linux DVFS infrastructure to be seamlessly integrated in the simulator. Computer Architecture DVFS Cache Resizing Interval modeling Power modeling
8	Energy-aware load balancing approaches to improve energy efficiency on HPC systems / Abordagens de balanceamento de carga ciente de energia para melhorar a eficiência energética em sistemas HPC Padoin, Edson Luiz January 2016 (has links) Os atuais sistemas de HPC tem realizado simulações mais complexas possíveis, produzindo benefícios para diversas áreas de pesquisa. Para atender à crescente demanda de processamento dessas simulações, novos equipamentos estão sendo projetados, visando à escala exaflops. Um grande desafio para a construção destes sistemas é a potência que eles vão demandar, onde perspectivas atuais alcançam GigaWatts. Para resolver este problema, esta tese apresenta uma abordagem para aumentar a eficiência energética usando recursos de HPC, objetivando reduzir os efeitos do desequilíbrio de carga e economizar energia. Nós desenvolvemos uma estratégia baseada no consumo de energia, chamada ENERGYLB, que considera características da plataforma, irregularidade e dinamicidade de carga das aplicações para melhorar a eficiência energética. Nossa estratégia leva em conta carga computacional atual e a frequência de clock dos cores, para decidir entre chamar uma estratégia de balanceamento de carga que reduz o desequilíbrio de carga migrando tarefas, ou usar técnicas de DVFS par ajustar as frequências de clock dos cores de acordo com suas cargas computacionais ponderadas. Como as diferentes arquiteturas de processador podem apresentam dois níveis de granularidade de DVFS, DVFS-por-chip ou DVFS-por-core, nós criamos dois diferentes algoritmos para a nossa estratégia. O primeiro, FG-ENERGYLB, permite um controle fino da frequência dos cores em sistemas que possuem algumas dezenas de cores e implementam DVFS-por-core. Por outro lado, CG-ENERGYLB é adequado para plataformas de HPC composto de vários processadores multicore que não permitem tal refinado controle, ou seja, que só executam DVFS-por-chip. Ambas as abordagens exploram desbalanceamentos residuais em aplicações interativas e combinam balanceamento de carga dinâmico com técnicas de DVFS. Assim, eles reduzem a frequência de clock dos cores com menor carga computacional os quais apresentam algum desequilíbrio residual mesmo após as tarefas serem remapeadas. Nós avaliamos a aplicabilidade das nossas abordagens utilizando o ambiente de programação paralela CHARM++ sobre benchmarks e aplicações reais. Resultados experimentais presentaram melhorias no consumo de energia e na demanda potência sobre algoritmos do estado-da-arte. A economia de energia com ENERGYLB usado sozinho foi de até 25% com nosso algoritmo FG-ENERGYLB, e de até 27% com nosso algoritmo CG-ENERGYLB. No entanto, os desequilíbrios residuais ainda estavam presentes após as serem tarefas remapeadas. Neste caso, quando as nossas abordagens foram empregadas em conjunto com outros balanceadores de carga, uma melhoria na economia de energia de até 56% é obtida com FG-ENERGYLB e de até 36% com CG-ENERGYLB. Estas economias foram obtidas através da exploração do desbalanceamento residual em aplicações interativas. Combinando balanceamento de carga dinâmico com DVFS nossa estratégia é capaz de reduzir a demanda de potência média dos sistemas paralelos, reduzir a migração de tarefas entre os recursos disponíveis, e manter o custo de balanceamento de carga baixo. / Current HPC systems have made more complex simulations feasible, yielding benefits to several research areas. To meet the increasing processing demands of these simulations, new equipment is being designed, aiming at the exaflops scale. A major challenge for building these systems is the power that they will require, which current perspectives reach the GigaWatts. To address this problem, this thesis presents an approach to increase the energy efficiency using of HPC resources, aiming to reduce the effects of load imbalance to save energy. We developed an energy-aware strategy, called ENERGYLB, which considers platform characteristics, and the load irregularity and dynamicity of the applications to improve the energy efficiency. Our strategy takes into account the current computational load and clock frequency, to decide whether to call a load balancing strategy that reduces load imbalance by migrating tasks, or use Dynamic Voltage and Frequency Scaling (DVFS) technique to adjust the clock frequencies of the cores according to their weighted loads. As different processor architectures can feature two levels of DVFS granularity, per-chip DVFS or per-core DVFS, we created two different algorithms for our strategy. The first one, FG-ENERGYLB, allows a fine control of the clock frequency of cores in systems that have few tens of cores and feature per-core DVFS control. On the other hand, CGENERGYLB is suitable for HPC platforms composed of several multicore processors that do not allow such a fine-grained control, i.e., that only perform per-chip DVFS. Both approaches exploit residual imbalances on iterative applications and combine dynamic load balancing with DVFS techniques. Thus, they reduce the clock frequency of underloaded computing cores, which experience some residual imbalance even after tasks are remapped. We evaluate the applicability of our approaches using the CHARM++ parallel programming system over benchmarks and real world applications. Experimental results present improvements in energy consumption and power demand over state-of-the-art algorithms. The energy savings with ENERGYLB used alone were up to 25%with our FG-ENERGYLB algorithm, and up to 27%with our CG-ENERGYLB algorithm. Nevertheless, residual imbalances were still present after tasks were remapped. In this case, when our approaches were employed together with these load balancers, an improvement in energy savings of up to 56% is achieved with FG-ENERGYLB and up to 36% with CG-ENERGYLB. These savings were obtained by exploiting residual imbalances on iterative applications. By combining dynamic load balancing with the DVFS technique, our approach is able to reduce the average power demand of parallel systems, reduce the task migration among the available resources, and keep load balancing overheads low. Balanceamento : Carga Processamento paralelo Load balancing DVFS Energy efficiency
9	Integrated temperature sensors in deep sub-micron CMOS technologies Chowdhury, Golam Rasul 03 July 2014 (has links) Integrated temperature sensors play an important role in enhancing the performance of on-chip power and thermal management systems in today's highly-integrated system-on-chip (SoC) platforms, such as microprocessors. Accurate on-chip temperature measurement is essential to maximize the performance and reliability of these SoCs. However, due to non-uniform power consumption by different functional blocks, microprocessors have fairly large thermal gradient (and variation) across their chips. In the case of multi-core microprocessors for example, there are task-specific thermal gradients across different cores on the same die. As a result, multiple temperature sensors are needed to measure the temperature profile at all relevant coordinates of the chip. Subsequently, the results of the temperature measurements are used to take corrective measures to enhance the performance, or save the SoC from catastrophic over-heating situations which can cause permanent damage. Furthermore, in a large multi-core microprocessor, it is also imperative to continuously monitor potential hot-spots that are prone to thermal runaway. The locations of such hot spots depend on the operations and instruction the processor carries out at a given time. Due to practical limitations, it is an overkill to place a big size temperature sensor nearest to all possible hot spots. Thus, an ideal on-chip temperature sensor should have minimal area so that it can be placed non-invasively across the chip without drastically changing the chip floor plan. In addition, the power consumption of the sensors should be very low to reduce the power budget overhead of thermal monitoring system, and to minimize measurement inaccuracies due to self-heating. The objective of this research is to design an ultra-small size and ultra-low power temperature sensor such that it can be placed in the intimate proximity of all possible hot spots across the chip. The general idea is to use the leakage current of a reverse-bias p-n junction diode as an operand for temperature sensing. The tasks within this project are to examine the theoretical aspect of such sensors in both Silicon-On-Insulator (SOI), and bulk Complementary Metal-Oxide Semiconductor (CMOS) technologies, implement them in deep sub-micron technologies, and ultimately evaluate their performances, and compare them to existing solutions. / text CMOS TDC ADC BJT p-n Delta-sigma DVFS SoC
10	Control Techniques for Uncore Power Mangement in Chip Multiprocessor Designs Xu, Zheng 16 December 2013 (has links) In chip-multiprocessor (CMP) designs, when the number of core increases, the size of on-chip communication fabric and data storage grows accordingly and therefore the chip power challenge is exacerbated. This thesis work considers the power management for networks-on-chip (NoC) and the last level cache, which constitute the uncore in CMP designs. NoC is regarded as a scalable approach to cope with the increasing demand for on-chip communication bandwidth. The last level cache is shared among all cores. The focus of this work is on the control techniques for uncore dynamic voltage and frequency scaling. A realistic but not well-studied scenario is investigated. That is, the entire uncore shares a single voltage/frequency domain, as opposed to separated domains in most of previous works. One appealing advantage here is that data packets no longer experience the interfacing overhead across different voltage/frequency domains. The classic PI (Proportional and Integral) control method is adopted due to its simplicity, flexibility and low implementation overhead. This thesis research outcome includes three parts. First, stability of the PI control is analyzed. Second, a model-assisted PI control scheme is proposed and studied. The model assist is to address the problem that no universally good reference point exists for the control. Third, the windup issue for the PI control is investigated. Full architecture simulations are performed on public benchmark suites to validate the proposed techniques. The result show 76% energy reduction with less than 6% performance degradation compared to constantly high voltage/frequency for uncore. DVFS V/F NoC LLC CMP AMAT PID

Search results