Global ETD Search

11	Application-Directed DVFS using Multiple Clock Domains on Graphics Hardware Li, Juan 14 January 2009 (has links) As handheld devices have become increasingly popular, powerful programmable graphics hardware for mobile and handheld devices has been deployed. While many resources on mobile devices are limited, the predominant problem for mobile devices is their limited battery power. Several techniques have been proposed to increase the energy efficiency of mobile applications and improve battery life. In this thesis, we propose a new dynamic voltage and frequency scaling (DVFS) on Graphics Processing Units (GPU). In most cases, cues within the graphics appli- cation can be used to predict portions of a GPU that will be used or unused when the application is run. We partition the GPU into six clock domains that can be clocked at different rates. Specifically, each domain it has its own voltage and frequency set- ting based on its predicted workload to save energy without reducing applications frame rates. In addition, we propose an signature-based algorithm for predicting the workload offered to our six clock domains by a given application to decide voltage and frequency settings. We conduct experiments and compare the results of our new signature based workload prediction algorithm with some other traditional interval based workload prediction algorithms. Our results show that our signature-based prediction can save 30-50% energy without afecting application frame rates. Energy Graphics Process Unit(GPU) Multiple Clock Domain(MCD) Pocket computers Computer graphics
12	E³ : energy-efficient EDGE architectures Govindan, Madhu Sarava 13 December 2010 (has links) Increasing power dissipation is one of the most serious challenges facing designers in the microprocessor industry. Power dissipation, increasing wire delays, and increasing design complexity have forced industry to embrace multi-core architectures or chip multiprocessors (CMPs). While CMPs mitigate wire delays and design complexity, they do not directly address single-threaded performance. Additionally, programs must be parallelized, either manually or automatically, to fully exploit the performance of CMPs. Researchers have recently proposed an architecture called Explicit Data Graph Execution (EDGE) as an alternative to conventional CMPs. EDGE architectures are designed to be technology-scalable and to provide good single-threaded performance as well as exploit other types of parallelism including data-level and thread-level parallelism. In this dissertation, we examine the energy efficiency of a specific EDGE architecture called TRIPS Instruction Set Architecture (ISA) and two microarchitectures called TRIPS and TFlex that implement the TRIPS ISA. TRIPS microarchitecture is a first-generation design that proves the feasibility of the TRIPS ISA and distributed tiled microarchitectures. The second-generation TFlex microarchitecture addresses key inefficiencies of the TRIPS microarchitecture by matching the resource needs of applications to a composable hardware substrate. First, we perform a thorough power analysis of the TRIPS microarchitecture. We describe how we develop architectural power models for TRIPS. We then improve power-modeling accuracy using hardware power measurements on the TRIPS prototype combined with detailed Register Transfer Level (RTL) power models from the TRIPS design. Using these refined architectural power models and normalized power modeling methodologies, we perform a detailed performance and power comparison of the TRIPS microarchitecture with two different processors: 1) a low-end processor designed for power efficiency (ARM/XScale) and 2) a high-end superscalar processor designed for high performance (a variant of Power4). This detailed power analysis provides key insights into the advantages and disadvantages of the TRIPS ISA and microarchitecture compared to processors on either end of the performance-power spectrum. Our results indicate that the TRIPS microarchitecture achieves 11.7 times better energy efficiency compared to ARM, and approximately 12% better energy efficiency than Power4, in terms of the Energy-Delay-Squared (ED²) metric. Second, we evaluate the energy efficiency of the TFlex microarchitecture in comparison to TRIPS, ARM, and Power4. TFlex belongs to a class of microarchitectures called Composable Lightweight Processors (CLPs). CLPs are distributed microarchitectures designed with simple cores and are highly configurable at runtime to adapt to resource needs of applications. We develop power models for the TFlex microarchitecture based on the validated TRIPS power models. Our quantitative results indicate that by better matching execution resources to the needs of applications, the composable TFlex system can operate in both regimes of low power (similar to ARM) and high performance (similar to Power4). We also show that the composability feature of TFlex achieves a signification improvement (2 times) in the ED² metric compared to TRIPS. Third, using TFlex as our experimental platform, we examine the efficacy of processor composability as a potential performance-power trade-off mechanism. Most modern processors support a form of dynamic voltage and frequency scaling (DVFS) as a performance-power trade-off mechanism. Since the rate of voltage scaling has slowed significantly in recent process technologies, processor designers are in dire need of alternatives to DVFS. In this dissertation, we explore processor composability as an architectural alternative to DVFS. Through experimental results we show that processor composability achieves almost as good performance-power trade-offs as pure frequency scaling (no changes in supply voltages), and a much better performance-power trade-off compared to voltage and frequency scaling (both supply voltage and frequency change). Next, we explore the effects of additional performance-improving techniques for the TFlex system on its energy efficiency. Researchers have proposed a variety of techniques for improving the performance of the TFlex system. These include: (1) block mapping techniques to trade off intra-block concurrency with communication across the operand network; (2) predicate prediction and (3) operand multi-cast/broadcast mechanism. We examine each of these mechanisms in terms of its effect on the energy efficiency of TFlex, and our experimental results demonstrate the effects of operand communication, and speculation on the energy efficiency of TFlex. Finally, this dissertation evaluates a set of fine-grained power management (FGPM) policies for TFlex: instruction criticality and controlled speculation. These policies rely on a temporally and spatially fine-grained dynamic voltage and frequency scaling (DVFS) mechanism for improving power efficiency. The instruction criticality policy seeks to improve power efficiency by mapping critical computation in a program to higher performance-power levels, and by mapping non-critical computation to lower performance-power levels. Controlled speculation policy, on the other hand, maps blocks that are highly likely to be on correct execution path in a program to higher performance levels, and the other blocks to lower performance levels. Our experimental results indicate that idealized instruction criticality and controlled speculation policies improve the operating range and flexibility of the TFlex system. However, when the actual overheads of fine-grained DVFS, especially energy conversion losses of voltage regulator modules (VRMs), are considered the power efficiency advantages of these idealized policies quickly diminish. Our results also indicate that the current conversion efficiencies of on-chip VRMs need to improve to as high as 95% for the realistic policies to be feasible. / text Energy efficiency EDGE architectures Power efficiency Composability DVFS Power management Dynamic voltage and frequency scaling
13	Exploiting Speculative and Asymmetric Execution on Multicore Architectures Wamhoff, Jons-Tobias 27 March 2015 (has links) (PDF) The design of microprocessors is undergoing radical changes that affect the performance and reliability of hardware and will have a high impact on software development. Future systems will depend on a deep collaboration between software and hardware to cope with the current and predicted system design challenges. Instead of higher frequencies, the number of processor cores per chip is growing. Eventually, processors will be composed of cores that run at different speeds or support specialized features to accelerate critical portions of an application. Performance improvements of software will only result from increasing parallelism and introducing asymmetric processing. At the same time, substantial enhancements in the energy efficiency of hardware are required to make use of the increasing transistor density. Unfortunately, the downscaling of transistor size and power will degrade the reliability of the hardware, which must be compensated by software. In this thesis, we present new algorithms and tools that exploit speculative and asymmetric execution to address the performance and reliability challenges of multicore architectures. Our solutions facilitate both the assimilation of software to the changing hardware properties as well as the adjustment of hardware to the software it executes. We use speculation based on transactional memory to improve the synchronization of multi-threaded applications. We show that shared memory synchronization must not only be scalable to large numbers of cores but also robust such that it can guarantee progress in the presence of hardware faults. Therefore, we streamline transactional memory for a better throughput and add fault tolerance mechanisms with a reduced overhead by speculating optimistically on an error-free execution. If hardware faults are present, they can manifest either in a single event upset or crashes and misbehavior of threads. We address the former by applying transactions to checkpoint and replicate the state such that threads can correct and continue their execution. The latter is tackled by extending the synchronization such that it can tolerate crashes and misbehavior of other threads. We improve the efficiency of transactional memory by enabling a lightweight thread that always wins conflicts and significantly reduces the overheads. Further performance gains are possible by exploiting the asymmetric properties of applications. We introduce an asymmetric instrumentation of transactional code paths to enable applications to adapt to the underlying hardware. With explicit frequency control of individual cores, we show how applications can expose their possibly asymmetric computing demand and dynamically adjust the hardware to make a more efficient usage of the available resources. Mehrkernprozessoren Multithreading Synchronisierung Transaktionaler Speicher Frequenzskalierung multicore multithreading locks speculation transactional memory dynamic voltage and frequency scaling ddc:004 rvk:ST 170
14	Energy consumption optimization of parallel applications with Iterations using CPU frequency scaling / Optimisation de la consommation énergétique des applications parallèles avec des itérations en utilisant réduisant la fréquence des processeurs Fanfakh, Ahmed Badri Muslim 17 October 2016 (has links) Au cours des dernières années, l'informatique “green” est devenue un sujet important dans le calcul intensif. Cependant, les plates-formes informatiques continuent de consommer de plus en plus d'énergie en raison de l'augmentation du nombre de noeuds qui les composent. Afin de minimiser les coûts d'exploitation de ces plates-formes de nombreuses techniques ont été étudiées, parmi celles-ci, il y a le changement de la fréquence dynamique des processeurs (DVFS en anglais). Il permet de réduire la consommation d'énergie d'un CPU, en abaissant sa fréquence. Cependant, cela augmente le temps d'exécution de l'application. Par conséquent, il faut trouver un seuil qui donne le meilleur compromis entre la consommation d'énergie et la performance d'une application. Cette thèse présente des algorithmes développés pour optimiser la consommation d'énergie et les performances des applications parallèles avec des itérations synchrones et asynchrones sur des clusters ou des grilles. Les modèles de consommation d'énergie et de performance proposés pour chaque type d'application parallèle permettent de prédire le temps d'exécution et la consommation d'énergie d'une application pour toutes les fréquences disponibles.La contribution de cette thèse peut être divisé en trois parties. Tout d'abord, il s'agit d'optimiser le compromis entre la consommation d'énergie et les performances des applications parallèles avec des itérations synchrones sur des clusters homogènes. Deuxièmement, nous avons adapté les modèles de performance énergétique aux plates-formes hétérogènes dans lesquelles chaque noeud peut avoir des spécifications différentes telles que la puissance de calcul, la consommation d'énergie, différentes fréquences de fonctionnement ou encore des latences et des bandes passantes réseaux différentes. L'algorithme d'optimisation de la fréquence CPU a également été modifié en fonction de l'hétérogénéité de la plate-forme. Troisièmement, les modèles et l'algorithme d'optimisation de la fréquence CPU ont été complètement repensés pour prendre en considération les spécificités des algorithmes itératifs asynchrones.Tous ces modèles et algorithmes ont été appliqués sur des applications parallèles utilisant la bibliothèque MPI et ont été exécutés avec le simulateur Simgrid ou sur la plate-forme Grid'5000. Les expériences ont montré que les algorithmes proposés sont plus efficaces que les méthodes existantes. Ils n’introduisent qu’un faible surcoût et ne nécessitent pas de profilage au préalable car ils sont exécutés au cours du déroulement de l’application. / In recent years, green computing has become an important topic in the supercomputing research domain. However, the computing platforms are still consuming more and more energy due to the increase in the number of nodes composing them. To minimize the operating costs of these platforms many techniques have been used. Dynamic voltage and frequency scaling (DVFS) is one of them. It can be used to reduce the power consumption of the CPU while computing, by lowering its frequency. However, lowering the frequency of a CPU may increase the execution time of the application running on that processor. Therefore, the frequency that gives the best trade-off between the energy consumption and the performance of an application must be selected.This thesis, presents the algorithms developed to optimize the energy consumption and theperformance of synchronous and asynchronous message passing applications with iterations runningover clusters or grids. The energy consumption and performance models for each type of parallelapplication predicts its execution time and energy consumption for any selected frequency accordingto the characteristics of both the application and the architecture executing this application.The contribution of this thesis can be divided into three parts: Firstly, optimizing the trade-offbetween the energy consumption and the performance of the message passing applications withsynchronous iterations running over homogeneous clusters. Secondly, adapting the energy andperformance models to heterogeneous platforms where each node can have different specificationssuch as computing power, energy consumption, available frequency gears or network’s latency andbandwidth. The frequency scaling algorithm was also modified to suit the heterogeneity of theplatform. Thirdly, the models and the frequency scaling algorithm were completely rethought to takeinto considerations the asynchronism in the communication and computation. All these models andalgorithms were applied to message passing applications with iterations and evaluated over eitherSimGrid simulator or Grid’5000 platform. The experiments showed that the proposed algorithms areefficient and outperform existing methods such as the energy and delay product. They also introducea small runtime overhead and work online without any training or profiling. Grille de calcul Optimisation de l'énergie Dynamic voltage and frequency scaling Grid computing Energy optimization Parallel applications with iterations 621.39
15	Energy-aware Scheduling for Multiprocessor Real-time Systems Bhatti, K. 18 April 2011 (has links) (PDF) Les applications temps réel modernes deviennent plus exigeantes en termes de ressources et de débit amenant la conception d'architectures multiprocesseurs. Ces systèmes, des équipements embarqués au calculateur haute performance, sont, pour des raisons d'autonomie et de fiabilité, confrontés des problèmes cruciaux de consommation d'énergie. Pour ces raisons, cette thèse propose de nouvelles techniques d'optimisation de la consommation d'énergie dans l'ordonnancement de systèmes multiprocesseur. La premiére contribution est un algorithme d'ordonnancement hiérarchique á deux niveaux qui autorise la migration restreinte des tâches. Cet algorithme vise á réduire la sous-optimalité de l'algorithme global EDF. La deuxiéme contribution de cette thèse est une technique de gestion dynamique de la consommation nommée Assertive Dynamic Power Management (AsDPM). Cette technique, qui régit le contrôle d'admission des tâches, vise á exploiter de manière optimale les modes repos des processeurs dans le but de réduire le nombre de processeurs actifs. La troisiéme contribution propose une nouvelle technique, nommée Deterministic Stretch-to-Fit (DSF), permettant d'exploiter le DVFS des processeurs. Les gains énergétiques observés s'approchent des solutions déjà existantes tout en offrant une complexité plus réduite. Ces techniques ont une efficacité variable selon les applications, amenant á définir une approche plus générique de gestion de la consommation appelée Hybrid Power Management (HyPowMan). Cette approche sélectionne, en cours d'exécution, la technique qui répond le mieux aux exigences énergie/performance. [INFO] Computer Science [SPI] Engineering Sciences Systèmes Embarqués Systèmes Temps réel Multiprocesseur Dynamic Power Management (DPM) Ordonnancement Multiprocesseur 'Consommation de puissance MPSoC
16	Power efficient and power attacks resistant system design and analysis using aggressive scaling with timing speculation Rathnala, Prasanthi January 2017 (has links) Growing usage of smart and portable electronic devices demands embedded system designers to provide solutions with better performance and reduced power consumption. Due to the new development of IoT and embedded systems usage, not only power and performance of these devices but also security of them is becoming an important design constraint. In this work, a novel aggressive scaling based on timing speculation is proposed to overcome the drawbacks of traditional DVFS and provide security from power analysis attacks at the same time. Dynamic voltage and frequency scaling (DVFS) is proven to be the most suitable technique for power efficiency in processor designs. Due to its promising benefits, the technique is still getting researchers attention to trade off power and performance of modern processor designs. The issues of traditional DVFS are: 1) Due to its pre-calculated operating points, the system is not able to suit to modern process variations. 2) Since Process Voltage and Temperature (PVT) variations are not considered, large timing margins are added to guarantee a safe operation in the presence of variations. The research work presented here addresses these issues by employing aggressive scaling mechanisms to achieve more power savings with increased performance. This approach uses in-situ timing error monitoring and recovering mechanisms to reduce extra timing margins and to account for process variations. A novel timing error detection and correction mechanism, to achieve more power savings or high performance, is presented. This novel technique has also been shown to improve security of processors against differential power analysis attacks technique. Differential power analysis attacks can extract secret information from embedded systems without knowing much details about the internal architecture of the device. Simulated and experimental data show that the novel technique can provide a performance improvement of 24% or power savings of 44% while occupying less area and power overhead. Overall, the proposed aggressive scaling technique provides an improvement in power consumption and performance while increasing the security of processors from power analysis attacks. 004.67
17	Exploiting Speculative and Asymmetric Execution on Multicore Architectures Wamhoff, Jons-Tobias 21 November 2014 (has links) The design of microprocessors is undergoing radical changes that affect the performance and reliability of hardware and will have a high impact on software development. Future systems will depend on a deep collaboration between software and hardware to cope with the current and predicted system design challenges. Instead of higher frequencies, the number of processor cores per chip is growing. Eventually, processors will be composed of cores that run at different speeds or support specialized features to accelerate critical portions of an application. Performance improvements of software will only result from increasing parallelism and introducing asymmetric processing. At the same time, substantial enhancements in the energy efficiency of hardware are required to make use of the increasing transistor density. Unfortunately, the downscaling of transistor size and power will degrade the reliability of the hardware, which must be compensated by software. In this thesis, we present new algorithms and tools that exploit speculative and asymmetric execution to address the performance and reliability challenges of multicore architectures. Our solutions facilitate both the assimilation of software to the changing hardware properties as well as the adjustment of hardware to the software it executes. We use speculation based on transactional memory to improve the synchronization of multi-threaded applications. We show that shared memory synchronization must not only be scalable to large numbers of cores but also robust such that it can guarantee progress in the presence of hardware faults. Therefore, we streamline transactional memory for a better throughput and add fault tolerance mechanisms with a reduced overhead by speculating optimistically on an error-free execution. If hardware faults are present, they can manifest either in a single event upset or crashes and misbehavior of threads. We address the former by applying transactions to checkpoint and replicate the state such that threads can correct and continue their execution. The latter is tackled by extending the synchronization such that it can tolerate crashes and misbehavior of other threads. We improve the efficiency of transactional memory by enabling a lightweight thread that always wins conflicts and significantly reduces the overheads. Further performance gains are possible by exploiting the asymmetric properties of applications. We introduce an asymmetric instrumentation of transactional code paths to enable applications to adapt to the underlying hardware. With explicit frequency control of individual cores, we show how applications can expose their possibly asymmetric computing demand and dynamically adjust the hardware to make a more efficient usage of the available resources. info:eu-repo/classification/ddc/004 ddc:004
18	Performance Modeling and On-Chip Memory Structures for Minimum Energy Operation in Voltage-Scaled LSI Circuits / 低電圧集積回路の消費エネルギー最小化のための解析的性能予測とオンチップメモリ構造 Shiomi, Jun 24 November 2017 (has links) 京都大学 / 0048 / 新制・課程博士 / 博士(情報学) / 甲第20778号 / 情博第658号 / 新制\|\|情報\|\|113(附属図書館) / 京都大学大学院情報学研究科通信情報システム専攻 / (主査)教授小野寺秀俊, 教授佐藤高史, 教授黒橋禎夫 / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM Energy-efficient LSI Low-voltage operation Variability-aware timing modeling Standard-Cell Memory (SCM) Adaptive Body Biasing (ABB) Minimum energy point operation 007
19	Dynamic Voltage/Frequency Scaling and Power-Gating of Network-on-Chip with Machine Learning Clark, Mark A. 05 June 2019 (has links) No description available. Computer Engineering Electrical Engineering Network-on-Chip Machine Learning Power-Gating Energy Management for NoC Machine Learning enhanced DVFS Dynamic Voltage and Frequency Scaling
20	Gestion de la consommation basée sur l’adaptation dynamique de la tension, fréquence et body bias sur les systèmes sur puce en technologie FD-SOI / Power Management based on Dynamic Voltage, Frequency and Body Bias Scaling on System On Chip in FD-SOI technology Akgul, Yeter 09 December 2014 (has links) Au-delà du nœud technologique CMOS BULK 28nm, certaines limites ont été atteintes dans l'amélioration des performances en raison notamment d'une consommation énergétique devenant trop importante. C'est une des raisons pour lesquelles de nouvelles technologies ont été développées, notamment celles basées sur Silicium sur Isolant (SOI). Par ailleurs, la généralisation des architectures complexes de type multi-cœurs, accentue le problème de gestion de la consommation à grain-fin. Les technologies CMOS FD-SOI offrent de nouvelles opportunités pour la gestion de la consommation en permettant d'ajuster, outre les paramètres usuels que sont la tension d'alimentation et la fréquence d'horloge, la tension de body bias. C'est dans ce contexte que ce travail étudie les nouvelles possibilités offertes et explore des solutions innovantes de gestion dynamique de la tension d'alimentation, fréquence d'horloge et tension de body bias afin d'optimiser la consommation énergétique des systèmes sur puce. L'ensemble des paramètres tensions/fréquence permettent une multitude de points de fonctionnement, qui doivent satisfaire des contraintes de fonctionnalité et de performance. Ce travail s'intéresse donc dans un premier temps à une problématique de conception, en proposant une méthode d'optimisation du placement de ces points de fonctionnement. Une solution analytique permettant de maximiser le gain en consommation apporté par l'utilisation de plusieurs points de fonctionnement est proposée. La deuxième contribution importante de cette thèse concerne la gestion dynamique de la tension d'alimentation, de la fréquence et de la tension de body bias, permettant d'optimiser l'efficacité énergétique en se basant sur le concept de convexité. La validation expérimentale des méthodes proposées s'appuie sur des échantillons de circuits réels, et montre des gains en consommation moyens allant jusqu'à 35%. / Beyond 28nm CMOS BULK technology node, some limits have been reached in terms of performance improvements. This is mainly due to the increasing power consumption. This is one of the reasons why new technologies have been developed, including those based on Silicon-On-Insulator (SOI). Moreover, the standardization of complex architectures such as multi-core architectures emphasizes the problem of power management at fine-grain. FD-SOI technologies offer new power management opportunities by adjusting, in addition to the usual parameters such as supply voltage and clock frequency, the body bias voltage. In this context, this work explores new opportunities and searches novel solutions for dynamically manage supply voltage, clock frequency and body bias voltage in order to optimize the power consumption of System on Chip.Adjusting supply voltage, frequency and body bias parameters allows multiple operating points, which must satisfy the constraints of functionality and performance. This work focuses initially at design time, proposing a method to optimize the placement of these operating points. An analytical solution to maximize power savings achieved through the use of several operating points is provided. The second important contribution of this work is a method based on convexity concept to dynamically manage the supply voltage, the frequency and the body bias voltage so as to optimize the energy efficiency. The experimental results based on real circuits show average power savings reaching 35%. Dvfs Adaptation dynamique de Body Bias Fd-Soi Gestion de la consommation Optimisation de la consommation Sélection de Power Mode Dynamic Voltage and Frequency Scaling Adaptive Body Biasing Fd-Soi Power Management Power Optimization Power Mode Selection

Search results