1 |
Performance and Power Optimization of GPU Architectures for General-purpose Computing
Wang, Yue 18 June 2014
Power-performance efficiency has become a central challenge in heterogeneous processing platforms, because power constraints must be met without sacrificing high performance. This dissertation proposes a framework for optimizing the power and performance of GPUs in the context of general-purpose computing on GPUs (GPGPU). To reduce the leakage power of GPU caches, we dynamically switch the L1 and L2 caches into low-power modes during periods of inactivity. The L1 cache can be put into a low-leakage (sleep) state when a processing unit is stalled because no threads are ready to be scheduled, and the L2 cache can be put into the sleep state during idle periods when there are no memory requests. The sleep mode is state-retentive, which obviates the need to flush the caches after they are woken up and thereby avoids any performance degradation. Experimental results indicate that this technique reduces leakage power by 52% on average. Further, to improve performance, we redistribute the GPGPU workload across the computing units of the GPU during application execution. The fundamental idea is to monitor the workload on each multi-processing unit and rebalance it by having a portion of a unit's unfinished threads executed on a neighboring multi-processing unit. Experimental results show that this technique improves the performance of the GPGPU workload by 15.7%. Finally, to improve both the performance and the dynamic power of GPUs, we propose two dynamic frequency scaling (DFS) techniques implemented in CPU host threads. The first is motivated by the significance of pipeline stalls during GPGPU execution. It applies a feedback control algorithm, Proportional-Integral-Derivative (PID) control, to regulate the frequency of the parallel processors and memory channels based on the occupancy of the memory buffering queues.
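The PID-based frequency regulation described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the class name `PIDFrequencyController`, the gains, the target occupancy, the frequency limits, and the choice to raise the memory-channel frequency when the queue is fuller than the target are all assumptions made for illustration.

```python
class PIDFrequencyController:
    """Hypothetical PID controller that maps queue occupancy to a frequency.

    Assumption: occupancy is a fraction in [0, 1], and an occupancy above the
    target means the memory channel is the bottleneck, so its frequency rises.
    """

    def __init__(self, kp, ki, kd, target_occupancy, f_min, f_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.target_occupancy = target_occupancy
        self.f_min, self.f_max = f_min, f_max
        self.integral = 0.0        # accumulated error (I term)
        self.prev_error = None     # last error, used for the D term

    def update(self, occupancy, current_freq, dt=1.0):
        """Return the next frequency (MHz) for one control interval."""
        error = occupancy - self.target_occupancy
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0       # no history on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error

        adjustment = (self.kp * error
                      + self.ki * self.integral
                      + self.kd * derivative)
        # Clamp to the frequency range supported by the (assumed) hardware.
        return max(self.f_min, min(self.f_max, current_freq + adjustment))


if __name__ == "__main__":
    ctrl = PIDFrequencyController(kp=200.0, ki=20.0, kd=10.0,
                                  target_occupancy=0.5,
                                  f_min=400.0, f_max=1000.0)
    freq = 600.0
    for occupancy in (0.9, 0.8, 0.6, 0.4):   # made-up occupancy samples
        freq = ctrl.update(occupancy, freq)
        print(f"occupancy={occupancy:.1f} -> frequency={freq:.1f} MHz")
```

In a real system the occupancy samples would come from hardware counters or a simulator, and the gains would need tuning; the values here are placeholders.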
The other technique aims to maximize the average throughput of all parallel processors under dynamic power constraints. We formulate this goal as a linear programming problem and solve it at runtime. According to the simulation results, the first technique achieves more than 22% power savings with a 4% improvement in performance, and the second technique reduces power consumption by 11% with a 9% performance improvement. The contributions of this dissertation advance the effort to improve the performance and reduce the energy consumption of GPGPU computing.
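The abstract does not give the linear program itself, so the following is only a sketch of one plausible special case. It assumes that both throughput and dynamic power are linear in each processor's frequency and that there is a single shared power budget. Under those assumptions the LP is a continuous knapsack problem, which the greedy procedure below solves exactly, so a general LP solver is swapped out for it here. The function name `allocate_frequencies` and all coefficients are hypothetical.

```python
def allocate_frequencies(perf_per_ghz, watts_per_ghz, f_min, f_max, power_budget):
    """Maximise sum(perf_per_ghz[i] * f[i]) subject to
    sum(watts_per_ghz[i] * f[i]) <= power_budget and f_min <= f[i] <= f_max.

    Assumed linear models; all coefficients must be positive.
    """
    n = len(perf_per_ghz)
    freqs = [f_min] * n
    # Power left after every processor runs at its minimum frequency.
    remaining = power_budget - sum(w * f_min for w in watts_per_ghz)
    if remaining < 0:
        raise ValueError("power budget cannot cover the minimum frequencies")

    # Spend the remaining budget on processors with the best
    # throughput-per-watt ratio first (optimal for a continuous knapsack).
    order = sorted(range(n),
                   key=lambda i: perf_per_ghz[i] / watts_per_ghz[i],
                   reverse=True)
    for i in order:
        extra = min(f_max - f_min, remaining / watts_per_ghz[i])
        freqs[i] += extra
        remaining -= extra * watts_per_ghz[i]
        if remaining <= 0:
            break
    return freqs


if __name__ == "__main__":
    # Three processors with made-up coefficients and a 35 W budget.
    print(allocate_frequencies(perf_per_ghz=[2.0, 1.0, 1.5],
                               watts_per_ghz=[10.0, 10.0, 30.0],
                               f_min=0.5, f_max=1.0, power_budget=35.0))
```

With additional constraints (for example per-channel memory limits) the greedy shortcut no longer applies and a general LP solver would be needed.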
|
2 |
The Development of Hardware Multi-core Test-bed on Field Programmable Gate Array
Shivashanker, Mohan 24 March 2011
The goal of this project is to develop a flexible multi-core hardware test-bed on a field programmable gate array (FPGA) that can be used to validate theoretical research on multi-core computing, especially power/thermal-aware computing. Based on a commercial FPGA test platform, the Xilinx Virtex5 XUPV5 LX110T, we develop a homogeneous multi-core test-bed with four software cores, each of which can dynamically adjust its performance under software control. We also enhance the operating system support for this test platform by developing hardware and software primitives for inter-process communication, synchronization, and scheduling of processes on multiple cores. A multi-core application based on matrix addition and multiplication is implemented to validate the applicability of the test-bed.
|
3 |
Etude et sauvegarde de la consommation énergétique dans un environnement simple et multi-processeurs : comprendre combien peut être sauvegardé et comment y arriver sur des systèmes modernes / Energy Characterization and Savings in Single and Multiprocessor Systems: understanding how much can be saved and how to achieve it in modern systems
Triquenaux, Nicolas 18 September 2015
Over the past decade, processors have drastically reduced their power consumption. With each new processor generation, new features that enhance energy efficiency are added. However, the demand for energy reduction techniques has never been so high.
Indeed, with the increasing size of high performance machines, their power and energy consumption have grown accordingly. They have reached a point where they must be reduced by all possible means. Current processors offer an interesting feature: they can change their operating frequency at run time. As dictated by transistor physics, a lower frequency means lower power consumption and, hopefully, lower energy consumption. This thesis investigates to what extent this processor feature, called DVFS, can be used to save energy. First, a simple machine is analyzed to gain a complete understanding of the different power consumers and of where optimizations can be focused. It is demonstrated that only the fans and the processors allow run-time energy optimizations. Of the two, the processor shows the highest consumption and therefore offers the greater potential for energy savings. Second, the power consumption of a processor depends on the applications being executed, and there are as many applications as problems to solve. The focus is therefore put on applications, to understand their impact on energy consumption. Based on the insights gathered, multiple tools targeting energy savings on a single processor are created. REST, the most naive, tries to adapt the processor state to the stress generated by the application, in the hope of reducing energy. The second, UtoPeak, computes the maximum energy reduction one can expect from any tool using DVFS, which makes it possible to evaluate the efficiency of such tools. The last one, FoREST, was created to correct the flaws of REST and to target the maximum energy reduction. Last, scientific applications generally need more than one processor to execute in a reasonable time. The thesis also presents a first attempt to compute a lower bound on energy consumption in this new execution context.
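The idea of an upper bound on DVFS savings can be illustrated with a small sketch. It is not UtoPeak's actual algorithm, which the abstract does not describe. It assumes the application has been profiled phase by phase at every available frequency, that energy is execution time multiplied by average power, and that frequency transitions cost nothing. The function name `min_energy_schedule` and all measurements are made up.

```python
def min_energy_schedule(phases):
    """Pick, for each phase, the frequency with the lowest energy.

    phases: list of dicts mapping frequency (MHz) -> (seconds, watts),
            i.e. hypothetical profiling results for each program phase.
    Returns (list of chosen frequencies, total energy in joules).
    """
    schedule = []
    total_energy = 0.0
    for profile in phases:
        # Energy of a phase at frequency f is time * average power.
        best = min(profile, key=lambda f: profile[f][0] * profile[f][1])
        seconds, watts = profile[best]
        schedule.append(best)
        total_energy += seconds * watts
    return schedule, total_energy


if __name__ == "__main__":
    phases = [
        # Compute-bound phase: halving the frequency nearly doubles the time.
        {2000: (1.0, 50.0), 1000: (1.9, 30.0)},
        # Memory-bound phase: a lower frequency barely slows execution.
        {2000: (2.0, 45.0), 1000: (2.2, 28.0)},
    ]
    schedule, energy = min_energy_schedule(phases)
    baseline = sum(p[2000][0] * p[2000][1] for p in phases)
    print(schedule, round(energy, 1), round(baseline, 1))
```

The example shows why a lower frequency only "hopefully" lowers energy: in the compute-bound phase the longer run time outweighs the power reduction, so the highest frequency remains the best choice there.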
|