Global ETD Search

21	Selective Core Boosting: The Return of the Turbo Button Wamhoff, Jons-Tobias, Diestelhorst, Stephan, Fetzer, Christof, Marlier, Patrick, Felber, Pascal, Dice, Dave 26 November 2013 (has links) Several modern multi-core architectures support the dynamic control of the CPU's clock rate, allowing processor cores to temporarily operate at speeds exceeding the operational base frequency. Conversely, cores can operate at a lower speed or be disabled altogether to save power. Such facilities are notably provided by Intel's Turbo Boost and AMD's Turbo CORE technologies. Frequency control is typically driven by the operating system which requests changes to the performance state of the processor based on the current load of the system. In this paper, we investigate the use of dynamic frequency scaling from user space to speed up multi-threaded applications that must occasionally execute time-critical tasks or to solve problems that have heterogeneous computing requirements. We propose a general-purpose library that allows selective control of the frequency of the cores - subject to the limitations of the target architecture. We analyze the performance trade-offs and illustrate its benefits using several benchmarks and real-world workloads when temporarily boosting selected cores executing time-critical operations. While our study primarily focuses on AMD's architecture, we also provide a comparative evaluation of the features, limitations, and runtime overheads of both Turbo Boost and Turbo CORE technologies. Our results show that we can successful exploit these new hardware facilities to accelerate the execution of key sections of code (critical paths) improving overall performance of some multi-threaded applications. Unlike prior research, we focus on performance instead of power conservation. Our results further can give guidelines for the design of hardware power management facilities and the operating system interfaces to those facilities. info:eu-repo/classification/ddc/004 ddc:004
22	Application-Directed DVFS using Multiple Clock Domains on Graphics Hardware Li, Juan 14 January 2009 (has links) As handheld devices have become increasingly popular, powerful programmable graphics hardware for mobile and handheld devices has been deployed. While many resources on mobile devices are limited, the predominant problem for mobile devices is their limited battery power. Several techniques have been proposed to increase the energy efficiency of mobile applications and improve battery life. In this thesis, we propose a new dynamic voltage and frequency scaling (DVFS) on Graphics Processing Units (GPU). In most cases, cues within the graphics appli- cation can be used to predict portions of a GPU that will be used or unused when the application is run. We partition the GPU into six clock domains that can be clocked at different rates. Specifically, each domain it has its own voltage and frequency set- ting based on its predicted workload to save energy without reducing applications frame rates. In addition, we propose an signature-based algorithm for predicting the workload offered to our six clock domains by a given application to decide voltage and frequency settings. We conduct experiments and compare the results of our new signature based workload prediction algorithm with some other traditional interval based workload prediction algorithms. Our results show that our signature-based prediction can save 30-50% energy without afecting application frame rates. Energy Graphics Process Unit(GPU) Multiple Clock Domain(MCD) Pocket computers Computer graphics
23	A Generalized Framework for Energy Savings in Real-Time Multiprocessor Systems Zeng, Gang, Yokoyama, Tetsuo, Tomiyama, Hiroyuki, Takada, Hiroaki 11 1900 (has links) No description available. embedded real-time systems energy-aware multiprocessor scheduling dynamic hardware resource configuration dynamic voltage frequency scaling dynamic power management
24	E³ : energy-efficient EDGE architectures Govindan, Madhu Sarava 13 December 2010 (has links) Increasing power dissipation is one of the most serious challenges facing designers in the microprocessor industry. Power dissipation, increasing wire delays, and increasing design complexity have forced industry to embrace multi-core architectures or chip multiprocessors (CMPs). While CMPs mitigate wire delays and design complexity, they do not directly address single-threaded performance. Additionally, programs must be parallelized, either manually or automatically, to fully exploit the performance of CMPs. Researchers have recently proposed an architecture called Explicit Data Graph Execution (EDGE) as an alternative to conventional CMPs. EDGE architectures are designed to be technology-scalable and to provide good single-threaded performance as well as exploit other types of parallelism including data-level and thread-level parallelism. In this dissertation, we examine the energy efficiency of a specific EDGE architecture called TRIPS Instruction Set Architecture (ISA) and two microarchitectures called TRIPS and TFlex that implement the TRIPS ISA. TRIPS microarchitecture is a first-generation design that proves the feasibility of the TRIPS ISA and distributed tiled microarchitectures. The second-generation TFlex microarchitecture addresses key inefficiencies of the TRIPS microarchitecture by matching the resource needs of applications to a composable hardware substrate. First, we perform a thorough power analysis of the TRIPS microarchitecture. We describe how we develop architectural power models for TRIPS. We then improve power-modeling accuracy using hardware power measurements on the TRIPS prototype combined with detailed Register Transfer Level (RTL) power models from the TRIPS design. Using these refined architectural power models and normalized power modeling methodologies, we perform a detailed performance and power comparison of the TRIPS microarchitecture with two different processors: 1) a low-end processor designed for power efficiency (ARM/XScale) and 2) a high-end superscalar processor designed for high performance (a variant of Power4). This detailed power analysis provides key insights into the advantages and disadvantages of the TRIPS ISA and microarchitecture compared to processors on either end of the performance-power spectrum. Our results indicate that the TRIPS microarchitecture achieves 11.7 times better energy efficiency compared to ARM, and approximately 12% better energy efficiency than Power4, in terms of the Energy-Delay-Squared (ED²) metric. Second, we evaluate the energy efficiency of the TFlex microarchitecture in comparison to TRIPS, ARM, and Power4. TFlex belongs to a class of microarchitectures called Composable Lightweight Processors (CLPs). CLPs are distributed microarchitectures designed with simple cores and are highly configurable at runtime to adapt to resource needs of applications. We develop power models for the TFlex microarchitecture based on the validated TRIPS power models. Our quantitative results indicate that by better matching execution resources to the needs of applications, the composable TFlex system can operate in both regimes of low power (similar to ARM) and high performance (similar to Power4). We also show that the composability feature of TFlex achieves a signification improvement (2 times) in the ED² metric compared to TRIPS. Third, using TFlex as our experimental platform, we examine the efficacy of processor composability as a potential performance-power trade-off mechanism. Most modern processors support a form of dynamic voltage and frequency scaling (DVFS) as a performance-power trade-off mechanism. Since the rate of voltage scaling has slowed significantly in recent process technologies, processor designers are in dire need of alternatives to DVFS. In this dissertation, we explore processor composability as an architectural alternative to DVFS. Through experimental results we show that processor composability achieves almost as good performance-power trade-offs as pure frequency scaling (no changes in supply voltages), and a much better performance-power trade-off compared to voltage and frequency scaling (both supply voltage and frequency change). Next, we explore the effects of additional performance-improving techniques for the TFlex system on its energy efficiency. Researchers have proposed a variety of techniques for improving the performance of the TFlex system. These include: (1) block mapping techniques to trade off intra-block concurrency with communication across the operand network; (2) predicate prediction and (3) operand multi-cast/broadcast mechanism. We examine each of these mechanisms in terms of its effect on the energy efficiency of TFlex, and our experimental results demonstrate the effects of operand communication, and speculation on the energy efficiency of TFlex. Finally, this dissertation evaluates a set of fine-grained power management (FGPM) policies for TFlex: instruction criticality and controlled speculation. These policies rely on a temporally and spatially fine-grained dynamic voltage and frequency scaling (DVFS) mechanism for improving power efficiency. The instruction criticality policy seeks to improve power efficiency by mapping critical computation in a program to higher performance-power levels, and by mapping non-critical computation to lower performance-power levels. Controlled speculation policy, on the other hand, maps blocks that are highly likely to be on correct execution path in a program to higher performance levels, and the other blocks to lower performance levels. Our experimental results indicate that idealized instruction criticality and controlled speculation policies improve the operating range and flexibility of the TFlex system. However, when the actual overheads of fine-grained DVFS, especially energy conversion losses of voltage regulator modules (VRMs), are considered the power efficiency advantages of these idealized policies quickly diminish. Our results also indicate that the current conversion efficiencies of on-chip VRMs need to improve to as high as 95% for the realistic policies to be feasible. / text Energy efficiency EDGE architectures Power efficiency Composability DVFS Power management Dynamic voltage and frequency scaling
25	Προσαρμογή συχνότητας και τάσης λειτουργίας για τη βελτιστοποίηση κατανάλωσης ενέργειας επεξεργαστών Σπηλιόπουλος, Βασίλειος 19 April 2010 (has links) Η σύγχρονη αρχιτεκτονική στρέφεται σε λύσεις που έχουν ως στόχο την εξοικονόμηση ενέργειας, χωρίς όμως να επιβαρύνεται σε μεγάλο βαθμό η απόδοση του επεξεργαστή. Ιδιαίτερα οι υπερβαθμωτοί (superscalar) επεξεργαστές που επιτρέπουν εκτέλεση εκτός σειράς (out-of-order execution) διακρίνονται από υψηλή κατανάλωση ενέργειας, εξαιτίας των πολύπλοκων δομών που χρησιμοποιούν για την αύξηση της απόδοσης. Η δυναμική ρύθμιση τάσης – συχνότητας (DVFS) αποτελεί μία ευρέως χρησιμοποιούμενη τεχνική για την επίτευξη εξοικονόμησης ενέργειας. Μειώνοντας τη συχνότητα λειτουργίας ενός κυκλώματος, είναι δυνατόν να μειωθεί και η τάση τροφοδοσίας του κυκλώματος. Με τον τρόπο αυτό ελαττώνεται και η ενέργεια που καταναλώνει το κύκλωμα. Σκοπός της εργασίας είναι η ανάπτυξη ενός μηχανισμού πραγματικού χρόνου που θα ρυθμίζει τη συχνότητα και την τάση λειτουργίας ενός superscalar, out-of-order επεξεργαστή ώστε να επιτυγχάνεται εξοικονόμηση ενέργειας χωρίς μεγάλη μείωση της απόδοσης του επεξεργαστή. Αυτό μπορεί να επιτευχθεί ελαττώνοντας τη συχνότητα και την τάση κατά τις περιόδους που ο επεξεργαστής εκτελεί πολλές λειτουργίες μνήμης. Η εξομοίωση του μηχανισμού μας για μία σειρά από μετροπρογράμματα δείχνει ότι μπορούμε να επιτύχουμε μεγάλη εξοικονόμηση ενέργειας χωρίς σημαντική αύξηση του χρόνου εκτέλεσης των προγραμμάτων. / Modern research in computer architecture focuses on techniques whose purpose is to save energy, without much loss in processor's performance. Especially superscalar processors that allow out of order execution are characterized by high energy consumption, because of the complex structures the use in order to increase performance. Dynamic Voltage - Frequency Scaling (DVFS) is a widely used technique for energy saving. Reducing the frequency of the processor's clock, it is possible to reduce the supply voltage. In this way the consumed energy is also reduced. The purpose of this diploma thesis is to create a real time mechanism that will scale the frequency and the voltage of a superscalar, out of order processor so that the processor saves energy without much loss in processor's performance. This can be made by reducing the frequency and the voltage during the periods that the processor executes many memory functions. The simulation of our mechanism for a variety of benchmarks proved that we can save much energy without much increase in the benchmark's execution time. Επεξεργαστές 621.395 Processors Energy saving Superscalar processors
26	System Level Power and Thermal Management on Embedded Processors January 2012 (has links) abstract: Semiconductor scaling technology has led to a sharp growth in transistor counts. This has resulted in an exponential increase on both power dissipation and heat flux (or power density) in modern microprocessors. These microprocessors are integrated as the major components in many modern embedded devices, which offer richer features and attain higher performance than ever before. Therefore, power and thermal management have become the significant design considerations for modern embedded devices. Dynamic voltage/frequency scaling (DVFS) and dynamic power management (DPM) are two well-known hardware capabilities offered by modern embedded processors. However, the power or thermal aware performance optimization is not fully explored for the mainstream embedded processors with discrete DVFS and DPM capabilities. Many key problems have not been answered yet. What is the maximum performance that an embedded processor can achieve under power or thermal constraint for a periodic application? Does there exist an efficient algorithm for the power or thermal management problems with guaranteed quality bound? These questions are hard to be answered because the discrete settings of DVFS and DPM enhance the complexity of many power and thermal management problems, which are generally NP-hard. The dissertation presents a comprehensive study on these NP-hard power and thermal management problems for embedded processors with discrete DVFS and DPM capabilities. In the domain of power management, the dissertation addresses the power minimization problem for real-time schedules, the energy-constrained make-span minimization problem on homogeneous and heterogeneous chip multiprocessors (CMP) architectures, and the battery aware energy management problem with nonlinear battery discharging model. In the domain of thermal management, the work addresses several thermal-constrained performance maximization problems for periodic embedded applications. All the addressed problems are proved to be NP-hard or strongly NP-hard in the study. Then the work focuses on the design of the off-line optimal or polynomial time approximation algorithms as solutions in the problem design space. Several addressed NP-hard problems are tackled by dynamic programming with optimal solutions and pseudo-polynomial run time complexity. Because the optimal algorithms are not efficient in worst case, the fully polynomial time approximation algorithms are provided as more efficient solutions. Some efficient heuristic algorithms are also presented as solutions to several addressed problems. The comprehensive study answers the key questions in order to fully explore the power and thermal management potentials on embedded processors with discrete DVFS and DPM capabilities. The provided solutions enable the theoretical analysis of the maximum performance for periodic embedded applications under power or thermal constraints. / Dissertation/Thesis / Ph.D. Computer Science 2012 Computer science Approximation algorithm Dynamic power management Dynamic voltage/frequency scaling Low power design Scheduling Thermal management
27	Projeto de células e circuitos VLSI digitais CMOS para operação em baixa tensão / CMOS digital cells and VLSI circuits design for ultra-low voltage operation Rosa, André Luís Rodeghiero January 2015 (has links) Este trabalho propõe uma estratégia para projeto de circuitos VLSI operando em amplo ajuste de tensão e frequência (VFS), desde o regime em Near-threshold, onde uma tensão de VDD caracteriza-se por permitir o funcionamento do circuito com o mínimo dispêndio de energia por operação (MEP), até tensões nominais, dependendo da carga de trabalho exigida pela aplicação. Nesta dissertação é proposto o dimensionamento de transistores para três bibliotecas de células utilizando MOSFETs com tensões de limiar distintas: Regular-VT (RVT), High-VT (HVT) e Low-VT (LVT). Tais bibliotecas possuem cinco células combinacionais: INV, NAND, NOR, OAI21 e AOI22 em múltiplos strengths. A regra para dimensionamento dos transistores das células lógicas foi adaptada de trabalhos relacionados, e fundamenta-se na equalização dos tempos de subida e descida na saída de cada célula, objetivando à redução dos efeitos de variabilidade em baixas tensões de operação. Dois registradores também foram incluídos na biblioteca RVT e sua caracterização foi realizada considerando os parâmetros de processo CMOS 65 nm typical, fast e slow; nas temperaturas de operação de -40°C, 25°C e 125°C, e para tensões variando de 200 mV até 1,2V, para incluir a região de interesse, próxima ao MEP. Os experimentos foram realizados utilizando dez circuitos VLSI de teste: filtro digital notch, um núcleo compatível com o micro-controlador 8051, quatro circuitos combinacionais e quatro sequenciais do benchmark ISCAS. Em termos de economia de energia, operar no MEP resulta em uma redução média de 54,46% em relação ao regime de sub-limiar e até 99,01% quando comparado com a tensão nominal, para a temperatura de 25°C e processo típico. Em relação ao desempenho, operar em regime de VFS muito amplo propicia frequências máximas que variam de centenas de kHz até a faixa de centenas de MHz a GHz, para as temperaturas de -40°C e 25°C, e de MHz até GHz em 125°C. Os resultados desta dissertação, quando comparados a trabalhos relacionados, demonstraram, em média, redução de energia e ganho de desempenho de 24,1% e 152,68%, respectivamente, considerando os mesmos circuitos de teste, operando no ponto de mínima energia (MEP). / This work proposes a strategy for designing VLSI circuits to operate in a very-wide Voltage-Frequency Scaling (VFS) range , from the supply voltage at which the minimum energy per operation (MEP) is achieved, at the Near-Threshold regime, up to the nominal supply voltage for the processes, if so demanded by applications workload. This master thesis proposes the sizing of transistors for three library cells using MOSFETs with different threshold voltages: Regular-VT (RVT), High-VT (HVT), and Low-VT (LVT). These libraries have five combinational cells: INV, NAND, NOR, OAI21, and AOI22 with multiple strengths. The sizing rule for the transistors of the digital cells was an adapted version from related works and it is directly driven by requiring equal rise and fall times at the output for each cell in order to attenuate variability effects in the low supply voltage regime. Two registers were also included in the RVT library cell. This library cell was characterized for typical, fast, and slow processes conditions of a CMOS 65nm technology; for operation at -40ºC, 25ºC, and 125ºC temperatures, and for supply voltages varying from 200 mV up to 1.2V, to include the region of interest, for VDD near the MEP. Experiments were performed with ten VLSI circuit benchmarks: notch filter, 8051 compatible core, four combinational and four sequential ISCAS benchmark circuits. From the energy savings point of view, to operate in MEP results on average reduction of 54.46% and 99.01% when compared with the sub-threshold and nominal supply voltages, respectively. This analysis was performed for 25⁰C and typical process. When considered the performance, the very-wide VFS regime enables maximum operating frequencies varying from hundreds of kHz up to MHz/GHz at -40ºC and 25ºC, and from MHz up to GHz at 125ºC. This master thesis results, when compared with related works, showed on average an energy reduction and performance gain of 24.1% and 152.68%, respectively, for the same circuit benchmarks operating with VDD at the minimum energy point (MEP). Microeletrônica Cmos Vlsi Digital CMOS Voltage-frequency scaling CMOS energy-efficiency Near-threshold Multi-VT MOSFET transistors
28	Projeto de células e circuitos VLSI digitais CMOS para operação em baixa tensão / CMOS digital cells and VLSI circuits design for ultra-low voltage operation Rosa, André Luís Rodeghiero January 2015 (has links) Este trabalho propõe uma estratégia para projeto de circuitos VLSI operando em amplo ajuste de tensão e frequência (VFS), desde o regime em Near-threshold, onde uma tensão de VDD caracteriza-se por permitir o funcionamento do circuito com o mínimo dispêndio de energia por operação (MEP), até tensões nominais, dependendo da carga de trabalho exigida pela aplicação. Nesta dissertação é proposto o dimensionamento de transistores para três bibliotecas de células utilizando MOSFETs com tensões de limiar distintas: Regular-VT (RVT), High-VT (HVT) e Low-VT (LVT). Tais bibliotecas possuem cinco células combinacionais: INV, NAND, NOR, OAI21 e AOI22 em múltiplos strengths. A regra para dimensionamento dos transistores das células lógicas foi adaptada de trabalhos relacionados, e fundamenta-se na equalização dos tempos de subida e descida na saída de cada célula, objetivando à redução dos efeitos de variabilidade em baixas tensões de operação. Dois registradores também foram incluídos na biblioteca RVT e sua caracterização foi realizada considerando os parâmetros de processo CMOS 65 nm typical, fast e slow; nas temperaturas de operação de -40°C, 25°C e 125°C, e para tensões variando de 200 mV até 1,2V, para incluir a região de interesse, próxima ao MEP. Os experimentos foram realizados utilizando dez circuitos VLSI de teste: filtro digital notch, um núcleo compatível com o micro-controlador 8051, quatro circuitos combinacionais e quatro sequenciais do benchmark ISCAS. Em termos de economia de energia, operar no MEP resulta em uma redução média de 54,46% em relação ao regime de sub-limiar e até 99,01% quando comparado com a tensão nominal, para a temperatura de 25°C e processo típico. Em relação ao desempenho, operar em regime de VFS muito amplo propicia frequências máximas que variam de centenas de kHz até a faixa de centenas de MHz a GHz, para as temperaturas de -40°C e 25°C, e de MHz até GHz em 125°C. Os resultados desta dissertação, quando comparados a trabalhos relacionados, demonstraram, em média, redução de energia e ganho de desempenho de 24,1% e 152,68%, respectivamente, considerando os mesmos circuitos de teste, operando no ponto de mínima energia (MEP). / This work proposes a strategy for designing VLSI circuits to operate in a very-wide Voltage-Frequency Scaling (VFS) range , from the supply voltage at which the minimum energy per operation (MEP) is achieved, at the Near-Threshold regime, up to the nominal supply voltage for the processes, if so demanded by applications workload. This master thesis proposes the sizing of transistors for three library cells using MOSFETs with different threshold voltages: Regular-VT (RVT), High-VT (HVT), and Low-VT (LVT). These libraries have five combinational cells: INV, NAND, NOR, OAI21, and AOI22 with multiple strengths. The sizing rule for the transistors of the digital cells was an adapted version from related works and it is directly driven by requiring equal rise and fall times at the output for each cell in order to attenuate variability effects in the low supply voltage regime. Two registers were also included in the RVT library cell. This library cell was characterized for typical, fast, and slow processes conditions of a CMOS 65nm technology; for operation at -40ºC, 25ºC, and 125ºC temperatures, and for supply voltages varying from 200 mV up to 1.2V, to include the region of interest, for VDD near the MEP. Experiments were performed with ten VLSI circuit benchmarks: notch filter, 8051 compatible core, four combinational and four sequential ISCAS benchmark circuits. From the energy savings point of view, to operate in MEP results on average reduction of 54.46% and 99.01% when compared with the sub-threshold and nominal supply voltages, respectively. This analysis was performed for 25⁰C and typical process. When considered the performance, the very-wide VFS regime enables maximum operating frequencies varying from hundreds of kHz up to MHz/GHz at -40ºC and 25ºC, and from MHz up to GHz at 125ºC. This master thesis results, when compared with related works, showed on average an energy reduction and performance gain of 24.1% and 152.68%, respectively, for the same circuit benchmarks operating with VDD at the minimum energy point (MEP). Microeletrônica Cmos Vlsi Digital CMOS Voltage-frequency scaling CMOS energy-efficiency Near-threshold Multi-VT MOSFET transistors
29	Projeto de células e circuitos VLSI digitais CMOS para operação em baixa tensão / CMOS digital cells and VLSI circuits design for ultra-low voltage operation Rosa, André Luís Rodeghiero January 2015 (has links) Este trabalho propõe uma estratégia para projeto de circuitos VLSI operando em amplo ajuste de tensão e frequência (VFS), desde o regime em Near-threshold, onde uma tensão de VDD caracteriza-se por permitir o funcionamento do circuito com o mínimo dispêndio de energia por operação (MEP), até tensões nominais, dependendo da carga de trabalho exigida pela aplicação. Nesta dissertação é proposto o dimensionamento de transistores para três bibliotecas de células utilizando MOSFETs com tensões de limiar distintas: Regular-VT (RVT), High-VT (HVT) e Low-VT (LVT). Tais bibliotecas possuem cinco células combinacionais: INV, NAND, NOR, OAI21 e AOI22 em múltiplos strengths. A regra para dimensionamento dos transistores das células lógicas foi adaptada de trabalhos relacionados, e fundamenta-se na equalização dos tempos de subida e descida na saída de cada célula, objetivando à redução dos efeitos de variabilidade em baixas tensões de operação. Dois registradores também foram incluídos na biblioteca RVT e sua caracterização foi realizada considerando os parâmetros de processo CMOS 65 nm typical, fast e slow; nas temperaturas de operação de -40°C, 25°C e 125°C, e para tensões variando de 200 mV até 1,2V, para incluir a região de interesse, próxima ao MEP. Os experimentos foram realizados utilizando dez circuitos VLSI de teste: filtro digital notch, um núcleo compatível com o micro-controlador 8051, quatro circuitos combinacionais e quatro sequenciais do benchmark ISCAS. Em termos de economia de energia, operar no MEP resulta em uma redução média de 54,46% em relação ao regime de sub-limiar e até 99,01% quando comparado com a tensão nominal, para a temperatura de 25°C e processo típico. Em relação ao desempenho, operar em regime de VFS muito amplo propicia frequências máximas que variam de centenas de kHz até a faixa de centenas de MHz a GHz, para as temperaturas de -40°C e 25°C, e de MHz até GHz em 125°C. Os resultados desta dissertação, quando comparados a trabalhos relacionados, demonstraram, em média, redução de energia e ganho de desempenho de 24,1% e 152,68%, respectivamente, considerando os mesmos circuitos de teste, operando no ponto de mínima energia (MEP). / This work proposes a strategy for designing VLSI circuits to operate in a very-wide Voltage-Frequency Scaling (VFS) range , from the supply voltage at which the minimum energy per operation (MEP) is achieved, at the Near-Threshold regime, up to the nominal supply voltage for the processes, if so demanded by applications workload. This master thesis proposes the sizing of transistors for three library cells using MOSFETs with different threshold voltages: Regular-VT (RVT), High-VT (HVT), and Low-VT (LVT). These libraries have five combinational cells: INV, NAND, NOR, OAI21, and AOI22 with multiple strengths. The sizing rule for the transistors of the digital cells was an adapted version from related works and it is directly driven by requiring equal rise and fall times at the output for each cell in order to attenuate variability effects in the low supply voltage regime. Two registers were also included in the RVT library cell. This library cell was characterized for typical, fast, and slow processes conditions of a CMOS 65nm technology; for operation at -40ºC, 25ºC, and 125ºC temperatures, and for supply voltages varying from 200 mV up to 1.2V, to include the region of interest, for VDD near the MEP. Experiments were performed with ten VLSI circuit benchmarks: notch filter, 8051 compatible core, four combinational and four sequential ISCAS benchmark circuits. From the energy savings point of view, to operate in MEP results on average reduction of 54.46% and 99.01% when compared with the sub-threshold and nominal supply voltages, respectively. This analysis was performed for 25⁰C and typical process. When considered the performance, the very-wide VFS regime enables maximum operating frequencies varying from hundreds of kHz up to MHz/GHz at -40ºC and 25ºC, and from MHz up to GHz at 125ºC. This master thesis results, when compared with related works, showed on average an energy reduction and performance gain of 24.1% and 152.68%, respectively, for the same circuit benchmarks operating with VDD at the minimum energy point (MEP). Microeletrônica Cmos Vlsi Digital CMOS Voltage-frequency scaling CMOS energy-efficiency Near-threshold Multi-VT MOSFET transistors
30	Exploiting Speculative and Asymmetric Execution on Multicore Architectures Wamhoff, Jons-Tobias 27 March 2015 (has links) (PDF) The design of microprocessors is undergoing radical changes that affect the performance and reliability of hardware and will have a high impact on software development. Future systems will depend on a deep collaboration between software and hardware to cope with the current and predicted system design challenges. Instead of higher frequencies, the number of processor cores per chip is growing. Eventually, processors will be composed of cores that run at different speeds or support specialized features to accelerate critical portions of an application. Performance improvements of software will only result from increasing parallelism and introducing asymmetric processing. At the same time, substantial enhancements in the energy efficiency of hardware are required to make use of the increasing transistor density. Unfortunately, the downscaling of transistor size and power will degrade the reliability of the hardware, which must be compensated by software. In this thesis, we present new algorithms and tools that exploit speculative and asymmetric execution to address the performance and reliability challenges of multicore architectures. Our solutions facilitate both the assimilation of software to the changing hardware properties as well as the adjustment of hardware to the software it executes. We use speculation based on transactional memory to improve the synchronization of multi-threaded applications. We show that shared memory synchronization must not only be scalable to large numbers of cores but also robust such that it can guarantee progress in the presence of hardware faults. Therefore, we streamline transactional memory for a better throughput and add fault tolerance mechanisms with a reduced overhead by speculating optimistically on an error-free execution. If hardware faults are present, they can manifest either in a single event upset or crashes and misbehavior of threads. We address the former by applying transactions to checkpoint and replicate the state such that threads can correct and continue their execution. The latter is tackled by extending the synchronization such that it can tolerate crashes and misbehavior of other threads. We improve the efficiency of transactional memory by enabling a lightweight thread that always wins conflicts and significantly reduces the overheads. Further performance gains are possible by exploiting the asymmetric properties of applications. We introduce an asymmetric instrumentation of transactional code paths to enable applications to adapt to the underlying hardware. With explicit frequency control of individual cores, we show how applications can expose their possibly asymmetric computing demand and dynamically adjust the hardware to make a more efficient usage of the available resources. Mehrkernprozessoren Multithreading Synchronisierung Transaktionaler Speicher Frequenzskalierung multicore multithreading locks speculation transactional memory dynamic voltage and frequency scaling ddc:004 rvk:ST 170

Search results