1 |
Decomposition of FSMD for Low Power
Wu, Ming-Ho, 09 September 2008
none
|
2 |
Implementation of Variable-Latency Floating-Point Multipliers for Low-Power Applications
Hong, Hua-yi, 29 July 2008
Floating-point multipliers are typically power hungry, which is undesirable in many embedded applications. This paper proposes a variable-latency floating-point multiplier architecture that is suitable for low-power, high-performance, and high-accuracy applications. The architecture splits the significand multiplier into upper and lower parts and predicts the required significand product and sticky bit from the upper part. When the prediction is correct, the computation of the lower part is disabled and the rounding operation is significantly simplified, so that the floating-point multiplication can complete early.
Finally, the detailed design and simulation of the floating-point multiplier are presented, together with an evaluation that compares its power consumption with that of fast and conventional floating-point multipliers. Experimental results demonstrate that the proposed double-precision multiplier consumes up to 26.41% less power and 24.97% less energy than the fast floating-point multiplier, at the expense of only a small area and delay overhead. The results also show that the performance of the proposed floating-point multiplier is very close to that of fast floating-point multipliers.
|
3 |
Power Optimization for 3D Vertex Shader Using Clock Gating
Yen, Huai-yu, 16 August 2008
With advancing technology and the demand for high performance and multiple functionalities, power dissipation has become a bottleneck in microprocessors, and clock power accounts for the largest share of total power dissipation. In this thesis, we present an effective clock gating methodology that reduces total power dissipation with as little overhead as possible, based on a SIMD 3D vertex shader. Besides classifying all instructions according to the instruction flow, we also consider the relationships between pipeline stages and use register banks as the basis for controlling the execution units more flexibly.
Clock gating decreases not only clock power but also the power of the hardware modules that follow the clock-gated registers. In this thesis, we analyze which clock gating version is suitable, because it is not necessary to disable the clock of every pipeline register in every pipeline stage or to exploit every opportunity to disable the clock. We explain this using the experimental results and present the final low-power version.
According to the experimental results, the proposed clock gating methodology decreases power by almost 30% with an area increase of less than 2%. Combined with an instruction scheduling algorithm for high performance, it decreases energy by up to 41%. In a new version in which four vertices are executed sequentially, clock gating also decreases power dissipation by almost 10%. Combined with the instruction scheduling algorithm, this version not only achieves better performance but also decreases energy by up to 16%.
|
4 |
Software optimization for power consumption in DSP embedded systems
Temple, Andrew Richard, 09 December 2013
This paper is intended as a resource for programmers who need to optimize a DSP's power consumption strictly through software. It provides a basic introduction to power consumption and measurement techniques, and then goes into the details of power optimization, focusing on three main areas: algorithmic optimization, taking advantage of hardware features (low-power modes, clock control, and voltage control), and data flow optimization, with a discussion of the functionality and power considerations when using fast SRAM-type memories (common for cache) and DDR SDRAM. This work includes examples and results as tested on Freescale's current state-of-the-art digital signal processors.
|
5 |
Clock gatting for latch based design
Figueroa Álvarez, Joaquín, January 2012
Electrical Civil Engineer / Digital circuits, which play a crucial role in everyday life, consume large amounts of power. This is undesirable, particularly for battery-dependent devices such as mobile phones, which is why circuit designers and synthesis tools use a variety of techniques to reduce power consumption.
One of the most successful power-reduction techniques is clock gating, whose goal is to reduce the power consumed by transitions caused by the clock (clk) signal. The reduction is achieved by inserting clock-gating cells that prevent the clk signal from reaching the flip-flops when their output value is not expected to change.
Latch-based designs are not as widely used as flip-flop-based designs because of their additional complexity, but they are still used thanks to certain benefits offered by the timing constraints of latches. However, none of the existing synthesis tools supports automatic insertion of clock gates for latch-based designs, so circuit designers are forced to insert clock gates manually, which is inefficient.
This work focuses on clock-gating mechanisms and the requirements that must be met to allow their use in latch-based designs from the perspective of a synthesis tool. It also provides a theoretical discussion of the differences between latches and flip-flops and of how these differences shape the requirements of a clock-gate insertion tool.
Considering the constraints that should apply to an automatic clock-gate insertion tool targeting latches, and using the development environment provided by Synopsys together with the existing code of its synthesis tool, a prototype for clock-gate insertion for latches is developed as part of Design-Compiler.
Once embedded in Design-Compiler, the prototype is tested on several designs created for this purpose and on a larger design provided by a Synopsys customer and used in the development of real circuits, which makes it possible to verify the robustness of the tool on large designs.
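The basic idea of clock gating described in this abstract can be illustrated with a toy cycle-level model. This is only a sketch and not code from the thesis: the function name `simulate`, the enable/data vectors, and the use of "clock edges reaching the register" as a stand-in for clock power are all assumptions made for illustration.

```python
# Toy cycle-level model of a single clock-gated register.
# Hypothetical illustration: names and data are not taken from the thesis.

def simulate(enables, data):
    """Return (final_value, ungated_clock_edges, gated_clock_edges)."""
    q = 0              # register contents
    gated_edges = 0    # clock edges that actually reach the register
    for en, d in zip(enables, data):
        if en:
            # The gating cell passes the clock only when the enable is set,
            # so the register captures new data on this edge.
            gated_edges += 1
            q = d
        # When en is False the clock edge is blocked and q holds its value.
    # Without gating, every cycle delivers one clock edge to the register.
    return q, len(enables), gated_edges


enables = [True, False, False, True, False, False, False, True]
data = [3, 9, 9, 5, 1, 1, 1, 7]
final_value, ungated, gated = simulate(enables, data)
print(final_value, ungated, gated)
```

In this model the register sees 3 clock edges instead of 8 while ending with the same value, which is the effect a gating cell aims for. Real gating cells also have to respect the timing constraints that the abstract discusses for latches, which this sketch does not model.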
|
6 |
La réduction de consommation dans les circuits digitaux / Power reduction in digital circuits
Láník, Jan, 16 June 2016
The topic of this thesis is methods for power reduction in digital circuits by reducing the average switching activity at the transistor level. These methods are structural in the sense that they are not related to tuning physical properties of the circuitry but to the internal structure of the implemented logic, and are therefore independent of the particular technology. We developed two novel methods. One is based on optimizing the structure of the combinational part of a circuit during logic synthesis. The second focuses on the sequential part of the circuit. It looks for clock gating conditions that can be used to disable idle parts of a circuit by cutting the corresponding branch of the clock tree, and uses formal methods to prove that the function of the circuit will not be altered.
|
7 |
Reduzindo o consumo de energia em MPSoCs heterogêneos via clock gating / Reducing energy consumption in heterogeneous MPSoCs through clock gating
Motta, Rodrigo Bittencourt, January 2008
In this work we present an architecture that enables the generation of bus-based, scalable, heterogeneous Multiprocessor Systems-on-Chip (MPSoCs) supporting different memory organizations. Intertask communication is specified by means of a shared memory structure that ensures collision avoidance and promotes energy savings through dynamic triggering of clock gating. We also introduce the Dynamic Core Freezing (DCF) technique, which boosts energy savings by taking advantage of processor idle cycles during memory accesses. Moreover, the combination of the memory organizations enables the architecture to support task migration by saving the task context in the shared data memory. We also describe a high-level simulator, based on the proposed architecture, created in order to measure the energy savings enabled by clock gating and the DCF technique. The simulator accepts as input execution trace files of Java applications, from which it generates a new file that maps the instructions found in the trace files to different instruction classes. In this way, we can model different processor architectures, using the mapping file to simulate the MPSoC. The simulator also enables us to experiment with different memory organizations and to estimate their impact on the number of executed instructions, bus contention, and energy consumption. As a case study, we have modeled different versions of a Java processor in order to experiment with different execution patterns over different memory organizations. Experiments based on a synthetic application running on an MPSoC containing different versions of a Java processor show a large improvement in energy efficiency with a minimal area cost.
Besides that, we also present experiments based on applications from the SPECjvm98 benchmark, which show the impact on energy efficiency when the application type changes. Moreover, the experiments show a large improvement in energy efficiency when the DCF technique is applied to the MPSoC memories.
|
8 |
Multi-precision Function Interpolator for Multimedia Applications
Cheng, Chien-Kang, 25 July 2012
A multi-precision function interpolator compatible with the IEEE-754 single-precision floating-point standard is proposed in this paper. It provides logarithm, exponential, reciprocal, and reciprocal square root operations. Each operation can dynamically select among four precision modes on demand. The hardware architecture is fully pipelined in order to fit the hardware architectures of general digital signal processors (DSPs) and graphics processors (GPUs).
To keep each precision mode useful, the design minimizes the error across the various modes as far as possible from the outset. From the highest to the lowest precision mode, the function interpolator provides 23-, 18-, 13-, and 8-bit accuracy respectively, leaving aside the rounding effect. The function interpolator is based on the look-up table method. It approximates the target function by evaluating a quadratic polynomial whose coefficients are obtained by piecewise minimax approximation. Before implementing the hardware, we use the Maple algebra software to generate the quadratic polynomial coefficients for the four operations and to check whether these coefficients can meet the IEEE-754 single-precision floating-point standard. In addition, we use exhaustive search to check the results generated by our implementation and make sure that it meets the requirements for all operations and precision modes.
When one of the four operations is performed, only the tables of that operation are used to obtain the quadratic polynomial coefficients. We can therefore use tri-state buffers as switches to reduce the dynamic power consumption of the tables for the other three operations. In addition, in the lower precision modes, we can turn off part of the hardware used to calculate the quadratic polynomial to save power more effectively. By providing multi-precision hardware, we hope that users and developers of battery-powered devices can choose a lower precision mode within the permissible error range to extend battery life.
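The table-plus-quadratic scheme described in this abstract can be sketched in software as follows. This is a hedged illustration only: the abstract states that the coefficients come from piecewise minimax approximation generated with Maple, whereas the sketch below substitutes a simpler three-point interpolation per segment, works in double-precision floats rather than fixed-point hardware, and covers only the reciprocal on [1, 2). The names `build_table`, `evaluate`, and `INDEX_BITS`, and the choice of 6 index bits, are assumptions.

```python
# Piecewise quadratic approximation driven by a look-up table.
# Hypothetical sketch: three-point interpolation stands in for the minimax
# coefficients mentioned in the abstract.

def build_table(f, index_bits):
    """Build one (c0, c1, c2) triple per segment of [1, 2)."""
    n = 1 << index_bits
    h = 1.0 / n                      # segment width
    table = []
    for i in range(n):
        x0 = 1.0 + i * h
        # Sample f at the left end, midpoint, and right end of the segment.
        y0, y1, y2 = f(x0), f(x0 + h / 2), f(x0 + h)
        # Newton divided differences in the local variable t = x - x0.
        d1 = (y1 - y0) / (h / 2)
        d2 = (y2 - y1) / (h / 2)
        c2 = (d2 - d1) / h
        c1 = d1 - c2 * (h / 2)
        table.append((y0, c1, c2))
    return table


def evaluate(table, index_bits, x):
    """Approximate f(x) for 1 <= x < 2 using the segment's quadratic."""
    n = 1 << index_bits
    i = min(int((x - 1.0) * n), n - 1)   # leading fraction bits pick the segment
    t = x - (1.0 + i / n)                # offset inside the segment
    c0, c1, c2 = table[i]
    return c0 + t * (c1 + t * c2)        # Horner evaluation


INDEX_BITS = 6
recip_table = build_table(lambda x: 1.0 / x, INDEX_BITS)
worst = max(
    abs(evaluate(recip_table, INDEX_BITS, 1.0 + k / 4096) - 1.0 / (1.0 + k / 4096))
    for k in range(4096)
)
print(f"worst absolute error over 4096 samples: {worst:.3e}")
```

In a design like the one described, a lower precision mode could use fewer index bits or skip the quadratic term, trading accuracy for less active hardware; the abstract does not specify how its modes are derived, so that mapping is left open here.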
|
10 |
Técnicas de baixo consumo para módulos de hardware de codificação de vídeo H.264 [Low-power techniques for H.264 video coding hardware modules]
Walter, Fábio Leandro, January 2011
This work presents the application of low-power techniques to digital blocks for the SAD algorithm and the Intra-Only H.264/AVC decoder. In the hardware description, we add parallelism and pipelining techniques. The logical and physical synthesis exploration includes clock gating, multiple threshold voltages, different technologies, and different supply voltages. The synthesis is done with the Cadence tools and shows a lower energy per operation at equivalent performance (isoperformance) for SAD at low frequency, with high parallelism and, above all, with one pipeline stage. In addition, more advanced CMOS technologies decrease the dynamic power consumption and, in some cases, also decrease the static power per equivalent gate, if High-VT cells and the lowest possible supply voltage are used. Another notable factor is clock gating, which in the SAD architectures increases the dynamic power consumption instead of decreasing it. In this work the synthesis of an Intra-Only H.264/AVC decoder was also performed. The decoder with clock gating presents lower power consumption, showing a case in which this technique is beneficial. Besides that, the use of a 65 nm CMOS technology and, consequently, a lower supply voltage led to a noticeable decrease in power consumption compared with other similar works.
|