Global ETD Search

1	Power- and Performance-Aware Architectures
Canal Corretger, Ramon 14 June 2004
The scaling of silicon technology has been ongoing for over forty years. We are on the way to commercializing devices having a minimum feature size of one-tenth of a micron. The push for miniaturization comes from the demand for higher functionality and higher performance at a lower cost. As a result, successively higher levels of integration have been driving up the power consumption of chips. Today, heat removal and power distribution are at the forefront of the problems faced by chip designers.
In recent years portability has become important. Historically, portable applications were characterized by low throughput requirements, such as those of a wristwatch. This is no longer true.
Among the new portable applications are hand-held multimedia terminals with video display and capture, audio reproduction and capture, voice recognition, and handwriting recognition capabilities. These capabilities call for a tremendous amount of computational capacity. This computational capacity has to be realized with very low power requirements in order for the battery to have a satisfactory life span. This thesis is an attempt to provide microarchitecture and compiler techniques for low-power chips with high computational capacity.
The first part of this work presents some schemes for reducing the complexity of the issue logic. The issue logic has become one of the main sources of energy consumption in recent years. The inherent associative look-up and the size of the structures (crucial for exploiting ILP) have given the issue logic a significant share of the energy budget. The techniques presented in this work eliminate or reduce the associative logic by determining producer-consumer relationships between the instructions or by scheduling the instructions according to the latency of the operations.
Considerable effort has been devoted to reducing the energy requirements and the power dissipation through novel mechanisms based on value compression. As a result, the second part of this thesis introduces several ultra-low-power and high-end processor designs. First, the design space for ultra-low-power processors is explored. Several designs that exploit value compression at all levels of the data-path are developed from scratch at the architectural level. Second, value compression for high-performance processors is proposed and evaluated. At the end of this thesis, two compile-time techniques are presented that show how the compiler can help in reducing the energy consumption. By means of a static analysis of the program code or through profiling, the compiler is able to know the size of the operands involved in the computation. Through these analyses, the compiler is able to use narrower operations (e.g., a 64-bit addition can be converted to an 8-bit addition given the information on the size of the operands).
Overall, this thesis comprises the detailed study of one of the most power-hungry units in a processor (the issue logic) and the use of value compression (through hardware and software) as a means to reduce the energy consumption in all the stages of the pipeline.
processor technology low power significance compression energy-efficient issue queue value compression computer architecture 3304. Computer technology 004 62
2	Scalable Low Power Issue Queue And Store Queue Design For Superscalar Processors
Vivekanandham, Rajesh 12 1900
A large instruction window is a key requirement to exploit greater Instruction Level Parallelism (ILP) in out-of-order superscalar processors. Along with the instruction window size, the sizes of various other structures, including the issue queue, store queue and register file, need to increase as well. However, the cycle time and energy consumption of conventional large monolithic Content Addressable Memories (CAMs), the underlying structure of most conventional issue queue and store queue designs, worsen rapidly with an increase in size. This results in a three-way trade-off involving ILP, clock frequency and energy consumption. In this thesis, we propose efficient designs for the issue queue and the store queue that improve the circuit latency and energy consumption while minimizing the loss in IPC.
We propose the Scalable Low power Issue Queue (SLIQ) design, which segments the issue queue structure to reduce the latency. This is complemented with a fast wakeup index to a consumer in the issue queue for every instruction. As this consumer instruction can be woken up directly, without any delay, this mitigates the IPC loss faced by the pipelined issue queue. Also, as the scheme incorporates a pipelined broadcast, the indices are not required for correctness and can simply be gang invalidated on branch mispredictions. The IPC loss of an 8-segment SLIQ is within 2.3% for the entire SPEC CPU2000 benchmark suite while achieving a 39.3% reduction in issue latency. Further, in the SLIQ design, unnecessary broadcasts to the higher segments are avoided most of the time because, in the large majority of cases, an instruction has a single consumer. This consumer is woken up either by direct indexing or by broadcast in the first segment of the SLIQ. This enables the 8-segment SLIQ to reduce the energy consumption and the energy-delay product by 48.3% and 67.4% respectively on average. SLIQ also allows architects to segment the issue queue carefully so that the latency of the issue logic is just within the per-pipeline-stage latency goals of the design.
We also propose the Scalable Low power Store Queue (SLSQ) to address similar problems associated with the store queue data forwarding logic. We extend the state-of-the-art Store Vector based Disambiguator to also predict the index of the store that will forward to a given load. SLSQ marginally adds to the hardware budget, but predicts the store queue index of the store that will forward with an accuracy of 99.5% on average. SLSQ thus eliminates unnecessary address broadcasts and compares, and reduces the energy consumption of the store-to-load forwarding logic by 78.4% and 91.6% for the SPEC Int and FP suites respectively. Another variant of SLSQ eliminates the need for a CAM in the forwarding logic and achieves a 49.9% reduction in store-to-load data forwarding latency while incurring a minimal IPC loss of less than 0.1% on average for the entire SPEC CPU2000 benchmark suite.
Parallel Processing (Computer Science) Queuing Processes Queue Design Superscalar Processors Large Instruction Window Computer Science
