As technology scales, the device delay decreases while the interconnect delay increases. As more devices are being packed into a single chip, the cost of interconnecting these devices increases. Many three-dimensional (3D) schemes have been proposed to reduce interconnect length, to improve performance with lower power consumption. The impact of wire length reduction on global clock distribution networks is limited. The delay and skew of a clock grid is mostly dominated by the area of the chip it has to cover. Another challenge in distributing clock to multiple layers in a vertical stack is achieving synchronization between the various layers. In this work the use of a clock layer exclusively for generating and distributing clocks is proposed. Vertical vias connect the clock grid in each layer to the clock layer, and hence provides synchronization between the various layers.
In all synchronous systems clock is the single most critical signal, it is routed throughout the chip and provides the synchronization between the various operations of the chip. Clock distribution networks are extremely critical from the performance and power standpoint. They account for about 30% of the total power dissipated in current generation microprocessors. As technology scales, the chip sizes are also increasing due to the increased functionality. This means larger clock distribution networks and hence more power lost in the clock network. Another critical parameter in clock networks is that skew in the clock network affects performance of the synchronous system. As frequency scales with technology, the goal is to achieve the skew as a fixed percentage of clock period. This implies an aggressive clock network design which minimizes power dissipation but still provides the same performance.
A clock distribution methodology for a 3D multilayer single-core microprocessor, using a single clock layer is proposed. The clock distribution network consists of a symmetric H-tree driving the global clock grids in each layer of the multilayer microprocessor. This arrangement of a 3D chip stack reduces Power lost in (a) Long interconnects at block level and (b) In the clock distribution. Using the proposed clock distribution scheme a 15-20% saving on the clock distribution power was achieved compared to a 2D structure with the same distribution scheme. By switching off the global clock grids in individual layers, when all the underlying logic is turned off, an additional 5-10% savings in power is achieved. The 3D clock distribution network also provides better skew numbers than its 2D counterpart and hence achieves the goal of improving performance and reducing power. The 3D clock distribution network was also verified with an RLC model for the interconnect. The effect of a vertical temperature profile was also investigated on the clock distribution network.
Identifer | oai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:theses-1279 |
Date | 01 January 2009 |
Creators | Arunachalam, Venkatesh |
Publisher | ScholarWorks@UMass Amherst |
Source Sets | University of Massachusetts, Amherst |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Masters Theses 1911 - February 2014 |
Page generated in 0.0015 seconds