
Empirical timing analysis of CPUs and delay fault tolerant design using partial redundancy

The operating clock frequency is determined by the longest signal propagation
delay, setup/hold time, and timing margin. These are becoming less predictable with
increasing design complexity and process miniaturization. The challenge is then to
ensure, with quantifiable assurance, that a device operating at its clock frequency
is error-free. Device-level engineering effort alone will not suffice for circuits
exhibiting wide process variation and heightened sensitivity to operating
condition stress. Logic-level redress of this issue is a necessity, and we propose a
design-level remedy for this timing-uncertainty problem.
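
As a rough illustration of how the clock frequency follows from the timing budget, the sketch below computes the highest frequency whose period covers the longest propagation delay plus setup time plus margin. This is a textbook-style simplification, not the dissertation's model; the function name and all numeric values are hypothetical, and hold time is left out because it constrains the shortest path rather than the clock period.

```python
def max_clock_frequency_hz(longest_delay_s, setup_time_s, margin_s):
    """Return the highest clock frequency whose period covers the timing budget.

    Assumed simplification: min period = longest delay + setup time + margin.
    """
    min_period_s = longest_delay_s + setup_time_s + margin_s
    if min_period_s <= 0:
        raise ValueError("timing budget must be positive")
    return 1.0 / min_period_s


# Hypothetical numbers: 0.8 ns critical path, 0.1 ns setup, 0.1 ns margin.
nominal_hz = max_clock_frequency_hz(0.8e-9, 0.1e-9, 0.1e-9)

# Shrinking the margin raises the achievable frequency, but leaves less
# protection against process variation and operating-condition stress.
aggressive_hz = max_clock_frequency_hz(0.8e-9, 0.1e-9, 0.0)
```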
The aim of the design and analysis approaches presented in this dissertation is to
provide a framework, SABRE, within which an increased operating clock frequency can be
achieved. The approach combines analytical modeling, experimental analysis,
hardware/time-redundancy design, exception handling, and recovery techniques.
Our proposed design replicates only a necessary part of the original circuit to avoid
the high hardware overhead of triple modular redundancy (TMR). The timing-critical
combinational circuit is path-wise partitioned into two sections. The combinational
circuits associated with long paths are laid out without any intrusion except for the
fan-out connections from the first section of the circuit to a replicated second section
of the combinational circuit. Thus only the second section of the circuit is replicated.
The signals fanning out from the first section are latched, so their paths are far
shorter than those spanning the entire combinational circuit. The replicated circuit is
timed at a subsequent clock cycle to ensure relaxed timing paths. This ensures that the
likelihood of mistiming due to stress or process variation is eliminated. During the
subsequent clock cycle, the outputs of the two logically identical, yet time-interleaved,
circuits are compared to detect faults. When a fault is detected, the retry signal
is triggered and a dynamic frequency step-down takes place before a pipeline flush
and retry are issued. The significant timing overhead associated with the retry is offset
by the rarity of the timing violation events. Simulation results on ISCAS benchmark
circuits show that a clock frequency gain of 10% is possible with a hardware overhead
of 10 to 20% for the replicated timing-critical circuit.
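
The claim that retry overhead is offset by the rarity of timing violations can be illustrated with a back-of-the-envelope model. The sketch below is not taken from the dissertation: it assumes each timing violation costs a fixed number of extra cycles and ignores time spent at the stepped-down frequency. Only the 10% frequency gain comes from the abstract; the 100-cycle penalty and the error rate are hypothetical.

```python
def effective_speedup(freq_gain, error_rate, retry_penalty_cycles):
    """Net speedup over the baseline clock under a simple retry model.

    freq_gain: overclocked frequency / baseline frequency (e.g. 1.10).
    error_rate: assumed probability of a timing violation per cycle.
    retry_penalty_cycles: assumed extra cycles lost per flush and retry.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be a probability")
    # Each useful cycle costs, on average, 1 + error_rate * penalty cycles.
    return freq_gain / (1.0 + error_rate * retry_penalty_cycles)


def break_even_error_rate(freq_gain, retry_penalty_cycles):
    """Error rate at which the retry cost cancels the frequency gain."""
    return (freq_gain - 1.0) / retry_penalty_cycles


# 10% gain (from the abstract), hypothetical 100-cycle retry penalty.
rare_fault_speedup = effective_speedup(1.10, 1e-6, 100)
threshold = break_even_error_rate(1.10, 100)
```

Under these assumptions the frequency gain survives as long as violations occur well below the break-even rate, which here is roughly one per thousand cycles.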

Identifier: oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/ETD-TAMU-1270
Date: 15 May 2009
Creators: Chang, Sanghoan
Contributors: Choi, Gwan S.
Source Sets: Texas A and M University
Language: en_US
Detected Language: English
Type: Book, Thesis, Electronic Dissertation, text
Format: electronic, application/pdf, born digital
