Return to search

Fault tolerant techniques for asynchronous networks on chip

Advancing semiconductor technology is boosting the core count on a single chip to achieve continuously increasing performance, posing a growing demand for scalable, efficient and reliable on-chip interconnection. However this advance also makes the electronics increasingly vulnerable to faults. Inter-core connection is increasingly provided by Networks-on-Chip (NoCs), typically using conventional synchronous designs. Scaling makes it increasingly hard to avoid problems with clock distribution and in many chips a single, synchronous domain is inappropriate, anyway. In place of the well-studied synchronous NoCs, event-driven asynchronous NoCs have emerged as a promising replacement. Asynchronous NoCs have many promising advantages over synchronous ones; however, their fault-tolerance has rarely been studied. Implemented in a Quasi-Delay-Insensitive (QDI) fashion, asynchronous NoCs can achieve high timing-robustness but show complicated failure scenarios in the presence of faults and behave differently from synchronous ones, posing a challenge to asynchronous circuit advocates. This research studies the impact of different faults on QDI NoC fabrics and presents thorough and systematic fault-tolerant solutions at the circuit level, providing a holistic, efficient and resilient interconnection solution for QDI NoCs. The contributions of this research include: 1) a thorough analysis of fault impact on QDI NoCs; 2) a Delay-Insensitive Redundant Check (DIRC) coding scheme protecting QDI links from transient faults; 3) a novel time-out technique detecting the fault-caused physical-layer deadlock in a QDI NoC (the adaptability of a QDI circuit to timing variation makes it vulnerable to this kind of deadlock); 4) a fine-grained recovery technique utilising a Spatial Division Multiplexing (SDM) implementation to recover the deadlocked network from a link fault. Both unprotected and protected QDI NoCs are implemented, along with a fault simulation environment, to provide a detailed performance and fault-tolerance evaluation of these techniques. The improvements to the NoC operation, together with the costs in circuit overhead and throughput are enumerated using a typical example of QDI interconnection.

Identiferoai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:694275
Date January 2016
CreatorsZhang, Guangda
ContributorsGarside, James ; Navaridas Palma, Javier
PublisherUniversity of Manchester
Source SetsEthos UK
Detected LanguageEnglish
TypeElectronic Thesis or Dissertation
Sourcehttps://www.research.manchester.ac.uk/portal/en/theses/fault-tolerant-techniques-for-asynchronous-networks-on-chip(ce820f3a-461a-4e48-b1a3-d98fa6497a68).html

Page generated in 0.002 seconds