Self-correcting strategy for networks-on-chip interconnect

Networks-on-Chip (NoC) interconnection provides an on-chip communication strategy for the large number of processing elements in a System-on-Chip. Fault tolerance is a challenge for modern NoCs due to the increase in physical defects in advanced manufacturing processes. A key requirement for modern NoCs is the ability to detect faults and failures and to self-correct after faults occur, thereby maintaining a level of system functionality. However, existing fault-tolerant approaches cannot fully address system scalability and fault testing with minimal intrusion. In addition, they fail to provide robust self-correction strategies under complex traffic conditions. Therefore, it is necessary to look to new fault detection and self-correction strategies to address this reliability design issue and to enable the design of reliable systems on unreliable fabrics. This thesis presents a novel online fault detection strategy in which the intrusion of testing on runtime operation is minimised. If a channel is found to be faulty, an alert flag is raised. Using this alert flag mechanism, three novel fault-tolerant adaptive routing algorithms are proposed to provide self-correcting strategies for NoCs. They exploit the status of real-time traffic using look-ahead functions at different levels (local or regional), then calculate weights for output directions or path candidates, and choose the path with the lowest weighting to forward the packets. The key benefit of these routing algorithms is that they bypass routing paths with faulty channels while minimising congestion on the adjacent connected channels. Detailed experimental results are given for a range of testing conditions, traffic patterns and fault rates, which demonstrate that faults can be detected promptly with minimal intrusion and that the routing algorithms are able to maintain a level of system functionality under high fault rates at a low cost.
In particular, experimental results demonstrate that the proposed detection and self-correction strategy achieves an overall improvement of between 24% and 62% in throughput degradation under varied high fault rates compared to benchmarks. The thesis also presents an open-source monitoring mechanism which provides an evaluation and benchmarking mechanism to quantitatively analyse a hardware NoC system's fault-tolerant capability. Using this monitoring mechanism, the thesis concludes with hardware verification of the detection and self-correction algorithms in FPGA hardware. The FPGA implementations present the throughput performance, fault-tolerant capabilities and resource costs of the three different fault-tolerant adaptive routing algorithms. In particular, the implementations demonstrate the real-time operation of the proposed self-correction strategies in hardware in the presence of varied levels of faults.
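
The weight-based selection described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Channel` fields, the linear weight formula, and the `alpha`/`beta` coefficients are all assumptions made for illustration. The abstract states only that channels raising an alert flag are bypassed, that weights are derived from local or regional traffic status, and that the lowest-weighted candidate is chosen.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    faulty: bool        # alert flag raised by the online fault detector
    local_load: int     # hypothetical: congestion at the adjacent router
    regional_load: int  # hypothetical: look-ahead congestion further along


def channel_weight(ch, alpha=1.0, beta=0.5):
    # Assumed weighting: a linear mix of local and regional congestion.
    return alpha * ch.local_load + beta * ch.regional_load


def select_output(candidates):
    """Return the healthy direction with the lowest weight, or None."""
    # Channels with a raised alert flag are excluded from consideration.
    healthy = {d: ch for d, ch in candidates.items() if not ch.faulty}
    if not healthy:
        return None
    return min(healthy, key=lambda d: channel_weight(healthy[d]))


candidates = {
    "east": Channel(faulty=True, local_load=0, regional_load=0),
    "north": Channel(faulty=False, local_load=3, regional_load=4),
    "south": Channel(faulty=False, local_load=2, regional_load=2),
}
print(select_output(candidates))
```

In this example the idle "east" channel is skipped because its alert flag is set, and the less congested of the two healthy candidates is selected.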

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:675467
Date: January 2015
Creators: Liu, Junxiu
Publisher: Ulster University
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation