Return to search

Selective Software-Implemented Hardware Fault Tolerance Techniques to Detect Soft Errors in Processors with Reduced Overhead

Software-based fault tolerance techniques are a low-cost way to protect processors against soft errors. However, they introduce significant overheads to the execution time and code size, which consequently increases the energy consumption. System operating with time or energy restrictions may not be able to use these techniques. For this reason, this work proposes new software-based fault tolerance techniques with lower overheads and similar fault coverage to state-of-the-art software techniques. Thus, they can meet the system constraints. In addition, the shorter execution time reduces the exposure time to radiation. Consequently, the reliability is higher for the same fault coverage. Techniques can work with error correction or error detection. Once detection is less costly than correction, this work focuses on software-based detection techniques. Firstly, a set of data-flow techniques called VAR is proposed. The techniques are based on general building rules to allow an exhaustive assessment, in terms of reliability and overheads, of different technique variations. The rules define how the technique duplicates the code and insert checkers. Each technique uses a different set of rules. Then, a control-flow technique called SETA (Software-only Error-detection Technique using Assertions) is introduced. Comparing SETA with a state-of-the-art technique, SETA is 11.0% faster and occupies 10.3% fewer memory positions. The most promising data-flow techniques are combined with the control-flow technique in order to protect both dataflow and control-flow of the target application. To go even further with the reduction of the overheads, methods to selective apply the proposed software techniques have been developed. For the data-flow techniques, instead of protecting all registers, only a set of selected registers is protected. The set is selected based on a metric that analyzes the code and rank the registers by their criticality. For the control-flow technique, two approaches are taken: (1) removing checkers from basic blocks: all the basic blocks are protected by SETA, but only selected basic blocks have checkers inserted, and (2) selectively protecting basic blocks: only a set of basic blocks is protected. The techniques and their selective versions are evaluated in terms of execution time, code size, fault coverage, and Mean Work To Failure (MWTF), which is a metric to measure the trade-off between fault coverage and execution time. Results show that was possible to reduce the overheads without affecting the fault coverage, and for a small reduction in the fault coverage it was possible to significantly reduce the overheads. Lastly, since the evaluation of all the possible combinations for selective hardening of every application takes too much time, this work uses a method to extrapolate the results obtained by simulation in order to find the parameters for the selective combination of data and control-flow techniques that are probably the best candidates to improve the trade-off between reliability and overheads.

Identiferoai:union.ndltd.org:ua.es/oai:rua.ua.es:10045/62467
Date30 July 2016
CreatorsChielle, Eduardo
ContributorsCuenca-Asensi, Sergio, Kastensmidt, Fernanda L., Universidad de Alicante. Instituto Universitario de Investigación Informática, Universidade Federal do Rio Grande do Sul. Instituto de Informática
PublisherUniversidad de Alicante
Source SetsUniversidad de Alicante
LanguageEnglish
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/doctoralThesis
RightsLicencia Creative Commons Reconocimiento-NoComercial-SinObraDerivada 4.0, info:eu-repo/semantics/openAccess

Page generated in 0.0055 seconds