51 |
Robust low-power signal processing and communication algorithms. Nisar, Muhammad Mudassar, 04 January 2010.
This thesis presents circuit-level techniques for soft error mitigation, low-power design with performance trade-offs, and variation-tolerant low-power design. The proposed techniques fall into two broad categories. The first comprises error compensation techniques, used for soft error mitigation and for low-power operation of linear and non-linear filters. The second is a framework for variation-tolerant, low-power operation of wireless devices. This framework analyzes the effects of circuit "tuning knobs" such as voltage, frequency, and wordlength precision on system performance and power efficiency. Process variations are considered as well, and the best operating levels of the tuning knobs are determined, yielding maximum system-wide power savings while keeping system performance within acceptable limits. Several methods are presented for variation-tolerant, power-efficient wireless communication, and techniques are also proposed for application-driven low-power operation of an OFDM baseband receiver.
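As an illustration of the knob-selection idea (a sketch of the general principle, not the thesis's actual algorithm), the snippet below exhaustively searches a small grid of hypothetical voltage, frequency, and wordlength settings for the lowest-power configuration that still meets a performance constraint. The knob levels and the power/performance models are placeholder assumptions invented for the example.

```python
from itertools import product

# Hypothetical knob levels; a real system would characterize these empirically.
VOLTAGES = [0.8, 0.9, 1.0, 1.1]    # supply voltage (V)
FREQS = [100e6, 200e6, 400e6]      # clock frequency (Hz)
WORDLENGTHS = [8, 12, 16]          # datapath precision (bits)

def power(v, f, w):
    # Placeholder dynamic-power model: P ~ C_eff * V^2 * f, with the
    # effective capacitance assumed to grow linearly with wordlength.
    return (w / 16.0) * v**2 * f * 1e-9

def performance_ok(v, f, w, min_snr_db=40.0):
    # Placeholder performance model: quantization SNR ~ 6.02*w dB,
    # ruled out entirely when the frequency exceeds what the voltage supports.
    snr = 6.02 * w - (0.0 if f <= v * 400e6 else 100.0)
    return snr >= min_snr_db

best = min(
    (cfg for cfg in product(VOLTAGES, FREQS, WORDLENGTHS)
     if performance_ok(*cfg)),
    key=lambda cfg: power(*cfg),
)
print("best (V, f, wordlength):", best, "power (W):", power(*best))
```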
|
52 |
Robust Design of Variation-Sensitive Digital Circuits. Moustafa, Hassan, January 2011.
The nano-age has already begun: typical feature dimensions are smaller than 100nm, the operating frequency is expected to increase up to 12 GHz, and a single chip will contain over 12 billion transistors by 2020, according to the International Technology Roadmap for Semiconductors (ITRS). The ITRS also predicts that the scaling of CMOS devices and process technology, as it is known today, will become much more difficult as the industry advances towards the 16nm technology node and beyond. This aggressive scaling has pushed CMOS devices to their physical limits. Design goals are now governed by several factors beyond power, performance, and area, such as process variations, radiation-induced soft errors, and aging degradation mechanisms. These new design challenges have a strong impact on the parametric yield of nanometer digital circuits and also cause functional yield losses in variation-sensitive digital circuits such as Static Random Access Memory (SRAM) and flip-flops. Moreover, sub-threshold SRAM and flip-flop circuits, driven by the strong demand for lower power consumption, are even more sensitive to these challenges, which reduces their robustness and yield. Accordingly, it is not surprising that the ITRS considers variability and reliability the most challenging obstacles to the robust design of nanometer digital circuits.
Soft errors are one of the main reliability and robustness concerns for SRAM arrays in sub-100nm technologies, due to low operating voltage, small node capacitance, and high packing density. The soft error immunity of SRAM arrays is also affected by process variations. We develop statistical, design-oriented models of soft error immunity variations for super-threshold and sub-threshold SRAM cells, accounting for both die-to-die and within-die variations. This work provides new design insights and highlights the important design knobs that can be used to reduce the variation in SRAM cell soft error immunity. The developed models are scalable, bias dependent, and require only the knowledge of easily measurable parameters, which makes them useful in early design exploration, circuit optimization, and technology prediction. The derived models are verified using Monte Carlo SPICE simulations on an industrial hardware-calibrated 65nm CMOS technology.
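For context, soft error susceptibility is commonly framed through the critical charge of a storage node, which to first order scales with node capacitance and supply voltage; the soft error rate (SER) is then often modeled as decaying exponentially in that charge (the standard Hazucha-Svensson form, quoted here as background rather than taken from the thesis), with F the particle flux, A the sensitive area, and Q_s a technology-dependent charge-collection slope:

$$Q_{\text{crit}} \approx C_{\text{node}} \cdot V_{DD}, \qquad \text{SER} \propto F \cdot A \cdot \exp\!\left(-\frac{Q_{\text{crit}}}{Q_s}\right)$$

This relation is why the low voltages and small node capacitances cited above make scaled SRAM cells more vulnerable.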
The demand for higher performance leads to very deep pipelining, which means that hundreds of thousands of flip-flops are required to control the data flow under strict timing constraints. A violation of the timing constraints at a flip-flop can result in latching incorrect data, causing the overall system to malfunction. In addition, flip-flop power dissipation represents a considerable fraction of the total power dissipation. Sub-threshold flip-flops are considered the most energy-efficient solution for low-power applications in which performance is of secondary importance. Accordingly, statistical gate sizing is applied to different flip-flop topologies to improve the timing yield of super-threshold flip-flops and the power yield of sub-threshold flip-flops. Following that, a comparative analysis of these flip-flop topologies, accounting for the overhead required for yield improvement, is performed. This analysis provides useful recommendations that help designers select the flip-flop topology that satisfies their system specifications while taking the impact of process variations and the robustness requirements into account.
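As a toy illustration of the timing-yield metric that such statistical sizing optimizes (all distributions and numbers here are invented for the example, not drawn from the thesis), a Monte Carlo estimate of the probability that a flip-flop's data path meets the clock period under Gaussian delay variation could look like this:

```python
import random

def timing_yield(mean_delay_ps, sigma_ps, clock_period_ps, trials=100_000):
    # Fraction of Monte Carlo samples whose path delay fits in the period.
    # A normal delay distribution is an assumed variation model.
    met = sum(
        random.gauss(mean_delay_ps, sigma_ps) <= clock_period_ps
        for _ in range(trials)
    )
    return met / trials

# Upsizing gates lowers the mean delay (and often its sigma) at an
# area/power cost; the yield gain is what statistical sizing trades for it.
print("nominal sizing:", timing_yield(800, 60, 900))
print("upsized path: ", timing_yield(740, 50, 900))
```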
Adaptive Body Bias (ABB) allows tuning of the transistor threshold voltage, Vt, by controlling the transistor body voltage. A forward body bias reduces Vt, increasing the device speed at the expense of increased leakage power; a reverse body bias increases Vt, reducing the leakage power but slowing the device. The impact of process variations can therefore be mitigated by speeding up slow, less leaky devices and slowing down fast, highly leaky ones. Ideally, ABB would bias each device in a design independently, to mitigate within-die variations, but supplying so many separate voltages inside a die results in a large area overhead. On the other hand, using the same body bias for all devices on a die limits its ability to compensate for within-die variations. Thus, the granularity of the ABB scheme is a trade-off between within-die variation compensation capability and the associated area overhead. This work introduces new ABB circuits whose area overhead is lower by a factor of 143X than that of previous ABB circuits. In addition, these ABB circuits are resolution-free, since no digital-to-analog or analog-to-digital converters are required in their implementation. The ABB circuits are applied to high-performance critical paths emulating a real microprocessor architecture, for process variation compensation, and to SRAM arrays, for compensation of Negative Bias Temperature Instability (NBTI) aging and process variations. The effectiveness of the new ABB circuits is verified by post-layout simulation results and test-chip measurements in a triple-well 65nm CMOS technology.
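The Vt tuning that ABB exploits follows the standard MOSFET body-effect relation (textbook background, not a result of this work), where V_{t0} is the zero-bias threshold voltage, gamma the body-effect coefficient, phi_F the Fermi potential, and V_{SB} the source-to-body voltage; a forward body bias makes V_{SB} negative and thus lowers V_t:

$$V_t = V_{t0} + \gamma\left(\sqrt{|2\phi_F| + V_{SB}} - \sqrt{|2\phi_F|}\right)$$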
The highly capacitive nodes of wide fan-in dynamic circuits and SRAM bitlines limit the performance of these circuits. In addition, mitigating process variations through statistical gate sizing increases this capacitance further and fails to achieve the target yield improvement. We propose new negative capacitance circuits that reduce the overall parasitic capacitance of these highly capacitive nodes. These circuits are applied to wide fan-in dynamic circuits, improving timing yield by up to 99.87%, and to SRAM arrays, improving read access yield by up to 100%. The area and power overheads of the new negative capacitance circuits are amortized over the large die area of the microprocessor and the SRAM array. Their effectiveness is verified by post-layout simulation results and test-chip measurements in a 65nm CMOS technology.
|
53 |
Transient-fault robust systems exploiting quasi-delay insensitive asynchronous circuits. Bastos, Rodrigo Possamai, January 2010.
Recent ICs based on deep-submicron technologies are significantly more vulnerable to transient faults, and the resulting errors are more critical than ever before. This thesis presents a further, novel benefit of Quasi-Delay Insensitive (QDI) asynchronous circuits in terms of reliability: their strong natural ability to mitigate long-duration transient faults, which are severe in modern synchronous circuits. A methodology to comparatively evaluate transient-fault effects on synchronous and QDI asynchronous circuits is presented. Furthermore, a method to quantify the transient-fault mitigation ability of the QDI circuits' memory elements (i.e., the C-elements) is also proposed. Finally, mitigation techniques are suggested to further increase the C-elements' transient-fault attenuation, and thus the robustness of QDI asynchronous systems.
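A Muller C-element, the QDI memory element mentioned above, drives its output to the inputs' common value only when all inputs agree and otherwise holds its previous state; this hysteresis is what filters transient glitches. A minimal behavioral sketch (our illustration, not the thesis's model):

```python
class CElement:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # The output switches only when both inputs agree; otherwise the
        # previous value is held, so a glitch on a single input
        # (e.g. a transient fault) does not propagate.
        if a == b:
            self.out = a
        return self.out

c = CElement()
print(c.update(1, 0))  # inputs disagree -> holds 0
print(c.update(1, 1))  # inputs agree    -> switches to 1
print(c.update(0, 1))  # transient on one input -> still 1
```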
|
54 |
Selective software-implemented hardware fault tolerance techniques to detect soft errors in processors with reduced overhead. Chielle, Eduardo, January 2016.
Software-based fault tolerance techniques are a low-cost way to protect processors against soft errors. However, they introduce significant overheads in execution time and code size, which consequently increase energy consumption. Systems operating under time or energy constraints may be unable to use such techniques. For this reason, this work proposes software-based fault tolerance techniques with lower overheads and fault coverage similar to state-of-the-art software techniques. Since detection is less costly than correction, the work focuses on software-based detection techniques. Firstly, a set of data-flow techniques called VAR is proposed. The techniques are built on general rules to allow an exhaustive assessment, in terms of reliability and overheads, of different technique variations. The rules define how a technique duplicates the code and inserts checkers; each technique uses a different set of rules. Then, a control-flow technique called SETA (Software-only Error-detection Technique using Assertions) is introduced. Compared with a state-of-the-art technique, SETA is 11.0% faster and occupies 10.3% fewer memory positions. The most promising data-flow techniques are combined with the control-flow technique in order to protect both the data flow and the control flow of the target application. To reduce the overheads even further, methods to selectively apply the proposed techniques have been developed. For the data-flow techniques, instead of protecting all registers, only a set of selected registers is protected; the set is chosen based on a metric that analyzes the code and ranks the registers by their criticality. For the control-flow technique, two approaches are taken: (1) removing checkers from basic blocks, where all basic blocks are protected by SETA but only selected ones have checkers inserted, and (2) selectively protecting basic blocks, where only a subset of basic blocks is protected. The techniques and their selective versions are evaluated in terms of execution time, code size, fault coverage, and Mean Work To Failure (MWTF), a metric that measures the trade-off between fault coverage and execution time. Results show that it was possible to reduce the overheads without affecting the fault coverage, and that accepting a small reduction in fault coverage allowed the overheads to be reduced significantly. Lastly, since evaluating all possible combinations for selective hardening of every application takes too much time, this work uses a method to extrapolate simulation results in order to find the parameters for the selective combination of data-flow and control-flow techniques that are most likely to improve the trade-off between reliability and overheads.
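To make the duplication-with-checkers idea concrete, here is a schematic sketch in the spirit of such data-flow techniques (an illustration of the general principle, not the actual VAR rules): every value is computed twice, once on the original and once on a shadow copy, and checkers compare the two before a value leaves the protected region.

```python
def checked_sum(values):
    # Duplicated data flow: every operation is performed on the original
    # and on a shadow copy (plain variables stand in for registers here).
    acc, acc_shadow = 0, 0
    for v in values:
        v_shadow = v          # duplicate the input
        acc += v
        acc_shadow += v_shadow
    # Checker at the point where the value escapes the protected sphere
    # (e.g. is stored to memory or passed to unprotected code): a mismatch
    # means a soft error corrupted one of the copies.
    if acc != acc_shadow:
        raise RuntimeError("soft error detected in data flow")
    return acc

print(checked_sum([1, 2, 3, 4]))  # prints 10 when no fault occurs
```

In a real implementation the duplication operates on processor registers and the checker placement is governed by the technique's rules; selective versions apply this only to the most critical registers.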
|
56 |
Low-cost highly-efficient fault tolerant processor design for mitigating the reliability issues in nanometric technologies. Yu, Hai, 02 December 2011.
Various applications of electronic systems, such as medical implant devices or cryptographic chips for portable devices, require both low power dissipation and a high level of reliability. Moreover, as silicon-based CMOS technologies fast approach their ultimate limits, these requirements become necessary for the entire microelectronics industry. Indeed, near these limits, power dissipation, fabrication yield, and reliability worsen steadily, making further nanometric scaling increasingly difficult. Thus, these problems could become show-stoppers unless new techniques are introduced to maintain acceptable levels of power dissipation, fabrication yield, and reliability. This thesis develops a fault-tolerant architecture for logic designs that reconciles these contradictory challenges and provides a global solution to the yield, reliability, and power dissipation issues in current and future nanometric technologies. The proposed fault-tolerant architecture is expected to improve fabrication yield and reliability while reducing the power dissipation of electronic components. This constitutes a breakthrough, since traditional fault-tolerant architectures improve yield and reliability only at the cost of significant area and power penalties.
|
57 |
Architectures for reliable and high performance circuits. Bonnoit, Thierry, 18 October 2012.
Nanometric technologies have decreased the reliability of electronic circuits, notably by making them more sensitive to external phenomena, which may change the state of storage components or disturb logic functions. The issue is most critical for memories, which are more sensitive to external radiation. Error correcting codes are among the most widely used solutions, but reliability constraints push toward increasingly complex codes, which hurt the system bandwidth. We propose a generic methodology that removes the timing penalty of error correcting codes during memory write operations and confines the read penalty to the rare case in which an error is actually detected. To achieve this, the circuit is decontaminated after erroneous data have propagated through it, which requires restoring past states of a few storage components by adding FIFOs; an algorithm that identifies where these FIFOs must be placed was also created. We then evaluate the impact of this technique in the broader context of restoring a previous state of the whole circuit in order to correct a transient error that may occur anywhere inside it.
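Standard Hamming single-error-correction coding (generic background, not the thesis's specific scheme) illustrates the read-side behavior described above: the syndrome computed on read both detects that an error occurred, which is the event that would trigger the decontamination step, and locates a single flipped bit. A minimal sketch:

```python
def hamming74_encode(d):
    # d: list of 4 data bits [d1, d2, d3, d4]
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    # Codeword laid out by bit position 1..7, parity bits at 1, 2, 4.
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    # Recompute parities over the received word; the syndrome is the
    # 1-based position of a single flipped bit, or 0 if none.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    c = list(c)
    if syndrome:                  # nonzero syndrome -> error detected
        c[syndrome - 1] ^= 1      # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], syndrome

word = hamming74_encode([1, 0, 1, 1])
word[5] ^= 1                      # inject a soft error before the read
data, syndrome = hamming74_decode(word)
print(data, "error at position:", syndrome)  # [1, 0, 1, 1] error at position: 6
```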
|
60 |
Using Reinforcement Learning to Correct Soft Errors of Deep Neural Networks. Li, Yuhang, January 2023.
Deep Neural Networks (DNNs) are becoming increasingly important in various aspects of human life, particularly in safety-critical areas such as autonomous driving and aerospace systems. However, soft errors, including bit-flips, can significantly impact the performance of these systems, with serious consequences. To ensure the reliability of DNNs, it is essential to guarantee their performance. Many solutions have been proposed to enhance the trustworthiness of DNNs, including traditional methods such as error correcting codes (ECC), which can detect and mitigate soft errors but come at a high redundancy cost. This thesis proposes a new method for correcting soft errors in DNNs using Deep Reinforcement Learning (DRL) and Transfer Learning (TL). A DRL agent learns to identify the layer-wise critical weights of a DNN; to accelerate training, TL applies this knowledge to the training of other layers. The primary objective is to ensure acceptable DNN performance by mitigating the impact of errors while maintaining low redundancy. As a case study, we tested the proposed approach on a multilayer perceptron (MLP) and ResNet-18. Our results show that the method can save around 25% redundancy compared to the baseline ECC method while achieving the same level of performance, and that with the same redundancy it can boost system performance by up to twice that of conventional methods. With TL, the training time of the MLP is reduced to around 81.11% of the original, and that of ResNet-18 to around 57.75%.