Global ETD Search

61	Techniques pour l'évaluation et l'amélioration du comportement des technologies émergentes face aux fautes aléatoires / Techniques for the evaluation and the improvement of emergent technologies’ behavior facing random errors Costenaro, Enrico 09 December 2015 (has links) L'objectif principal de cette thèse est de développer des techniques d'analyse et mitigation capables à contrer les effets des Evènements Singuliers (Single Event Effects) - perturbations externes et internes produites par les particules radioactives, affectant la fiabilité et la sureté en fonctionnement des circuits microélectroniques complexes. Cette thèse à la vocation d'offrir des solutions et méthodologies industrielles pour les domaines d'applications terrestres exigeant une fiabilité ultime (télécommunications, dispositifs médicaux, ...) en complément des travaux précédents sur les Soft Errors, traditionnellement orientés vers les applications aérospatiales, nucléaires et militaires.Les travaux présentés utilisent une décomposition de sources d'erreurs dans les circuits actuels, visant à mettre en évidence les contributeurs les plus importants.Les upsets (SEU) - Evènements Singuliers (ES) dans les cellules logiques séquentielles représentent actuellement la cible principale pour les efforts d'analyse et d'amélioration à la fois dans l'industrie et dans l'académie. Cette thèse présente une méthodologie d'analyse basée sur la prise en compte de la sensibilité de chaque état logique d'une cellule (state-awareness), approche qui améliore considérablement la précision des résultats concernant les taux des évènements pour les instances séquentielles individuelles. En outre, le déséquilibre intrinsèque entre la susceptibilité des différents états des bascules est exploité pour mettre en œuvre une stratégie d'amélioration SER à très faible coût.Les fautes transitoires (SET) affectant la logique combinatoire sont beaucoup plus difficiles à modéliser, à simuler et à analyser que les SEUs. L'environnement radiatif peut provoquer une multitude d'impulsions transitoires dans les divers types de cellules qui sont utilisés en configurations multiples. Cette thèse présente une approche pratique pour l'analyse SET, applicable à des circuits industriels très complexes. Les principales étapes de ce processus consiste à: a) caractériser complètement la bibliothèque de cellules standard, b) évaluer les SET dans les réseaux logiques du circuit en utilisant des méthodes statiques et dynamiques et c) calculer le taux SET global en prenant en compte les particularités de l'implémentation du circuit et de son environnement.L'injection de fautes reste la principale méthode d'analyse pour étudier l'impact des fautes, erreurs et disfonctionnements causés par les évènements singuliers. Ce document présente les résultats d'une analyse fonctionnelle d'un processeur complexe dans la présence des fautes et pour une sélection d'applications (benchmarks) représentatifs. Des techniques d'accélération de la simulation (calculs probabilistes, clustering, simulations parallèles) ont été proposées et évalués afin d'élaborer un environnement de validation industriel, capable à prendre en compte des circuits très complexes. Les résultats obtenus ont permis l'élaboration et l'évaluation d'un hypothétique scénario de mitigation qui vise à améliorer sensiblement, et cela au moindre coût, la fiabilité du circuit sous test. Les résultats obtenus montrent que les taux d'erreur, SDC (Silent Data Corruption) et DUE (Detectable Uncorrectable Errors) peuvent être considérablement réduits par le durcissement d'un petite partie du circuit (protection sélective). D'autres techniques spécifiques ont été également déployées: mitigation du taux de soft-errors des Flip-Flips grâce à une optimisation du Temporal De-Rating par l'insertion sélective de retard sur l'entrée ou la sortie des bascules et biasing du circuit pour privilégier les états moins sensibles.Les méthodologies, algorithmes et outils CAO proposés et validés dans le cadre de ces travaux sont destinés à un usage industriel et ont été valorisés dans le cadre de plateforme CAO commerciale visant à offrir une solution complète pour l'évaluation de la fiabilité des circuits et systèmes électroniques complexes. / The main objective of this thesis is to develop analysis and mitigation techniques that can be used to face the effects of radiation-induced soft errors - external and internal disturbances produced by radioactive particles, affecting the reliability and safety in operation complex microelectronic circuits. This thesis aims to provide industrial solutions and methodologies for the areas of terrestrial applications requiring ultimate reliability (telecommunications, medical devices, ...) to complement previous work on Soft Errors traditionally oriented aerospace, nuclear and military applications.The work presented uses a decomposition of the error sources, inside the current circuits, to highlight the most important contributors.Single Event Effects in sequential logic cells represent the current target for analysis and improvement efforts in both industry and academia. This thesis presents a state-aware analysis methodology that improves the accuracy of Soft Error Rate data for individual sequential instances based on the circuit and application. Furthermore, the intrinsic imbalance between the SEU susceptibility of different flip-flop states is exploited to implement a low-cost SER improvement strategy.Single Event Transients affecting combinational logic are considerably more difficult to model, simulate and analyze than the closely-related Single Event Upsets. The working environment may cause a myriad of distinctive transient pulses in various cell types that are used in widely different configurations. This thesis presents practical approach to a possible exhaustive Single Event Transient evaluation flow in an industrial setting. The main steps of this process consists in: a) fully characterize the standard cell library using a process and library-aware SER tool, b) evaluate SET effects in the logic networks of the circuit using a variety dynamic (simulation-based) and static (probabilistic) methods and c) compute overall SET figures taking into account the particularities of the implementation of the circuit and its environment.Fault-injection remains the primary method for analyzing the effects of soft errors. This document presents the results of functional analysis of a complex CPU. Three representative benchmarks were considered for this analysis. Accelerated simulation techniques (probabilistic calculations, clustering, parallel simulations) have been proposed and evaluated in order to develop an industrial validation environment, able to take into account very complex circuits. The results obtained allowed the development and evaluation of a hypothetical mitigation scenario that aims to significantly improve the reliability of the circuit at the lowest cost.The results obtained show that the error rate, SDC (Silent Data Corruption) and DUE (Detectable Uncorrectable Errors) can be significantly reduced by hardening a small part of the circuit (Selective mitigation).In addition to the main axis of research, some tangential topics were studied in collaboration with other teams. One of these consisted in the study of a technique for the mitigation of flip-flop soft-errors through an optimization of the Temporal De-Rating (TDR) by selectively inserting delay on the input or output of flip-flops.The Methodologies, the algorithms and the CAD tools proposed and validated as part of the work are intended for industrial use and have been included in a commercial CAD framework that offers a complete solution for assessing the reliability of circuits and complex electronic systems. Événements singuliers Événements singuliers upsets Événements singuliers transitoire Soft erreurs Injection de fautes Protection sélective Single-event effects Single-event upsets Single-event transients Soft errors Fault-injection Selective mitigation 620
62	Análise de soft errors em conversores analógico-digitais e mitigação utilizando redundância e diversidade Chenet, Cristiano Pegoraro January 2015 (has links) Este trabalho aborda os soft errors em conversores de dados analógico-digitais e a mitigação usando redundância e diversidade. Nas tecnologias CMOS recentes, os efeitos singulares (SEEs, Single Event Effects) são um grupo de efeitos da radiação espacial que afetam a confiabilidade e disponibilidade dos sistemas. Os soft errors são SEEs que não danificam diretamente o sistema e podem ser posteriormente corrigidos. Seus principais subgrupos são o Single Event Upset (SEU), o Single Event Transient (SET) e o Single Event Functional Interrupt (SEFI). Uma das técnicas em nível de sistema amplamente usadas para proteger os circuitos eletrônicos desses efeitos é a Redundância Modular Tripla (TMR, Triple Modular Redundancy), que pode ainda ser melhorada com a adição da técnica de diversidade. Nesse contexto, esse trabalho adota um esquema baseado nessas duas técnicas para a implementação de um sistema de aquisição de dados (SAD) analógico-digital. Seus objetivos são observar o comportamento dos conversores de dados frente aos soft errors e avaliar a eficácia de um sistema baseado em TMR e diversidade espacial-temporal contra esses efeitos da radiação. A implementação desse SAD em um SoC (System-on-Chip) da Cypress Semiconductor, chamado PSoC 5LP e fabricado em tecnologia CMOS de 130 nm, propiciou a realização de dois estudos: no primeiro, é realizada a irradiação com nêutrons, caso de particular interesse para os equipamentos eletrônicos embarcados em aviões; e no segundo, são realizadas injeções de falhas por software e em tempo de execução nos registradores de controle dos periféricos e na SRAM do PSoC 5LP. O resultado da irradiação do primeiro estudo foi a não observância de erros, o que impediu cumprir os objetivos propostos para esse teste. Essa situação permitiu duas observações principais: primeiro, o fluxo de nêutrons do experimento é uma característica fundamental que impacta na capacidade de se observar os efeitos da radiação, principalmente quando a seção de choque do circuito em análise é baixa; e segundo, de que a probabilidade de ocorrerem mascaramentos de SETs nos circuitos combinacionais e analógicos é elevada, o que contribui significativamente para reduzir a sensibilidade desses circuitos. Para avaliar a eficácia do sistema baseado em TMR e diversidade espacial-temporal foi então realizada uma investigação teórica baseada em análise combinatória, e os resultados mostraram que a adição de diversidade temporal gera, em comparação ao TMR clássico, um ganho significativo na tolerância de falhas duplas e múltiplas, ao preço de um aumento do atraso do circuito. Os resultados das injeções de falhas por software e em tempo de execução nos registradores de controle dos periféricos e na SRAM mostraram que apenas um baixo percentual das falhas injetadas é detectado na forma de erros, convergindo para a justificativa de que os mascaramentos foram determinantes para a não observância de erros no primeiro estudo, de injeção de falhas por radiação. Também verificou-se que os registradores de controle dos periféricos são mais importantes no nível de aplicação do que os dados da memória SRAM. Considerações sobre a auto injeção de falhas e auto monitoramento sugerem que a utilização desses conceitos pode trazer diversas limitações e complicadores aos testes. / The present thesis addresses the soft errors in analog-to-digital data converters and mitigation of such errors using redundancy and diversity. In modern CMOS technologies, the Single Event Effects (SEEs) comprises an important group of space radiation effects that influence the reliability and availability of the systems. Soft errors are SEEs that do not directly damage the system and that can be further corrected. Their main subgroups are the Single Event Upset (SEU), the Single Event Transient (SET) and the Single Event Functional Interrupt (SEFI). One of the system level techniques broadly used to protect the electronic circuits against these effects is the Triple Modular Redundancy (TMR), which may be improved with the addition of the diversity technique. In this context, this work proposes a scheme based on these two techniques to implement a tolerant analog-to-digital data acquisition system (DAS). The main objectives are to observe the behavior of the data converters under soft errors, and evaluate the effectiveness of a system based on TMR and spatial-temporal diversity on mitigating these radiation effects. The implementation of this DAS in a Programmable SoC (System-on-Chip) from Cypress Semiconductor (PSoC 5LP) manufactured in 130 nm CMOS, allowed the development of two studies. In the first one, an irradiation with neutrons is performed, case of particular interest to electronic equipment embedded on planes. In the second study, runtime software fault injections are performed at the peripheral control registers and SRAM of the studied device. As a result from irradiation on the first study no errors were found, what does not allowed meet the objectives of this test. This situation allow two main observations: first, the neutron flux of the experiment is a key feature that influences the ability to observe the radiation effects, mainly when the cross section of the circuit in analysis is low; and second, the probability of occurring SETs masking in combinational and analog circuits is high, which contributes significantly to reduce the sensibility of these circuits. To evaluate the effectiveness of a system based on TMR and spatial-temporal diversity then was performed a theoretical investigation based on combinatorial analysis, and the results show that the addition of temporal diversity generates a significant gain in tolerating double and multiple faults, if compared to the classical TMR, at the price of an increase in the circuit delay. The results of the second study, performed by runtime software fault injections at the peripheral control registers and SRAM, showed that only a low percentage of injected faults is detected as errors, according to the justification that no errors were found on irradiation of neutrons due to masking. Also was verified that at the application level the peripheral control registers are more important than the data stored in the SRAM memory. Considerations for faults self-injection and self-monitoring were done, suggesting that the use of these concepts may bring numerous limitations to the test. Conversor analogico/digital Circuitos eletrônicos Cmos Redundância modular tripla Single event effects (SEE) Soft errors Triple modular redundancy (TMR) Redundancy and diversity Programmable system-on-chip (PSoC) Analog-to-digital converters
63	Análise de soft errors em conversores analógico-digitais e mitigação utilizando redundância e diversidade Chenet, Cristiano Pegoraro January 2015 (has links) Este trabalho aborda os soft errors em conversores de dados analógico-digitais e a mitigação usando redundância e diversidade. Nas tecnologias CMOS recentes, os efeitos singulares (SEEs, Single Event Effects) são um grupo de efeitos da radiação espacial que afetam a confiabilidade e disponibilidade dos sistemas. Os soft errors são SEEs que não danificam diretamente o sistema e podem ser posteriormente corrigidos. Seus principais subgrupos são o Single Event Upset (SEU), o Single Event Transient (SET) e o Single Event Functional Interrupt (SEFI). Uma das técnicas em nível de sistema amplamente usadas para proteger os circuitos eletrônicos desses efeitos é a Redundância Modular Tripla (TMR, Triple Modular Redundancy), que pode ainda ser melhorada com a adição da técnica de diversidade. Nesse contexto, esse trabalho adota um esquema baseado nessas duas técnicas para a implementação de um sistema de aquisição de dados (SAD) analógico-digital. Seus objetivos são observar o comportamento dos conversores de dados frente aos soft errors e avaliar a eficácia de um sistema baseado em TMR e diversidade espacial-temporal contra esses efeitos da radiação. A implementação desse SAD em um SoC (System-on-Chip) da Cypress Semiconductor, chamado PSoC 5LP e fabricado em tecnologia CMOS de 130 nm, propiciou a realização de dois estudos: no primeiro, é realizada a irradiação com nêutrons, caso de particular interesse para os equipamentos eletrônicos embarcados em aviões; e no segundo, são realizadas injeções de falhas por software e em tempo de execução nos registradores de controle dos periféricos e na SRAM do PSoC 5LP. O resultado da irradiação do primeiro estudo foi a não observância de erros, o que impediu cumprir os objetivos propostos para esse teste. Essa situação permitiu duas observações principais: primeiro, o fluxo de nêutrons do experimento é uma característica fundamental que impacta na capacidade de se observar os efeitos da radiação, principalmente quando a seção de choque do circuito em análise é baixa; e segundo, de que a probabilidade de ocorrerem mascaramentos de SETs nos circuitos combinacionais e analógicos é elevada, o que contribui significativamente para reduzir a sensibilidade desses circuitos. Para avaliar a eficácia do sistema baseado em TMR e diversidade espacial-temporal foi então realizada uma investigação teórica baseada em análise combinatória, e os resultados mostraram que a adição de diversidade temporal gera, em comparação ao TMR clássico, um ganho significativo na tolerância de falhas duplas e múltiplas, ao preço de um aumento do atraso do circuito. Os resultados das injeções de falhas por software e em tempo de execução nos registradores de controle dos periféricos e na SRAM mostraram que apenas um baixo percentual das falhas injetadas é detectado na forma de erros, convergindo para a justificativa de que os mascaramentos foram determinantes para a não observância de erros no primeiro estudo, de injeção de falhas por radiação. Também verificou-se que os registradores de controle dos periféricos são mais importantes no nível de aplicação do que os dados da memória SRAM. Considerações sobre a auto injeção de falhas e auto monitoramento sugerem que a utilização desses conceitos pode trazer diversas limitações e complicadores aos testes. / The present thesis addresses the soft errors in analog-to-digital data converters and mitigation of such errors using redundancy and diversity. In modern CMOS technologies, the Single Event Effects (SEEs) comprises an important group of space radiation effects that influence the reliability and availability of the systems. Soft errors are SEEs that do not directly damage the system and that can be further corrected. Their main subgroups are the Single Event Upset (SEU), the Single Event Transient (SET) and the Single Event Functional Interrupt (SEFI). One of the system level techniques broadly used to protect the electronic circuits against these effects is the Triple Modular Redundancy (TMR), which may be improved with the addition of the diversity technique. In this context, this work proposes a scheme based on these two techniques to implement a tolerant analog-to-digital data acquisition system (DAS). The main objectives are to observe the behavior of the data converters under soft errors, and evaluate the effectiveness of a system based on TMR and spatial-temporal diversity on mitigating these radiation effects. The implementation of this DAS in a Programmable SoC (System-on-Chip) from Cypress Semiconductor (PSoC 5LP) manufactured in 130 nm CMOS, allowed the development of two studies. In the first one, an irradiation with neutrons is performed, case of particular interest to electronic equipment embedded on planes. In the second study, runtime software fault injections are performed at the peripheral control registers and SRAM of the studied device. As a result from irradiation on the first study no errors were found, what does not allowed meet the objectives of this test. This situation allow two main observations: first, the neutron flux of the experiment is a key feature that influences the ability to observe the radiation effects, mainly when the cross section of the circuit in analysis is low; and second, the probability of occurring SETs masking in combinational and analog circuits is high, which contributes significantly to reduce the sensibility of these circuits. To evaluate the effectiveness of a system based on TMR and spatial-temporal diversity then was performed a theoretical investigation based on combinatorial analysis, and the results show that the addition of temporal diversity generates a significant gain in tolerating double and multiple faults, if compared to the classical TMR, at the price of an increase in the circuit delay. The results of the second study, performed by runtime software fault injections at the peripheral control registers and SRAM, showed that only a low percentage of injected faults is detected as errors, according to the justification that no errors were found on irradiation of neutrons due to masking. Also was verified that at the application level the peripheral control registers are more important than the data stored in the SRAM memory. Considerations for faults self-injection and self-monitoring were done, suggesting that the use of these concepts may bring numerous limitations to the test. Conversor analogico/digital Circuitos eletrônicos Cmos Redundância modular tripla Single event effects (SEE) Soft errors Triple modular redundancy (TMR) Redundancy and diversity Programmable system-on-chip (PSoC) Analog-to-digital converters
64	Análise de soft errors em conversores analógico-digitais e mitigação utilizando redundância e diversidade Chenet, Cristiano Pegoraro January 2015 (has links) Este trabalho aborda os soft errors em conversores de dados analógico-digitais e a mitigação usando redundância e diversidade. Nas tecnologias CMOS recentes, os efeitos singulares (SEEs, Single Event Effects) são um grupo de efeitos da radiação espacial que afetam a confiabilidade e disponibilidade dos sistemas. Os soft errors são SEEs que não danificam diretamente o sistema e podem ser posteriormente corrigidos. Seus principais subgrupos são o Single Event Upset (SEU), o Single Event Transient (SET) e o Single Event Functional Interrupt (SEFI). Uma das técnicas em nível de sistema amplamente usadas para proteger os circuitos eletrônicos desses efeitos é a Redundância Modular Tripla (TMR, Triple Modular Redundancy), que pode ainda ser melhorada com a adição da técnica de diversidade. Nesse contexto, esse trabalho adota um esquema baseado nessas duas técnicas para a implementação de um sistema de aquisição de dados (SAD) analógico-digital. Seus objetivos são observar o comportamento dos conversores de dados frente aos soft errors e avaliar a eficácia de um sistema baseado em TMR e diversidade espacial-temporal contra esses efeitos da radiação. A implementação desse SAD em um SoC (System-on-Chip) da Cypress Semiconductor, chamado PSoC 5LP e fabricado em tecnologia CMOS de 130 nm, propiciou a realização de dois estudos: no primeiro, é realizada a irradiação com nêutrons, caso de particular interesse para os equipamentos eletrônicos embarcados em aviões; e no segundo, são realizadas injeções de falhas por software e em tempo de execução nos registradores de controle dos periféricos e na SRAM do PSoC 5LP. O resultado da irradiação do primeiro estudo foi a não observância de erros, o que impediu cumprir os objetivos propostos para esse teste. Essa situação permitiu duas observações principais: primeiro, o fluxo de nêutrons do experimento é uma característica fundamental que impacta na capacidade de se observar os efeitos da radiação, principalmente quando a seção de choque do circuito em análise é baixa; e segundo, de que a probabilidade de ocorrerem mascaramentos de SETs nos circuitos combinacionais e analógicos é elevada, o que contribui significativamente para reduzir a sensibilidade desses circuitos. Para avaliar a eficácia do sistema baseado em TMR e diversidade espacial-temporal foi então realizada uma investigação teórica baseada em análise combinatória, e os resultados mostraram que a adição de diversidade temporal gera, em comparação ao TMR clássico, um ganho significativo na tolerância de falhas duplas e múltiplas, ao preço de um aumento do atraso do circuito. Os resultados das injeções de falhas por software e em tempo de execução nos registradores de controle dos periféricos e na SRAM mostraram que apenas um baixo percentual das falhas injetadas é detectado na forma de erros, convergindo para a justificativa de que os mascaramentos foram determinantes para a não observância de erros no primeiro estudo, de injeção de falhas por radiação. Também verificou-se que os registradores de controle dos periféricos são mais importantes no nível de aplicação do que os dados da memória SRAM. Considerações sobre a auto injeção de falhas e auto monitoramento sugerem que a utilização desses conceitos pode trazer diversas limitações e complicadores aos testes. / The present thesis addresses the soft errors in analog-to-digital data converters and mitigation of such errors using redundancy and diversity. In modern CMOS technologies, the Single Event Effects (SEEs) comprises an important group of space radiation effects that influence the reliability and availability of the systems. Soft errors are SEEs that do not directly damage the system and that can be further corrected. Their main subgroups are the Single Event Upset (SEU), the Single Event Transient (SET) and the Single Event Functional Interrupt (SEFI). One of the system level techniques broadly used to protect the electronic circuits against these effects is the Triple Modular Redundancy (TMR), which may be improved with the addition of the diversity technique. In this context, this work proposes a scheme based on these two techniques to implement a tolerant analog-to-digital data acquisition system (DAS). The main objectives are to observe the behavior of the data converters under soft errors, and evaluate the effectiveness of a system based on TMR and spatial-temporal diversity on mitigating these radiation effects. The implementation of this DAS in a Programmable SoC (System-on-Chip) from Cypress Semiconductor (PSoC 5LP) manufactured in 130 nm CMOS, allowed the development of two studies. In the first one, an irradiation with neutrons is performed, case of particular interest to electronic equipment embedded on planes. In the second study, runtime software fault injections are performed at the peripheral control registers and SRAM of the studied device. As a result from irradiation on the first study no errors were found, what does not allowed meet the objectives of this test. This situation allow two main observations: first, the neutron flux of the experiment is a key feature that influences the ability to observe the radiation effects, mainly when the cross section of the circuit in analysis is low; and second, the probability of occurring SETs masking in combinational and analog circuits is high, which contributes significantly to reduce the sensibility of these circuits. To evaluate the effectiveness of a system based on TMR and spatial-temporal diversity then was performed a theoretical investigation based on combinatorial analysis, and the results show that the addition of temporal diversity generates a significant gain in tolerating double and multiple faults, if compared to the classical TMR, at the price of an increase in the circuit delay. The results of the second study, performed by runtime software fault injections at the peripheral control registers and SRAM, showed that only a low percentage of injected faults is detected as errors, according to the justification that no errors were found on irradiation of neutrons due to masking. Also was verified that at the application level the peripheral control registers are more important than the data stored in the SRAM memory. Considerations for faults self-injection and self-monitoring were done, suggesting that the use of these concepts may bring numerous limitations to the test. Conversor analogico/digital Circuitos eletrônicos Cmos Redundância modular tripla Single event effects (SEE) Soft errors Triple modular redundancy (TMR) Redundancy and diversity Programmable system-on-chip (PSoC) Analog-to-digital converters
65	Low Overhead Soft Error Mitigation Methodologies Prasanth, V January 2012 (has links) (PDF) CMOS technology scaling is bringing new challenges to the designers in the form of new failure modes. The challenges include long term reliability failures and particle strike induced random failures. Studies have shown that increasingly, the largest contributor to the device reliability failures will be soft errors. Due to reliability concerns, the adoption of soft error mitigation techniques is on the increase. As the soft error mitigation techniques are increasingly adopted, the area and performance overhead incurred in their implementation also becomes pertinent. This thesis addresses the problem of providing low cost soft error mitigation. The main contributions of this thesis include, (i) proposal of a new delayed capture methodology for low overhead soft error detection, (ii) adopting Error Control Coding (ECC) for delayed capture methodology for correction of single event upsets, (iii) analyzing the impact of different derating factors to reduce the hardware overhead incurred by the above implementations, and (iv) proposal for hardware software co-design for reliability based upon critical component identification determined by the application executing on the hardware (as against standalone hardware analysis). This thesis first surveys existing soft error mitigation techniques and their associated limitations. It proposes a new delayed capture methodology as a low overhead soft error detection technique. Delayed capture methodology is an enhancement of the Razor flip-flop methodology. In the delayed capture methodology, the parity for a set of flip-flops is calculated at their inputs and outputs. The input parity is latched on a second clock, which is delayed with respect to the functional clock by more than the soft error pulse width. It requires an extra flip-flop for each set of flip-flops. On the other hand, in the Razor flip-flop methodology an additional flip-flop is required for every functional flip-flop. Due to the skew in the clocks, either the parity flip-flop or the functional flip-flop will capture the effect of transient, and hence by comparing the output parity and latched input parity an error can be detected. Fault injection experiments are performed to evaluate the bneefits and limitations of the proposed approach. The limitations include soft error detection escapes and lack of error correction capability. Different cases of soft error detection escapes are analyzed. They are attributed mainly to a Single Event Upset (SEU) causing multiple flip-flops within a group to be in error. The error space due to SEUs is analyzed and an intelligent flip-flop grouping method using graph theoretic formulations is proposed such that no SEU can cause multiple flip-flops within a group to be in error. Once the error occurs, leaving the correction aspects to the application may not be desirable. The proposed delayed capture methodology is extended to replace parity codes with codes having higher redundancy to enable correction. The hardware overhead due to the proposed methodology is analyzed and an area savings of about 15% is obtained when compared to an existing soft error mitigation methodology with equivalent coverage. The impact of different derating factors in determining the hardware overhead due to the soft error mitigation methodology is then analyzed. We have considered electrical derating and timing derating information for the evaluation purpose. The area overhead of the circuit with implementation of delayed capture methodology, considering different derating factors standalone and in combination is then analyzed. Results indicate that in different circuits, either a combination of these derating factors yield optimal results, or each of them considered standalone. This is due to the dependency of the solution on the heuristic nature of the algorithms used. About 23% area savings are obtained by employing these derating factors for a more optimal grouping of flip-flops. A new paradigm of hardware software co-design for reliability is finally proposed. This is based on application derating in which the application / firmware code is profiled to identify the critical components which must be guarded from soft errors. This identification is based on the ability of the application software to tolerate certain errors in hardware. An algorithm to identify critical components in the control logic based on fault injection is developed. Experimental results indicated that for a safety critical automotive application, only 12% of the sequential logic elements were found to be critical. This approach provides a framework for investigating how software methods can complement hardware methods, to provide a reduced hardware solution for soft error mitigation. Soft Error Mitigation Hardware Reliability Delayed Capture Methodology Fault Tolerant Computing Hardware Derating Transient Faults Soft Errors Soft Error Tolerant Designs Error Control Coding (ECC) Delayed Capture Fault Tolerance Computer Science
66	Comprehensive Backend Support for Local Memory Fault Tolerance Rink, Norman Alexander, Castrillon, Jeronimo 19 December 2016 (has links) (PDF) Technological advances drive hardware to ever smaller feature sizes, causing devices to become more vulnerable to transient faults. Applications can be protected against faults by adding error detection and recovery measures in software. This is popularly achieved by applying automatic program transformations. However, transformations applied to program representations at abstraction levels higher than machine instructions are fundamentally incapable of protecting against vulnerabilities that are introduced during compilation. In particular, a large proportion of a program’s memory accesses are introduced by the compiler backend. This report presents a backend that protects these accesses against faults in the memory system. It is demonstrated that the presented backend can detect all single bit flips in memory that would be missed by an error detection scheme that operates on the LLVM intermediate representation of programs. The presented compiler backend is obtained by modifying the LLVM backend for the x86 architecture. On a subset of SPEC CINT2006 the runtime overhead incurred by the backend modifications amounts to 1.50x for the 32-bit processor architecture i386, and 1.13x for the 64-bit architecture x86_64. To achieve comprehensive detection of memory faults, the modified backend implements an adjusted calling convention that leaves library function calls transparent and intact. vorübergehende Hardware-Fehler Speicherfehler Fehlererkennung Fehlertoleranz Compiler-Backend Code-Generierung Zwischendarstellung (IR) LLVM transient hardware faults soft errors memory faults error detection fault tolerance resilience compiler backend code generation intermediate representation (IR) LLVM ddc:004 rvk:SS 5514
67	Comprehensive Backend Support for Local Memory Fault Tolerance Rink, Norman Alexander, Castrillon, Jeronimo 19 December 2016 (has links) Technological advances drive hardware to ever smaller feature sizes, causing devices to become more vulnerable to transient faults. Applications can be protected against faults by adding error detection and recovery measures in software. This is popularly achieved by applying automatic program transformations. However, transformations applied to program representations at abstraction levels higher than machine instructions are fundamentally incapable of protecting against vulnerabilities that are introduced during compilation. In particular, a large proportion of a program’s memory accesses are introduced by the compiler backend. This report presents a backend that protects these accesses against faults in the memory system. It is demonstrated that the presented backend can detect all single bit flips in memory that would be missed by an error detection scheme that operates on the LLVM intermediate representation of programs. The presented compiler backend is obtained by modifying the LLVM backend for the x86 architecture. On a subset of SPEC CINT2006 the runtime overhead incurred by the backend modifications amounts to 1.50x for the 32-bit processor architecture i386, and 1.13x for the 64-bit architecture x86_64. To achieve comprehensive detection of memory faults, the modified backend implements an adjusted calling convention that leaves library function calls transparent and intact. info:eu-repo/classification/ddc/004 ddc:004
68	Fault-tolerant Programming Models and Computing Frameworks Kurt, Mehmet Can 14 October 2015 (has links) No description available. Computer Science
69	Performance optimization mechanisms for fault-resilient VLIW processors / Mécanismes d'optimisation des performances des processeurs VLIW à tolérance de fautes Psiakis, Rafail 21 December 2018 (has links) Les processeurs intégrés dans des domaines critiques exigent une combinaison de fiabilité, de performances et de faible consommation d'énergie. Very Large Instruction Word (VLIW) processeurs améliorent les performances grâce à l'exploitation ILP (Instruction Level Parallelism), tout en maintenant les coûts et la puissance à un niveau bas. L’ILP étant fortement dépendant de l'application, le processeur n'utilise pas toutes ses ressources en permanence et ces ressources peuvent donc être utilisées pour l'exécution d'instructions redondantes. Cette thèse présente une méthodologie d’injection fautes pour processeurs VLIW et trois mécanismes matériels pour traiter les pannes légères, permanentes et à long terme menant à trois contributions.La première contribution présente un schéma d’analyse du facteur de vulnérabilité architecturale et du facteur de vulnérabilité d’instruction pour les processeurs VLIW. Une méthodologie d’injection de fautes au niveau de différentes structures de mémoire est proposée pour extraire les capacités de masquage architecture / instruction du processeur. Un schéma de classification des défaillances de haut niveau est présenté pour catégoriser la sortie du processeur. La deuxième contribution explore les ressources inactives hétérogènes au moment de l'exécution, à l'intérieur et à travers des ensembles d'instructions consécutifs. Pour ce faire, une technique d’ordonnancement des instructions optimisée pour le matériel est appliquée en parallèle avec le pipeline afin de contrôler efficacement la réplication et l’ordonnancement des instructions. Suivant les tendances à la parallélisation croissante, une conception basée sur les clusters est également proposée pour résoudre les problèmes d’évolutivité, tout en maintenant une pénalité surface/énergie raisonnable. La technique proposée accélère la performance de 43,68% avec une surcoût en surface et en énergie de ~10% par rapport aux approches existantes. Les analyses AVF et IVF évaluent la vulnérabilité du processeur avec le mécanisme proposé.La troisième contribution traite des défauts persistants. Un mécanisme matériel est proposé, qui réplique au moment de l'exécution les instructions et les planifie aux emplacements inactifs en tenant compte des contraintes de ressources. Si une ressource devient défaillante, l'approche proposée permet de relier efficacement les instructions d'origine et les instructions répliquées pendant l'exécution. Les premiers résultats de performance d’évaluation montrent un gain de performance jusqu’à 49% sur les techniques existantes.Afin de réduire davantage le surcoût lié aux performances et de prendre en charge l’atténuation des erreurs uniques et multiples sur les transitoires de longue durée (LDT), une quatrième contribution est présentée. Nous proposons un mécanisme matériel qui détecte les défauts toujours actifs pendant l'exécution et réorganise les instructions pour utiliser non seulement les unités fonctionnelles saines, mais également les composants sans défaillance des unités fonctionnelles concernées. Lorsque le défaut disparaît, les composants de l'unité fonctionnelle concernés peuvent être réutilisés. La fenêtre de planification du mécanisme proposé comprend deux ensembles d'instructions pouvant explorer des solutions d'atténuation lors de l'exécution de l'instruction en cours et de l'instruction suivante. Les résultats obtenus sur l'injection de fautes montrent que l'approche proposée peut atténuer un grand nombre de fautes avec des performances, une surface et une surcharge de puissance faibles. / Embedded processors in critical domains require a combination of reliability, performance and low energy consumption. Very Long Instruction Word (VLIW) processors provide performance improvements through Instruction Level Parallelism (ILP) exploitation, while keeping cost and power in low levels. Since the ILP is highly application dependent, the processor does not use all its resources constantly and, thus, these resources can be utilized for redundant instruction execution. This thesis presents a fault injection methodology for VLIW processors and three hardware mechanisms to deal with soft, permanent and long-term faults leading to three contributions. The first contribution presents an Architectural Vulnerability Factor (AVF) and Instruction Vulnerability Factor (IVF) analysis schema for VLIW processors. A fault injection methodology at different memory structures is proposed to extract the architectural/instruction masking capabilities of the processor. A high-level failure classification schema is presented to categorize the output of the processor. The second contribution explores heterogeneous idle resources at run-time both inside and across consecutive instruction bundles. To achieve this, a hardware optimized instruction scheduling technique is applied in parallel with the pipeline to efficiently control the replication and the scheduling of the instructions. Following the trends of increasing parallelization, a cluster-based design is also proposed to tackle the issues of scalability, while maintaining a reasonable area/power overhead. The proposed technique achieves a speed-up of 43.68% in performance with a ~10% area and power overhead over existing approaches. AVF and IVF analysis evaluate the vulnerability of the processor with the proposed mechanism.The third contribution deals with persistent faults. A hardware mechanism is proposed which replicates at run-time the instructions and schedules them at the idle slots considering the resource constraints. If a resource becomes faulty, the proposed approach efficiently rebinds both the original and replicated instructions during execution. Early evaluation performance results show up to 49\% performance gain over existing techniques.In order to further decrease the performance overhead and to support single and multiple Long-Duration Transient (LDT) error mitigation a fourth contribution is presented. We propose a hardware mechanism, which detects the faults that are still active during execution and re-schedules the instructions to use not only the healthy function units, but also the fault-free components of the affected function units. When the fault faints, the affected function unit components can be reused. The scheduling window of the proposed mechanism is two instruction bundles being able to explore mitigation solutions in the current and the next instruction execution. The obtained fault injection results show that the proposed approach can mitigate a large number of faults with low performance, area, and power overhead. Very Long Instruction Word (VLIW) Tolérance aux fautes Exploitation des ressources inactives Erreurs légères Long Duration Transients (LDTs) Erreurs permanentes Planification des instructions en ligne Very Long Instruction Word (VLIW) Fault Tolerance Idle resource exploitation Soft errors Long Duration Transients(LDTs) Permanent Errors Hardware online re-Scheduling

Search results