31

Designing fault tolerant NoCs to improve reliability on SoCs / Projeto de NoCs tolerantes a falhas para o aumento da confiabilidade em SoCs

Frantz, Arthur Pereira January 2007 (has links)
As the technology scales down into the deep sub-micron domain, more IP cores are integrated on the same die, and new communication architectures are used to meet performance and power constraints. Networks-on-Chip have been proposed as an alternative communication platform capable of providing interconnection and communication among on-chip cores, handling performance, energy consumption, and reusability issues for large integrated systems.
However, the same advances into nanometric technologies have significantly reduced the reliability of mass-produced integrated circuits, increasing the sensitivity of devices and interconnects to new types of failures. Variations in the fabrication process, or the susceptibility of a design operating in a hostile environment, may generate errors. In NoC communication the two major sources of errors are crosstalk faults and soft errors. In the past, it was assumed that interconnections could not be affected by soft errors because no sequential circuits were involved. However, when NoCs are used, buffers and sequential circuits are present in the routers, and consequently soft errors can occur between the communication source and destination, provoking errors. Fault tolerance techniques that have been applied to integrated circuits in general can be used to protect routers against bit-flips. In this scenario, this work starts by evaluating the effects of soft errors and crosstalk faults in a NoC architecture through fault injection simulations, analyzing in detail the impact of such faults on the router. The results show that the effect of those faults on SoC communication can be disastrous, leading to loss of packets and system crash or unavailability. The work then proposes and evaluates a set of fault tolerance techniques applied to routers, able to mitigate soft errors and crosstalk faults at the hardware level. The proposed techniques are based on error correcting codes and hardware redundancy. Experimental results show that the proposed techniques can achieve zero errors with up to 50% savings in area overhead when compared to simple duplication. However, some of these techniques consume considerable power, since all of them rely on adding redundant hardware. Considering that software-based mitigation techniques also impose a considerable communication overhead due to retransmission, the work then proposes mixed hardware-software techniques that can provide a suitable level of protection, driven by the analysis of the environment in which the system will operate (soft error rate), design and fabrication factors (delay variations in interconnects, crosstalk-enabling points), the probability of a fault generating an error in the router, the communication load, and the allowed power or energy budget.
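The error-correcting-code side of such router protection can be illustrated with a minimal single-error-correcting Hamming(7,4) coder; this is a generic textbook sketch of ECC-protected flits, not the specific codes evaluated in the thesis.

```python
def hamming74_encode(d):
    """Encode 4 data bits (list of 0/1) into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]   # positions 1..7

def hamming74_decode(c):
    """Correct a single bit-flip and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3       # 0 = no error, else 1-based error position
    if syndrome:
        c[syndrome - 1] ^= 1              # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]

# A single upset on a flit nibble is corrected without retransmission.
flit = [1, 0, 1, 1]
cw = hamming74_encode(flit)
cw[4] ^= 1                                # inject a soft error
assert hamming74_decode(cw) == flit
```

The 3 parity bits per 4 data bits illustrate why ECC can beat simple duplication in area while still reaching zero residual errors for single upsets.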
32

Selective Software-Implemented Hardware Fault Tolerance Techniques to Detect Soft Errors in Processors with Reduced Overhead

Chielle, Eduardo 30 July 2016 (has links)
Software-based fault tolerance techniques are a low-cost way to protect processors against soft errors. However, they introduce significant overheads in execution time and code size, which consequently increase energy consumption. Systems operating under time or energy constraints may not be able to use these techniques. For this reason, this work proposes new software-based fault tolerance techniques with lower overheads and fault coverage similar to state-of-the-art software techniques, so that they can meet system constraints. In addition, the shorter execution time reduces the exposure time to radiation; consequently, reliability is higher for the same fault coverage. Such techniques can perform error correction or error detection. Since detection is less costly than correction, this work focuses on software-based detection techniques. First, a set of data-flow techniques called VAR is proposed. The techniques are based on general building rules that allow an exhaustive assessment, in terms of reliability and overheads, of different technique variations. The rules define how a technique duplicates the code and inserts checkers, and each technique uses a different set of rules. Then, a control-flow technique called SETA (Software-only Error-detection Technique using Assertions) is introduced. Compared with a state-of-the-art technique, SETA is 11.0% faster and occupies 10.3% fewer memory positions. The most promising data-flow techniques are combined with the control-flow technique in order to protect both the data flow and the control flow of the target application. To reduce the overheads even further, methods to selectively apply the proposed software techniques have been developed. For the data-flow techniques, instead of protecting all registers, only a set of selected registers is protected; the set is selected based on a metric that analyzes the code and ranks the registers by their criticality. For the control-flow technique, two approaches are taken: (1) removing checkers from basic blocks, where all basic blocks are protected by SETA but only selected basic blocks have checkers inserted, and (2) selectively protecting basic blocks, where only a set of basic blocks is protected. The techniques and their selective versions are evaluated in terms of execution time, code size, fault coverage, and Mean Work To Failure (MWTF), a metric that measures the trade-off between fault coverage and execution time. Results show that it was possible to reduce the overheads without affecting the fault coverage, and that for a small reduction in fault coverage it was possible to reduce the overheads significantly. Lastly, since evaluating all possible combinations for selective hardening of every application takes too much time, this work uses a method to extrapolate the results obtained by simulation in order to find the parameters for the selective combination of data-flow and control-flow techniques that are most likely the best candidates to improve the trade-off between reliability and overheads.
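The duplicate-and-check rule behind such data-flow techniques can be sketched schematically in Python; the real techniques operate on assembly instructions and registers, so the function names below (protected_store, saxpy_duplicated) are purely illustrative, and in a real processor the duplicated stream uses separate registers so that a bit-flip in one copy makes the checker fire.

```python
def protected_store(memory, addr, value, value_dup):
    """Checker rule: compare the primary and shadow copies before any store."""
    if value != value_dup:
        raise RuntimeError("soft error detected: register mismatch before store")
    memory[addr] = value

def saxpy_duplicated(a, x, y, memory):
    # Every computation is carried out twice in independent "registers";
    # only values that pass the checker are allowed to reach memory.
    for i in range(len(x)):
        r  = a * x[i] + y[i]      # primary stream
        r2 = a * x[i] + y[i]      # duplicated (shadow) stream
        protected_store(memory, i, r, r2)

mem = {}
saxpy_duplicated(2, [1, 2, 3], [10, 20, 30], mem)
print(mem)   # {0: 12, 1: 24, 2: 36}
```

Selective hardening, as described above, amounts to applying this duplication only to the registers the criticality metric ranks highest, trading a little coverage for much lower overhead.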
33

Self-Healing Cellular Automata to Correct Soft Errors in Defective Embedded Program Memories

Voddi, Varun 01 December 2009 (has links)
Static Random Access Memory (SRAM) cells in ultra-low power Integrated Circuits (ICs) based on nanoscale Complementary Metal Oxide Semiconductor (CMOS) devices are likely to be the most vulnerable to large-scale soft errors. Conventional error correction circuits may not be able to handle the distributed nature of such errors and are susceptible to soft errors themselves. In this thesis, a distributed error correction circuit called Self-Healing Cellular Automata (SHCA) that can repair itself is presented. A possible way to deploy a SHCA in a system of SRAM-based embedded program memories (ePM) for one type of chip multi-processors is also discussed. The SHCA is compared with conventional error correction approaches and its strengths and limitations are analyzed.
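The abstract does not spell out the SHCA repair rule, so the sketch below only illustrates, in a much simpler form, the idea of distributed in-place repair: each cell of the array independently majority-votes over redundant copies of its bit and scrubs disagreements locally, with no central correction circuit.

```python
def majority(a, b, c):
    return (a & b) | (a & c) | (b & c)

def repair_step(cells):
    """One distributed repair pass: every cell locally majority-votes its
    three copies and rewrites them, so isolated bit-flips are scrubbed
    in place rather than by a centralized ECC decoder."""
    for cell in cells:
        v = majority(*cell)
        cell[0] = cell[1] = cell[2] = v

# A memory word stored with per-cell triplicated bits.
word = [[b, b, b] for b in (1, 0, 1, 1, 0, 0, 1, 0)]
word[3][1] ^= 1                      # inject a soft error in one copy
repair_step(word)
assert [c[0] for c in word] == [1, 0, 1, 1, 0, 0, 1, 0]
```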
34

Using Duplication with Compare for On-line Error Detection in FPGA-based Designs

McMurtrey, Daniel L. 06 December 2006 (has links) (PDF)
Space-destined FPGA-based systems must employ redundancy techniques to account for the effects of upsets caused by radiation environments. Error detection techniques can be used to alert external systems to the presence of these upsets. Readback with compare is an error detection technique commonly employed in FPGA-based designs. This work introduces duplication with compare (DWC) as an automated on-line error detection technique that can be used as an alternative to readback with compare. This work also introduces a set of metrics used to quantify the effectiveness and coverage of this error detection technique. A tool is presented that automatically inserts duplication with compare into a user's design. Duplication with compare is shown to correctly detect over 99.9% of errors caused by configuration upsets at a hardware cost of approximately 2X. System designers can apply duplication with compare to their designs using this tool to increase the reliability and availability of their systems while minimizing resource usage and power.
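A minimal software model of the duplication-with-compare idea, assuming two independent replicas of the same design see identical inputs and any output mismatch raises an error flag for an external system; the actual tool works on FPGA netlists, and faulty_adder below is just a stand-in for a replica whose configuration has been upset.

```python
def adder(a, b):
    return a + b

def faulty_adder(a, b):
    # Models an upset in one replica's configuration: a corrupted result bit.
    return (a + b) ^ 0x4

def dwc(replica_a, replica_b, a, b):
    """Duplication with compare: run both replicas on the same inputs and
    flag any mismatch so an external system can react to the upset."""
    out_a, out_b = replica_a(a, b), replica_b(a, b)
    return out_a, out_a != out_b

print(dwc(adder, adder, 3, 4))          # (7, False) - no upset detected
print(dwc(adder, faulty_adder, 3, 4))   # (7, True)  - upset detected
```

The roughly 2X hardware cost quoted above follows directly from instantiating the design twice plus the comparators.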
35

Design and Analysis Methodologies to Reduce Soft Errors in nanometer VLSI Circuits

Gill, Balkaran S. January 2006 (has links)
No description available.
36

Early evaluation of multicore systems soft error reliability using virtual platforms / Avaliação de sistema de larga escala sob à influência de falhas temporárias durante a exploração de inicial projetos através do uso de plataformas virtuais

Rosa, Felipe Rocha da January 2018 (has links)
The increasing computing capacity of multicore components such as processors and graphics processing units (GPUs) offers new opportunities for the embedded and high-performance computing (HPC) domains. The progressively growing computing capacity of multicore-based systems makes it possible to perform complex application workloads efficiently at a lower power consumption than traditional single-core solutions. Such efficiency and the ever-increasing complexity of application workloads encourage industry to integrate more and more computing components into the same system. The number of computing components employed in large-scale HPC systems already exceeds a million cores, while 1000-core on-chip platforms are available in the embedded community.
Beyond the massive number of cores, the increasing computing capacity, as well as the number of internal memory cells (e.g., registers, internal memory) inherent to emerging processor architectures, is making large-scale systems more vulnerable to both hard and soft errors. Moreover, to meet emerging performance and power requirements, the underlying processors usually run at aggressive clock frequencies and in multiple voltage domains, increasing their susceptibility to soft errors, such as those caused by radiation effects. The occurrence of soft errors or Single Event Effects (SEEs) may cause critical failures in system behavior, which may lead to financial or human life losses. While a rate of 280 soft errors per day has been observed during the flight of a spacecraft, electronic computing systems working at ground level are expected to experience at least one soft error per day in the near future. The increased susceptibility of multicore systems to SEEs necessarily calls for novel cost-effective tools to assess the soft error resilience of the underlying multicore components, together with their complex software stacks (operating system, drivers), early in the design phase. The primary goal addressed by this Thesis is the proposal and development of a fault injection framework using state-of-the-art virtual platforms, a set of novel fault injection techniques to direct the fault campaigns according to the software stack characteristics, and an extensive framework validation comprising over a million simulation hours. The second goal of this Thesis is to set the foundations for a new discipline in soft error reliability management for emerging multi/manycore systems using machine learning techniques, identifying and proposing techniques that can provide different levels of reliability depending on the application workload and criticality.
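The kind of single run such a fault-injection framework automates can be sketched as follows; workload_step is only a stand-in for one cycle of the virtual platform executing its software stack, and the masked / silent-data-corruption classification is the usual outcome taxonomy, not necessarily the one used in the thesis.

```python
import random

def workload_step(regs, cycle):
    # Stand-in for one simulated cycle of the virtual platform; here it
    # just accumulates into two 32-bit architectural registers.
    regs["r1"] = (regs["r1"] + cycle) & 0xFFFFFFFF
    regs["r2"] = (regs["r2"] ^ regs["r1"]) & 0xFFFFFFFF

def fault_injection_run(n_cycles=1000, seed=None):
    """One run: flip a random bit of a random register at a random cycle,
    then compare the final state against a fault-free (golden) run."""
    rng = random.Random(seed)
    golden = {"r1": 0, "r2": 0}
    faulty = {"r1": 0, "r2": 0}
    cycle_hit = rng.randrange(n_cycles)
    reg_hit = rng.choice(list(faulty))
    bit_hit = rng.randrange(32)
    for cycle in range(n_cycles):
        workload_step(golden, cycle)
        workload_step(faulty, cycle)
        if cycle == cycle_hit:
            faulty[reg_hit] ^= 1 << bit_hit       # the injected soft error
    return "masked" if faulty == golden else "silent data corruption"

print(fault_injection_run(seed=42))
```

A campaign repeats such runs thousands of times over different injection points, which is why the million simulation hours mentioned above become necessary at system scale.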
37

Laser as a Tool to Study Radiation Effects in CMOS

Ajdari, Bahar 01 August 2017 (has links)
Energetic particles from cosmic-ray or terrestrial sources can strike sensitive areas of CMOS devices and cause soft errors. Understanding the effects of such interactions is crucial as device technology advances, and chip reliability has become more important than ever. Particle accelerator testing has been the standard method to characterize the sensitivity of chips to single event upsets (SEUs). However, because of its cost and availability limitations, other techniques have been explored. The pulsed laser has been a successful tool for characterizing SEU behavior, but to this day it has not been recognized as a method comparable to beam testing. In this thesis, I propose a methodology for correlating laser soft error rate (SER) with particle-beam data. Additionally, results are presented showing a temperature dependence of SER and the "neighbor effect" phenomenon, in which the close proximity of devices produces an observable "weakening effect" in the ON state.
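Beam-derived SER is usually reported through a per-bit upset cross-section; the sketch below shows that standard calculation, with an illustrative sea-level neutron flux of roughly 13 n/cm²/h. The thesis's contribution is correlating laser measurements to this kind of figure of merit, which is not reproduced here.

```python
def cross_section(n_errors, fluence_cm2, n_bits):
    """Per-bit SEU cross-section from an accelerated beam test (cm^2/bit)."""
    return n_errors / (fluence_cm2 * n_bits)

def ser_fit(sigma_cm2_per_bit, flux_cm2_per_hr, n_bits):
    """Soft error rate in FIT (failures per 1e9 device-hours)."""
    return sigma_cm2_per_bit * flux_cm2_per_hr * n_bits * 1e9

# Illustrative numbers: 120 upsets observed in a 1 Mbit array at 1e10 n/cm^2.
sigma = cross_section(n_errors=120, fluence_cm2=1e10, n_bits=1 << 20)
print(ser_fit(sigma, flux_cm2_per_hr=13.0, n_bits=1 << 20))   # on the order of 1e2 FIT
```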
38

Enhancing Value-Based Healthcare with Reconstructability Analysis: Predicting Risk for Hip and Knee Replacements

Froemke, Cecily Corrine 08 August 2017 (has links)
Legislative reforms aimed at slowing the growth of US healthcare costs are focused on achieving greater value, defined specifically as health outcomes achieved per dollar spent. To increase value while payments are diminishing and tied to individual outcomes, healthcare must improve at predicting risks and outcomes. One way to improve predictions is through better modeling methods. Current models are predominantly based on logistic regression (LR). This project applied Reconstructability Analysis (RA) to data on hip and knee replacement surgery and considered whether RA could create useful models of outcomes, and whether these models could produce predictions complementary to or even stronger than LR models. RA is a data mining method that searches for relations in data, especially non-linear and higher-ordinality relations, by decomposing the frequency distribution of the data into projections, several of which taken together define a model, which is then assessed for statistical significance. The predictive power of the model is expressed as the percent reduction of uncertainty (Shannon entropy) of the dependent variable (the DV) gained by knowing the values of the predictive independent variables (the IVs). Results showed that LR and RA gave the same results for equivalent models, and that exploratory RA provided better models than LR. Sixteen RA predictive models were then generated across the four DVs: complications, skilled nursing discharge, readmissions, and total cost. While the first three DVs are nominal, RA generated continuous predictions for cost by calculating expected values. Models included novel comorbidity variables and non-hypothesized interaction terms, and often resulted in substantial reductions in uncertainty. Predictive variables consisted of both delivery system variables and binary patient comorbidity variables. Complications were predicted by the total number of patient comorbidities. Skilled nursing discharges were predicted both by patient-related factors and by delivery system variables (location, surgeon volume), suggesting that practice patterns influence utilization of skilled nursing facilities. Readmissions were not well predicted, suggesting that the data used in this project lack the right variables or that readmissions are simply unpredictable. Delivery system variables (surgeon, location, and surgeon volume) were found to be the predominant predictors of total cost. Risk ratios were generated as an additional measure of effect size. These risk ratios were used to classify the IV states of the models as indicating higher or lower risk of adverse outcomes. Some IV states showed nearly 25% of patients at increased risk, while other IV states showed over 75% of patients at decreased risk. In real time, such risk predictions could support clinical decision making and custom-tailored utilization of services. Future research might address the limitations of this project's data and employ additional RA techniques and training-test splits. Implementation of predictive models is also discussed, with considerations for data supply lines, maintenance of models, organizational buy-in, and the acceptance of model output by clinical teams for use in real-time clinical practice. If outcomes and risk are adequately predicted, areas for potential improvement become clearer, and focused changes can be made to drive improvements in patient care.
Better predictions, such as those resulting from the RA methodology, can thus support improvement in value--better outcomes at a lower cost. As reimbursement increasingly evolves into value-based programs, understanding the outcomes achieved, and customizing patient care to reduce unnecessary costs while improving outcomes, will be an active area for clinicians, healthcare administrators, researchers, and data scientists for many years to come.
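The RA figure of merit described above, the percent reduction of the Shannon entropy of the DV given the IV states, can be computed directly from a frequency table. The sketch below uses made-up records purely for illustration and is not tied to the project's actual variables.

```python
from collections import Counter
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def uncertainty_reduction(records):
    """Percent reduction in Shannon entropy of the DV gained by knowing the IV
    state. records: list of (iv_state, dv_value) pairs."""
    h_dv = entropy(list(Counter(dv for _, dv in records).values()))
    n = len(records)
    h_dv_given_iv = 0.0
    for iv in set(iv for iv, _ in records):
        group = [dv for i, dv in records if i == iv]
        h_dv_given_iv += len(group) / n * entropy(list(Counter(group).values()))
    return 100 * (h_dv - h_dv_given_iv) / h_dv

data = [(("obese", "diabetic"), "complication")] * 3 + \
       [(("obese", "diabetic"), "no_complication")] * 1 + \
       [(("fit", "non_diabetic"), "no_complication")] * 4
print(round(uncertainty_reduction(data), 1))   # ~57.5 (% uncertainty reduced)
```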
39

Scheduling and Optimization of Fault-Tolerant Embedded Systems

Izosimov, Viacheslav January 2006 (has links)
Safety-critical applications have to function correctly even in the presence of faults. This thesis deals with techniques for tolerating the effects of transient and intermittent faults. Re-execution, software replication, and rollback recovery with checkpointing are used to provide the required level of fault tolerance. These techniques are considered in the context of distributed real-time systems with non-preemptive static cyclic scheduling. Safety-critical applications have strict time and cost constraints, which means that not only must faults be tolerated, but the constraints must also be satisfied. Hence, efficient system design approaches that take fault tolerance into consideration are required. The thesis proposes several design optimization strategies and scheduling techniques that take fault tolerance into account. The design optimization tasks addressed include, among others, process mapping, fault tolerance policy assignment, and checkpoint distribution. Dedicated scheduling techniques and mapping optimization strategies are also proposed to handle customized transparency requirements associated with processes and messages. By providing fault containment, transparency can potentially improve the testability and debuggability of fault-tolerant applications. The efficiency of the proposed scheduling techniques and design optimization strategies is evaluated with extensive experiments conducted on a number of synthetic applications and a real-life example. The experimental results show that considering fault tolerance during system-level design optimization is essential when designing cost-effective fault-tolerant embedded systems.
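The checkpoint-distribution trade-off mentioned above can be seen in a generic rollback-recovery cost model, not the exact formulation used in the thesis: more checkpoints add overhead on the fault-free path but shorten each re-execution after a transient fault.

```python
def worst_case_length(C, n, k, chi, mu):
    """Worst-case execution time of a process with n equidistant checkpoints
    when up to k transient faults must be tolerated by rollback recovery.
    C: fault-free execution time, chi: per-checkpoint overhead,
    mu: recovery overhead per fault. A generic model with illustrative terms."""
    return C + n * chi + k * (C / n + mu)

def best_checkpoint_count(C, k, chi, mu, n_max=50):
    """Pick the checkpoint count that minimizes the worst-case length."""
    return min(range(1, n_max + 1),
               key=lambda n: worst_case_length(C, n, k, chi, mu))

print(best_checkpoint_count(C=100, k=2, chi=5, mu=3))   # 6, close to sqrt(k*C/chi)
```

Design optimization then has to balance this per-process choice against mapping and policy assignment across the whole distributed schedule, which is what the proposed strategies address.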
40

Analysis and Design of Resilient VLSI Circuits

Garg, Rajesh May 2009 (has links)
The reliable operation of Integrated Circuits (ICs) has become increasingly difficult to achieve in the deep sub-micron (DSM) era. With continuously decreasing device feature sizes, combined with lower supply voltages and higher operating frequencies, the noise immunity of VLSI circuits is decreasing alarmingly. Thus, VLSI circuits are becoming more vulnerable to noise effects such as crosstalk, power supply variations, and radiation-induced soft errors. Among these noise sources, soft errors (errors caused by radiation particle strikes) have become an increasingly troublesome issue for memory arrays as well as combinational logic circuits. Also, in the DSM era, process variations are increasing at an alarming rate, making it more difficult to design reliable VLSI circuits. Hence, it is important to efficiently design robust VLSI circuits that are resilient to radiation particle strikes and process variations. This dissertation presents several analysis and design techniques with the goal of realizing VLSI circuits that are tolerant to radiation particle strikes and process variations. The dissertation consists of two parts. The first part proposes four analysis and two design approaches to address radiation particle strikes. The analysis techniques for radiation particle strikes include: an approach to analytically determine the pulse width and the pulse shape of a radiation-induced voltage glitch in combinational circuits, a technique to model the dynamic stability of SRAMs, and a 3D device-level analysis of the radiation tolerance of voltage-scaled circuits. Experimental results demonstrate that the proposed techniques for analyzing radiation particle strikes in combinational circuits and SRAMs are fast and accurate compared to SPICE. Therefore, these analysis approaches can easily be integrated into a VLSI design flow to analyze the radiation tolerance of such circuits and harden them early in the design flow. From the 3D device-level analysis of the radiation tolerance of voltage-scaled circuits, several non-intuitive observations are made and, correspondingly, a set of guidelines is proposed that is important to consider when realizing radiation-hardened circuits. Two circuit-level hardening approaches are also presented to harden combinational circuits against radiation particle strikes. These hardening approaches significantly improve the tolerance of combinational circuits against low- and very high-energy radiation particle strikes, respectively, with modest area and delay overheads. The second part of this dissertation addresses process variations. A technique is developed to perform sensitizable statistical timing analysis of a circuit, and thereby improve the accuracy of timing analysis under process variations. Experimental results demonstrate that this technique is able to significantly reduce the pessimism due to two sources of inaccuracy that plague current statistical static timing analysis (SSTA) tools. Two design approaches are also proposed to improve the process variation tolerance of combinational circuits and voltage level shifters (which are used in circuits with multiple interacting power supply domains), respectively. The variation-tolerant design approach for combinational circuits significantly improves the resilience of these circuits to random process variations, with a reduction in the worst-case delay and a low area penalty.
The proposed voltage level shifter is faster, requires lower dynamic power and area, has lower leakage currents, and is more tolerant to process variations, compared to the best known previous approach. In summary, this dissertation presents several analysis and design techniques which significantly augment the existing work in the area of resilient VLSI circuit design.
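Radiation particle strikes on a circuit node are commonly modeled with a double-exponential injected-current pulse; the sketch below uses that textbook model with illustrative time constants and collected charge, not the analytical glitch model developed in the dissertation.

```python
import math

def strike_current(t, q_coll, tau_fall=200e-12, tau_rise=50e-12):
    """Double-exponential current pulse commonly used to model a particle
    strike on a circuit node. q_coll: collected charge (C); the time
    constants here are illustrative, not fitted to any specific process."""
    return (q_coll / (tau_fall - tau_rise)) * (
        math.exp(-t / tau_fall) - math.exp(-t / tau_rise))

# Peak injected current for 100 fC of collected charge, sampled over 1 ns.
times = [i * 10e-12 for i in range(100)]
peak = max(strike_current(t, 100e-15) for t in times)
print(f"peak strike current ~ {peak * 1e3:.2f} mA")
```

Injecting such a pulse at a node in SPICE, or in a faster analytical model like the one proposed here, shows whether the resulting voltage glitch is wide enough to be latched as a soft error.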
