Global ETD Search

141	Investigating techniques to reduce soft error rate under single-event-induced charge sharing / Investigando técnicas para reduzir a taxa de erro de soft sob evento único induzido de carga compartilhada Almeida, Antonio Felipe Costa de January 2014 (has links) The interaction of radiation with integrated circuits can provoke transient faults due to the deposit of charge in sensitive nodes of transistors. Because of the decrease the size in the process technology, charge sharing between transistors placed close to each other has been more and more observed. This phenomenon can lead to multiple transient faults. Therefore, it is important to analyze the effect of multiple transient faults in integrated circuits and investigate mitigation techniques able to cope with multiple faults. This work investigates the effect known as single-event-induced charge sharing in integrated circuits. Two main techniques are analyzed to cope with this effect. First, a placement constraint methodology is proposed. This technique uses placement constraints in standard cell based circuits. The objective is to achieve a layout for which the Soft-Error Rate (SER) due charge shared at adjacent cell is reduced. A set of fault injection was performed and the results show that the SER can be minimized due to single-event-induced charge sharing in according to the layout structure. Results show that by using placement constraint, it is possible to reduce the error rate from 12.85% to 10.63% due double faults. Second, Triple Modular Redundancy (TMR) schemes with different levels of granularities limited by majority voters are analyzed under multiple faults. The TMR versions are implemented using a standard design flow based on a traditional commercial standard cell library. An extensive fault injection campaign is then performed in order to verify the softerror rate due to single-event-induced charge sharing in multiple nodes. Results show that the proposed methodology becomes crucial to find the best trade-off in area, performance and soft-error rate when TMR designs are considered under multiple upsets. Results have been evaluated in a case-study circuit Advanced Encryption Standard (AES), synthesized to 90nm Application Specific Integrated Circuit (ASIC) library, and they show that combining the two techniques, the error rate resulted from multiple faults can be minimized or masked. By using TMR with different granularities and placement constraint methodology, it is possible to reduce the error rate from 11.06% to 0.00% for double faults. A detailed study of triple, four and five multiple faults combining both techniques are also described. We also tested the TMR with different granularities in SRAM-based FPGA platform. Results show that the versions with a fine grain scheme (FGTMR) were more effectiveness in masking multiple faults, similarly to results observed in the ASICs. In summary, the main contribution of this master thesis is the investigation of charge sharing effects in ASICs and the use of a combination of techniques based on TMR redundancy and placement to improve the tolerance under multiple faults. Microeletrônica Tolerancia : Falhas Fault tolerance Placement Constraining Single-Event-Induced Charge Sharing Triple Modular Redundancy
142	The effects of the compiler optimizations in embedded processors reliability Lins, Filipe Maciel January 2017 (has links) O recente avanço tecnológico dos processadores embarcados aumentou a complexidade dos compiladores e o uso de recursos heterogêneos, como Arranjo de Portas Programáveis em Campo (Field Programmable Gate Array - FPGA) e Unidade de Processamento Gráfico (Graphics Processing Unit - GPU), integrado aos processadores. Além disso, aumentou-se o uso de componentes de prateleira (Commercial off-the-shelf - COTS) em aplicações críticas, ao invés de chips tolerantes a radiação, pois os COTS podem ser mais baratos, flexíveis, terem uma rápida colocação no mercado e um menor consumo de energia. No entanto, mesmo com essas vantagens, os COTS são suscetíveis a falha sendo necessário garantir uma alta confiabilidade nos sistemas utilizados. Assim como, no caso de aplicações em tempo real, também se precisa respeitar os requisitos determinísticos. Como caso de estudo, este trabalho utiliza a Zynq que é um dispositivo COTS do tipo Sistema em Chip Totalmente Programável (All Programmable System on Chip - APSoC) no qual possui um processador ARM Cortex-A9 embarcado. Nesta pesquisa, investigou-se o impacto das falhas que afetam o arquivo de registradores na confiabilidade dos processadores embarcados. Para tanto, experimentos de injeção de falhas e de radiação de íons pesados foram realizados. Além do mais, avaliou-se como os diferentes níveis de otimização do compilador modificam o uso e a probabilidade de falha do arquivo de registradores do processador. Selecionou-se seis benchmarks representativos, cada um compilado com três níveis diferentes de otimização. Realizamos campanhas exaustivas de injeção de falhas para medir o Fator de Vulnerabilidade Arquitetural (Architectural Vulnerability Factor - AVF) de cada código e configuração, identificando os registradores que são mais propensos a gerar uma corrupção de dados silenciosos (Silent Data Corruption - SDC) ou uma interrupção funcional de evento único (Single Event Functional Interruption - SEFI). Também foram correlacionadas as variações de confiabilidade observadas com a utilização do arquivo de registradores. Finalmente, irradiamos com íons pesados dois dos benchmarks selecionados compilados com dois níveis de otimização. Os resultados mostram que mesmo com o melhor desempenho, o menor uso do arquivo de registradores ou o menor AVF não é garantido que as aplicações irão alcançar a maior Carga de Trabalho Média Entre Falhas (Mean Workload Between Failure - MWBF). Por exemplo, os resultados mostram que o melhor desempenho da aplicação Multiplicação de Matrizes (Matrix Multiplication - MxM) é alcançado no nível de otimização mais alta. No entanto, nos resultados dos experimentos de injeção de falhas, a maior confiabilidade é alcançada no menor nível de otimização que possuem os menores AVFs e o menor uso do arquivo de registradores. Os resultados também mostram que o impacto das otimizações está fortemente relacionado com o algoritmo executado e como o compilador faz esta otimização. / The recent advances in the embedded processors increase the compilers complexity, and the usage of heterogeneous resources such as Field Programmable Gate Array (FPGA) and Graphics Processing Unit (GPU) integrated with the processors. Additionally, the increase in the usage of Commercial off-the-shelf (COTS) instead of radiation hardened chips in safety critical applications occurs because the COTS can be more flexible, inexpensive, have a fast time-to market and a lower power consumption. However, even with these advantages, it is still necessary to guarantee a high reliability in a system that uses a COTS for safety critical applications because they are susceptible to failures. Additionally, in the case of real time applications, the time requirements also need to be respected. As a case of study, this work uses the Zynq which is a COTS device classified as an All Programmable System-on-Chip (APSOC) and has an ARM Cortex-A9 as the embedded processor. In this research, the impact of faults that affect the register file in the embedded processors reliability was investigated. For that, fault-injection and heavy-ion radiation experiments were performed. Moreover, an evaluation of how the different levels of compiler optimization modify the usage and the failure probability of a processor register file. A set of six representative benchmarks, each one compiled with three different levels of compiler optimization. Exhaustive fault injection campaigns were performed to measure the registers Architectural Vulnerability Factor (AVF) of each code and configuration, identifying the registers that are more likely to generate Silent Data Corruption (SDC) or Single Event Functional Interruption (SEFI). Moreover, the observed reliability variations with register file utilization were correlated. Finally, two of the selected benchmarks, each one compiled with two different levels of optimization were irradiated in the heavy ions experiments. The results show that the best performance, the minor register file usage, or the lowest AVF does not always bring the highest Mean Workload Between Failures (MWBF). As an example, in the Matrix Multiplication (MxM) application, the best performance is achieved in the highest compiler optimization. However, in the fault injection, the higher reliability is obtained in the lower compiler optimization which has, the lower AVFs and the lower register file usage. Results also show that the impact of optimizations is strongly related to the executed algorithm and how the compiler optimizes them. Microeletrônica Processadores Tolerancia : Falhas Fault Injector AVF MWBF APSoC COTS Soft Errors Reliability Processor Optimization
143	A platform to evaluate the fault sensitivity of superscalar processors Tonetto, Rafael Billig January 2017 (has links) A diminuição agressiva dos transistores, a qual levou a reduções na tensão de operação, vem proporcionando enormes benefícios em termos de poder computacional, mantendo o consumo de energia em um nível aceitável. No entanto, à medida que o tamanho dos recursos e a tensão diminuem, a susceptibilidade a falhas tende a aumentar e a importância das avaliações com falhas cresce. Os processadores superescalares, que hoje dominam o mercado, são um exemplo significativo de sistemas que se beneficiam destas melhorias tecnológicas e são mais suscetíveis a erros. Juntamente com isso, existem vários métodos para injeção de falhas, que é um meio eficiente para avaliar a resiliência desses processadores. No entanto, os métodos tradicionais de injeção de falhas, como a técnica baseada em hardware, impõem que o processador seja implementado fisicamente antes que os testes possam ser conduzidos, sem fornecer níveis razoáveis de controlabilidade. Por outro lado, as técnicas baseadas em simuladores implementados em software oferecem altos níveis de controlabilidade. No entanto, enquanto os simuladores em SW de alto nível (que são rápidos) podem levar a uma avaliação incompleta, ou mesmo equivocada, da resiliência do sistema, uma vez que não modelam os componentes internos do hardware (como os registradores do pipeline), simuladores em SW de baixo nível são extremamente lentos e dificilmente estão disponíveis em RTL (Register-Transfer Level). Considerando este cenário, propomos uma plataforma que preenche a lacuna entre as abordagens em HW e SW para avaliar falhas em processadores superescalares: é rápida, tem alta controlabilidade, disponível em software, flexível e, o mais importante, modela o processador em RTL. A ferramenta foi implementada sobre a plataforma usada para gerar o processador superescalar The Berkeley Out-of-Order Machine (BOOM), que é um processador altamente escalável e parametrizável. Esta propriedade nos permitiu experimentar três arquiteturas diferentes do processador: single-, dual- e quad-issue, e, ao analisar como a resiliência a falhas é influenciada pela complexidade de diferentes processadores, usamos os processadores para validar nossa ferramenta. Tolerancia : Falhas Processamento paralelo Fault injection Register-transfer level Superscalar processor
144	Automated design flow for applying triple modular redundancy in complex semi-custom digital integrated circuits / Fluxo de projeto automatizado para aplicar redundância modular tripla em circuitos semicustomizados complexos Benites, Luis Alberto Contreras January 2018 (has links) Os efeitos de radiação têm sido um dos problemas mais sérios em aplicações militares e espaciais. Mas eles também são uma preocupação crescente em tecnologias modernas, mesmo para aplicações comerciais no nível do solo. A proteção dos circuitos integrados contra os efeitos da radiação podem ser obtidos através do uso de processos de fabricação aprimorados e de estratégias em diferentes estágios do projeto do circuito. A técnica de TMR é bem conhecida e amplamente empregada para mascarar falhas únicas sem detectálas. No entanto, o projeto de circuitos TMR não é automatizado por ferramentas EDA comerciais e até mesmo eles podem remover parcial ou totalmente a lógica redundante. Por outro lado, existem várias ferramentas que podem ser usadas para implementar a técnica de TMR em circuitos integrados, embora a maioria delas sejam ferramentas comerciais licenciadas, convenientes apenas para dispositivos específicos, ou com uso restrito por causa do regime ITAR. O presente trabalho pretende superar esses incovenientes, para isso uma metodologia é proposta para automatizar o projeto de circuitos TMR utilizando um fluxo de projeto comercial. A abordagem proposta utiliza um netlist estruturado para implementar automaticamente os circuitos TMR em diferentes níveis de granularidade de redundância para projetos baseados em células e FPGA. A otimização do circuito TMR resultante também é aplicada com base na abordagem do dimensionamento de portas lógicas. Além disso, a verificação do circuito TMR implementado é baseada na verificação de equivalência e garante sua funcionalidade correta e sua capacidade de tolerancia a falhas simples. Experimentos com um circuito derivado de HLS e uma descrição ofuscada do soft-core ARM Cortex-M0 foram realizados para mostrar o uso e as vantagens do fluxo de projeto proposto. Diversas questões relacionadas à remoção da lógica redundante implementada foram encontradas, bem como o impacto no incremento de área causado pelos votadores de maioria. Além disso, a confiabilidade de diferentes implementações de TMR do soft core ARM sintetizado em FPGA foi avaliada usando campanhas de injeção de falhas emuladas. Como resultado, foi reforçado o nível de alta confiabilidade da implemntação com mais fina granularidade, mesmo na presença de até 10 falhas acumuladas, e a menor capacidade de mitigação correspondente à replicação de flip-flops apenas. / Radiation effects have been one of the most serious issues in military and space applications. But they are also an increasing concern in modern technologies, even for commercial applications at the ground level. Protection or hardening of integrated circuits against radiation effects can be obtained through the use of enhanced fabrication processes and strategies at different stages of the circuit design. The triple modular redundancy (TMR) technique is a widely and well-known technique employed to mask single faults without detecting them. However, the design of TMR circuits is not automated by commercial electronic design automation (EDA) tools and even they can remove partially or totally the redundant logic. On the other hand, there are several tools that can be used to implement the TMR technique in integrated circuits, although most of them are licensed commercial tools, convenient only for specific devices, or with restricted use because of the International Traffic in Arms Regulations (ITAR) regimen. The present work intends to overcome these issues so a methodology is proposed to automate the design of TMR circuits using a commercial design flow. The proposed approach uses a structured netlist to implement automatically TMR circuits at different granularity levels of redundancy for cell-based and field-programmable gate array (FPGA) designs. Optimization of the resulting TMR circuit is also applied based on the gate sizing approach. Moreover, verification of the implemented TMR circuit is based on equivalence checking, and guarantee its correct functionality and its fault-tolerant capability against soft errors. Experiments with an high-level synthesis (HLS)-derived circuit and an obfuscated description of the ARM Cortex-M0 soft-core are performed to show the use and the advantages of the proposed design flow. Several issues related to the removal of the implemented redundant logic were found as well as the impact in the increment of area caused by the majority voters. Furthermore, the reliability of different TMR implementations of the ARM soft-core synthesized in FPGA was evaluated using emulated-simulation fault injection campaigns. As a result, it was reinforced the high-reliability level of the finest granularity implementation even in the presence of up to 10 accumulated faults and the poorest mitigation capacity corresponding to the replication of flip-flops solely. Microeletrônica Tolerancia : Falhas Fault tolerance FPGA ASIC Semicustom design flow Equivalence checking TMR Radiation effects
145	Designing and evaluating hybrid techniques to detect transient faults in processors embedded in FPGAs / Desenvolvendo e Avaliando técnicas híbridas para detectar falhas transientes em processadores embarcados em FPGAs / Entwurf und auswertung von hybrid-techniken zur erkennung von transienten fehlern in FPGA eingebetteten prozessoren Azambuja, José Rodrigo Furlanetto de January 2013 (has links) Der aktuelle Stand der Technologie bringt schnellere und kleinere Bausteine für die Herstellung von integrierten Schaltungen mit sich, die während sie effizienter sind auch anfälliger für Strahlung werden. Kleinere Abmessungen der Transistoren, höhere Integrationsdichte, geringere Versorgungsspannungen und höhere Betriebsfrequenzen sind einige der Charakteristika, die energiegeladene Partikel zu einer Herausforderung machen, wenn man integrierte Schaltungen in rauen Umgebungen einsetzt. Diese Art der Partikel hat einen sehr großen Einfluss auf Prozessoren, die in einer solchen Umgebung eingesetzt werden. Sowohl die Ausführung des Programms, welche durch fehlerhafte Sprünge in der Programmsequenz beeinflusst wird, als auch Daten, die in speichernden Elementen wie Programmspeicher, Datenspeicher oder in Registern abgelegt sind, werden verfälscht. Um solche Prozessorsysteme abzusichern, wird in der Literatur Fehlertoleranz empfohlen, welche die Systemperformanz verringert, einen größeren Flächenverbrauch mit sich bringt und das System dennoch nicht komplett schützen kann. Diese Fehlertoleranz kann sowohl durch software- als auch durch hardwarebasierte Ansätze umgesetzt werden. In diesem Zusammenhang schlagen wir eine Kombination aus Hardware- und Software- Lösung vor, welche die Systemperformanz nur sehr wenig beeinflusst und den zusätzlichen Speicheraufwand minimiert. Diese Hybrid-Technologie zielt darauf ab, alle Fehler in einem System zu finden. Fünf solcher Techniken werden beschrieben und erklärt, zwei der vorgestellten Techniken sind bekannte Software-Lösungen, die anderen drei sind neue Hybrid-Lösungen, um alle transienten Effekte von Strahlung in Prozessoren erkennen zu können. Diese unterschiedlichen Ansätze werden anhand ihrer Ausführungszeit, Programm-, Datenspeicher, Flächenvergrößerung und Taktfrequenz analysiert und ausgewertet. Um die Effizienz und die Machbarkeit des vorgeschlagenen Ansatzes verifizieren zu können, werden Fehlerinjektionstests sowohl durch Simulation als auch durch Bestrahlungsexperimente in unterschiedlichen Positionen mit einer Cobalt-60 Quelle durchgeführt. Die Ergebnisse des vorgeschlagenen Ansatzes verbessern den Stand der Technik durch die Bereitstellung einer höheren Fehlererkennungsrate bei sehr geringer negativer Beeinflussung der Performanz und des Speicherverbrauchs. / Os recentes avanços tecnológicos proporcionaram dispositivos menores e mais rápidos para a fabricação de circuitos que, apesar de mais eficientes, se tornaram mais sensíveis aos efeitos de radiação. Menores dimensões de transistores, mais densidade de integração, tensões de alimentação mais baixas e frequências de operação mais altas são algumas das características que tornaram partículas energizadas um problema, quando lidando com sistemas integrados em ambientes severos. Estes tipos de partículas tem uma grande influencia em processadores funcionando em tais ambientes, afetando tanto o fluxo de execução do programa ao causar desvios incorretos, bem como os dados armazenados em elementos de memória, como memórias de dados e programas e registradores. A fim de proteger sistemas processados, técnicas de tolerância a falhas foram propostas na literatura usando propostas baseadas em hardware, software, que diminuem o desempenho do sistema, aumentam a sua área e não são capazes de proteger totalmente o sistema destes efeitos. Neste contexto, propomos a combinação de técnicas baseadas em hardware e software para criar técnicas híbridas orientadas a detectar todas as falhas que afetam o sistema, com baixa degradação de desempenho e aumento de memória. Cinco técnicas são apresentadas e descritas em detalhes, das quais duas são conhecidas técnicas baseadas puramente em software e três são técnicas híbridas novas, para detectar todos os tipos de efeitos transientes causados pela radiação em processadores. As técnicas são avaliadas de acordo com o aumento no tempo de execução, no uso das memórias de dados e programa e de área, e degradação da frequência de operação. Para verificar a eficiência e aplicabilidade das técnicas propostas, campanhas de injeção de falhas são realizadas ao se simular a injeção de falhas e realizar experimentos de irradiação em diferentes localidades com nêutron e fontes de Cobalto-60. Os resultados mostraram que as técnicas propostas aprimoraram o estado da arte ao fornecer altas taxas de detecção de falhas com baixas penalidades em degradação de desempenho e aumento de memória. / Recent technology advances have provided faster and smaller devices for manufacturing circuits that while more efficient have become more sensitive to the effects of radiation. Smaller transistor dimensions, higher density integration, lower voltage supplies and higher operating frequencies are some of the characteristics that make energized particles an issue when dealing with integrated circuits in harsh environments. These types of particles have a major influence in processors working in such environments, affecting both the program’s execution flow by causing incorrect jumps in the program, and the data stored in memory elements, such as data and program memories, and registers. In order to protect processor systems, fault tolerance techniques have been proposed in literature using hardware-based and software-based approaches, which decrease the system’s performance, increase its area, and are not able to fully protect the system against such effects. In this context, we proposed a combination of hardware- and software-based techniques to create hybrid techniques aimed at detecting all the faults affecting the system, at low performance degradation and memory overhead. Five techniques are presented and described in detail, from which two are known software-based only techniques and three are new hybrid techniques, to detect all kinds of transient effects caused by radiation in processors. The techniques are evaluated according to execution time, program and data memories, and area overhead and operating frequency degradation. To verify the effectiveness and the feasibility of the proposed techniques, fault injection campaigns are performed by injecting faults by simulation and performing irradiation experiments in different locations with neutrons and a Cobalt-60 sources. Results have shown that the proposed techniques improve the state-of-the-art by providing high fault detection rates at low penalties on performance degradation and memory overhead. Fehlertoleranz Strahlungseffekte Prozessoren Microeletrônica Tolerancia : Falhas Fpga Fault tolerance Radiation effects Processors
146	Timing vulnerability factor analysis in master-slave D flip-flops / Análise do fator de vulnerabilidade temporal em flip-flops mestre-escravo do tipo D Zimpeck, Alexandra Lackmann January 2016 (has links) O dimensionamento da tecnologia trouxe consequências indesejáveis para manter a taxa de crescimento exponencial e levanta questões importantes relacionadas com a confiabilidade e robustez dos sistemas eletrônicos. Atualmente, microprocessadores modernos de superpipeline normalmente contêm milhões de dispositivos com cargas nos nós cada vez menores. Esse fator faz com que os circuitos sejam mais sensíveis a variabilidade ambiental e aumenta a probabilidade de um erro transiente acontecer. Erros transientes em circuitos sequenciais ocorrem quando uma única partícula energizada deposita carga suficiente perto de uma região sensível. Flip-Flops mestreescravo são os circuitos sequencias mais utilizados em projeto VLSI para armazenamento de dados. Se um bit-flip ocorrer dentro deles, eles perdem a informação prévia armazenada e podem causar um funcionamento incorreto do sistema. A fim de proporcionar sistemas mais confiáveis que possam lidar com os efeitos da radiação, este trabalho analisa o Fator de Vulnerabilidade Temporal (Timing Vulnerability Factor - TVF) em algumas topologias de flip-flops mestre-escravo em estágios de pipeline sob diferentes condições de operação. A janela de tempo efetivo que o bit-flip ainda pode ser capturado pelo próximo estágio é definido com janela de vulnerabilidade (WOV). O TVF corresponde ao tempo que o flip-flop é vulnerável a erros transientes induzidos pela radiação de acordo com a WOV e a frequência de operação. A primeira etapa deste trabalho determina a dependência entre o TVF com a propagação de falhas até o próximo estágio através de uma lógica combinacional com diferentes atrasos de propagação e com diferentes modelos de tecnologia, incluindo também as versões de alto desempenho e baixo consumo. Todas as simulações foram feitas sob as condições normais pré-definidas nos arquivos de tecnologia. Como a variabilidade se manifesta com o aumento ou diminuição das especificações iniciais, onde o principal problema é a incerteza sobre o valor armazenado em circuitos sequenciais, a segunda etapa deste trabalho consiste em avaliar o impacto que os efeitos da variabilidade ambiental causam no TVF. Algumas simulações foram refeitas considerando variações na tensão de alimentação e na temperatura em diferentes topologias e configurações de flip-flops mestre-escravo. Para encontrar os melhores resultados, é necessário tentar diminuir os valores de TVF, pois isso significa que eles serão menos vulneráveis a bit-flips. Atrasos de propagação entre dois circuitos sequenciais e frequências de operação mais altas ajudam a reduzir o TVF. Além disso, estas informações podem ser facilmente integradas em ferramentas de EDA para ajudar a identificar os flip-flops mestre-escravo mais vulneráveis antes de mitigar ou substituí-los por aqueles tolerantes a radiação. / Technology scaling has brought undesirable issues to maintain the exponential growth rate and it raises important topics related to reliability and robustness of electronic systems. Currently, modern super pipelined microprocessors typically contain many millions of devices with ever decreasing load capacitances. This factor makes circuits more sensitive to environmental variations and it is increased the probability to induce a soft error. Soft errors in sequential circuits occur when a single energetic particle deposits enough charge near a sensitive node. Master-slave flip-flops are the most adopted sequential elements to work as registers in pipeline and finite state machines. If a bit-flip happens inside them, they lose the previous stored information and may cause an incorrect system operation. To provide reliable systems that can cope with radiation effects, this work analysis the Timing Vulnerability Factor (TVF) of some master-slave D flip-flops topologies in pipeline stages under different operating conditions. The effective time window, which the bit-flip can still be captured by the next stage, is defined as Window of Vulnerability (WOV). TVF corresponds to the time that a flip-flop is vulnerable to radiation-induced soft errors according to WOV and clock frequency. In the first step of this work, it is determined the dependence between the TVF with the fault propagation to the next stage through a combinational logic with different propagation delays and with different nanometer technological models, including also high performance and low power versions. All these simulations were made under the pre-defined nominal conditions in technology files. The variability manifests with an increase or decreases to initial specification, where the main problem is the uncertainty about the value stored in sequential. In this way, the second step of this work evaluates the impact that environmental variability effect causes in TVF. Some simulations were redone considering supply voltage and temperature variations in different master-slave D flip-flop topologies configurations. To achieve better results, it is necessary to try to decrease the TVF values to reduce the vulnerability to bit-flips. The propagation delay between two sequential elements and higher clock frequencies collaborates to reduce TVF values. Moreover, all the information can be easily integrated into Electronic Design Automation (EDA) tools to help identifying the most vulnerable master-slave flip-flops before mitigating or replacing them by radiation hardened ones. Microeletrônica Tolerancia : Falhas Microelectronics Soft errors Window of vulnerability Timing vulnerability factor Environmental variability
147	Electromigration aware cell design / Projeto de células considerando a eletromigração Posser, Gracieli January 2015 (has links) A Eletromigração (EM) nas interconexões de metal em um chip é um mecanismo crítico de falhas de confiabilidade em tecnologias de escala nanométrica. Os trabalhos na literatura que abordam os efeitos da EM geralmente estão preocupados com estes efeitos nas redes de distribuição de potência e nas interconexões entre as células. Este trabalho aborda o problema da EM em outro aspecto, no interior das células, e aborda especificamente o problema da eletromigração em interconexões de saída, Vdd e Vss dentro de uma célula padrão onde há poucos estudos na literatura que endereçam esse problema. Até onde sabe-se, há apenas dois trabalhos na literatura que falam sobre a EM no interior das células. (DOMAE; UEDA, 2001) encontrou buracos formados pela EM nas interconexões de um inversor CMOS e então propôs algumas ideias para reduzir a corrente nos segmentos de fio onde formaram-se buracos. O outro trabalho, (JAIN; JAIN, 2012), apenas cita que a EM no interior das células padrão deve ser verificada e a frequência segura das células em diferentes pontos de operação deve ser modelada. Nenhum trabalho da literatura analisou e/ou modelou os efeitos da EM nos sinais dentro das células. Desta forma, este é o primeiro trabalho a usar o posicionamento dos pinos para reduzir os efeitos da EM dentro das células. Nós modelamos a eletromigração no interior das células incorporando os efeitos de Joule heating e a divergência da corrente e este modelo é usado para analisar o tempo de vida de grandes circuitos integrados. Um algoritmo eficiente baseado em grafos é desenvolvido para acelerar a caracterização da EM no interior das células através do cálculos dos valores de corrente média e RMS. Os valores de corrente computados por esse algoritmo produzem um erro médio de 0.53% quando comparado com os valores dados por simulações SPICE. Um método para otimizar a posição dos pinos de saída, Vdd e Vss das células e consequentemente otimizar o tempo de vida do circuito usando pequenas modificações no leiaute é proposto. Para otimizar o TTF dos circuitos somente o arquivo LEF é alterado para evitar as posições de pino críticas, o leiaute da célula não é alterado. O tempo de vida do circuito pode ser melhorado em até 62.50% apenas evitando as posições de pino críticas da saída da célula, 78.54% e 89.89% evitando as posições críticas do pino de Vdd e Vss, respectivamente Quando as posições dos pinos de saída, Vdd e Vss são otimizadas juntas, o tempo de vida dos circuitos pode ser melhorado em até 80.95%. Além disso, nós também mostramos o maior e o menor tempo de vida sobre todos as posições candidatas de pinos para um conjunto de células, onde pode ser visto que o tempo de vida de uma célula pode ser melhorado em até 76 pelo posicionamento do pino de saída. Além disso, alguns exemplos são apresentados para explicar porque algumas células possuem uma melhora maior no TTF quando a posição do pino de saída é alterada. Mudanças para otimizar o leiaute das células são sugeridas para melhorar o tempo de vida das células que possuem uma melhora muito pequena no TTF através do posicionamento dos pinos. A nível de circuito, uma análise dos efeitos da EM é apresentada para as diferentes camadas de metal e para diferentes comprimentos de fios para os sinais (nets) que conectam as células. / Electromigration (EM) in on-chip metal interconnects is a critical reliability failure mechanism in nanometer-scale technologies. Usually works in the literature that address EM are concerned with power network EM and cell to cell interconnection EM. This work deals with another aspect of the EM problem, the cell-internal EM. This work specifically addresses the problem of electromigration on signal interconnects and on Vdd and Vss rails within a standard cell. Where there are few studies in the literature addressing this problem. To our best knowledge we just found two works in the literature that talk about the EM within a cell. (DOMAE; UEDA, 2001) found void formed due to electromigration in the interconnection portion in a CMOS inverter and then proposes some ideas to reduce the current through the wire segments where the voids were formed. The second work, (JAIN; JAIN, 2012), just cites that the standard-cell-internal-EM should be checked and the safe frequency of the cells at different operating points must be modeled. No previous work analyzed and/or modeled the EM effects on the signals inside the cells. In this way, our work is the first one to use the pin placement to reduce the EM effects inside of the cells. In this work, cell-internal EM is modeled incorporating Joule heating effects and current divergence and is used to analyze the lifetime of large benchmark circuits. An efficient graph-based algorithm is developed to speed up the characterization of cell-internal EM. This algorithm estimates the currents when the pin position is moved avoiding a new characterization for each pin position, producing an average error of just 0.53% compared to SPICE simulation. A method for optimizing the output, Vdd and Vss pin placement of the cells and consequently to optimize the circuit lifetime using minor layout modifications is proposed. To optimize the TTF of the circuits just the LEF file is changed avoiding the critical pin positions, the cell layout is not changed. The circuit lifetime could be improved up to 62.50% at the same area, delay, and power because changing the pin positions affects very marginally the routing. This lifetime improvement is achieved just avoiding the critical output pin positions of the cells, 78.54% avoiding the critical Vdd pin positions, 89.89% avoiding the critical Vss pin positions and up to 80.95% (from 1 year to 5.25 years) when output, Vdd, and Vss pin positions are all optimized simultaneously. We also show the largest and smallest lifetimes over all pin candidates for a set of cells, where the lifetime of a cell can be improved up to 76 by the output pin placement. Moreover, some examples are presented to explain why some cells have a larger TTF improvement when the output pin position is changed. Cell layout optimization changes are suggested to improve the lifetime of the cells that have a very small TTF improvement by pin placement. At circuit level, we present an analysis of the EM effects on different metal layers and different wire lengths for signal wires (nets) that connect cells. Microeletrônica Cmos Tolerancia : Falhas Electromigration Circuit lifetime Cell-level AC EM Physical design Microelectronics
148	Detecção e proteção de blocos básicos suscetíveis através da análise sistemática de single bit-flip / Detection and protection of susceptible basic blocks through systematic bit-flip analysis Rodrigues, Diego Gonçalves January 2015 (has links) Partículas radioativas, ao atingirem o hardware dos sistemas computacionais, podem resultar em comportamentos inesperados durante a execução de um software. Tais comportamentos inesperados podem persistir por toda a vida útil do sistema ou podem ter uma duração limitada. Nesse último caso, temos o que chamamos de falhas transientes. Falhas transientes podem fazer com que as instruções do programa executem em uma sequência incorreta, o que chamamos de erros de fluxo de controle (Control-flow errors - CFEs). Estudos mostram que entre 33% e 77% das falhas transientes que afetam o hardware se manifestam como erros de fluxo de controle, dependendo do tipo do processador. Se o sistema não realizar nenhuma verificação em tempo de execução, um erro de fluxo de controle pode não ser detectado, o que pode resultar em uma execução incorreta do programa. Sistemas projetados para aplicações de baixo custo voltados para sistemas embarcados, onde os custos e desempenho são os fatores principais, utilizam técnicas baseadas em software para aumentar a confiabilidade do sistema. As técnicas baseadas em software para detecção de CFEs são conhecidas como signature monitoring ou signature checking. Essas técnicas introduzem código extra em todos os blocos básicos do programa com a finalidade de detectar os CFEs. Esse código extra implica em overhead, que pode ter uma grande variação dependendo da técnica utilizada. Na tentativa de minimizar o overhead imposto pelas técnicas de detecção de CFEs, neste trabalho foi desenvolvida a técnica de detecção e proteção de blocos básicos suscetíveis através da análise sistemática de single bit-flip. O objetivo da técnica é detectar os blocos básicos suscetíveis do programa através da análise sistemática de single bit-flip e proteger apenas esses blocos básicos. A técnica foi avaliada em termos de sua taxa de cobertura de falhas e desempenho. Para avaliar a taxa de cobertura falhas foram realizadas várias campanhas de injeção de falhas nos programas da suíte de benchmarks Mibench. A avaliação de desempenho foi feita com base na quantidade de instruções de máquina executadas pelos benchmarks, comparando quantidade de instruções antes e depois da utilização da técnica detecção e proteção de blocos básicos suscetíveis. Os resultados dos experimentos mostram que é possível reduzir em até 27,93% a quantidade de blocos básicos protegidos e ao mesmo tempo manter uma alta taxa de cobertura de falhas. Porém, em termos de desempenho, o ganho não ficou na mesma proporção da quantidade de blocos básicos não protegidos, ficando abaixo do esperado. / Radioactive particles hitting the hardware of computer systems may result in unexpected behavior during software execution. Such unexpected behavior may persist for the lifetime of the system or may have a limited duration. In the latter case, we have what is called a transient fault. Transient faults may cause the program instructions to execute in an incorrect sequence. This incorrect sequence is called a control flow error (CFE). Research show that between 33% to 77% of transient faults manifest themselves as CFEs, depending on the type of the processor. If the system does not perform any verification at runtime, a control flow error may be not detected, which can result in incorrect program execution. Systems projected to low-cost embedded applications, where cost and performance are the main factors, use software based techniques to improve system reliability. Software based techniques to detect CFEs are known as signature monitoring or signature checking. These techniques insert extra code in each basic block of the program in order to detect CFEs. This extra code add an undesirable overhead in the program, which can have large variation depending on the technique used. In the attempt to minimize the overhead added by CFEs detection techniques, this work developed a technique of detection and protection of susceptible basic blocks through systematic bit-flip analysis. The purpose of this technique is to detect the susceptible basic blocks of the program through the systematic bit-flip analysis and to protect only these basic blocks. The technique was evaluated based on its fault coverage rate and performance. To evaluate the fault coverage rate a fault injection campaing was performed in the programs of the Mibench benchmark suite. The performance evaluation was based in the number of instruction executed by each benchmark, comparing the number of instructions before and after the use of the proposed technique. The experimental results show that is possible to reduce up to 27,93% the amount of protected basic blocks, while keeping a high faults coverage hate. However, in terms of performance, the gain was not in the same proportion, being lower than expected. Sistemas embarcados Tolerancia : Falhas Embedded systems Transient faults Fault tolerance Signature checking
149	A reliability analysis approach to assist the design of aggressively scaled reconfigurable architectures Pereira, Mônica Magalhães January 2012 (has links) As computer systems are built with aggressively scaled and unreliable technologies, some implementations rely on function specialization with reconfigurable computing to increase performance by exploiting parallelism, with possible energy gains. However, the use of reconfigurable devices in general purpose computing also brings extra reliability challenges at the system level. Solutions to cope with that are generally accompanied with the addition of excessive area, performance and power overheads to the overall system. These overheads could be reduced if a more extensive analysis was performed to evaluate the best fault tolerance strategy to balance the tradeoff between reliability and the mentioned aspects. In this context, this work present a comprehensive analysis of architectural design that includes the use of reliability modeling and takes into consideration aspects such as area, performance, and power. The analysis aims to assist the design of reliability-aware reconfigurable architectures by giving some indications about what kind of redundancy should be used in order to increase reliability. In the proposed analysis, we show that communication among functional units is critical to the overall reliability of reconfigurable architectures. Therefore, where most of the reliability investments should be made. Moreover, the analysis also demonstrate that there is a threshold in the amount of redundancy that can be added in order to increase reliability. This limit is determined by the fact that adding redundancy increases area overhead. This overhead influences reliability until overcomes the reliability gains. Therefore, even disregarding area cost, the gains in reliability will cease or even decrease. To provide a more extended evaluation, a fault tolerance approach was proposed to cope with permanent faults. The LOwER-FaT strategy is a mechanism embedded in a run-time reconfiguration mechanism that automatically selects the fault-free resources without adding extra time overhead to the configuration generation mechanism. The fault-tolerant strategy takes advantage of the on-line transparent configuration generation mechanism to transparently avoid faulty functional units and interconnects. Moreover, the strategy does not require the addition of spare resources. All the resources are used to accelerate execution, and only in case of fault, a resource is replaced by a working one, with a performance penalty caused by the reduction in the amount of resources. In spite of that, experimental results showed a mean performance degradation of 14% on overall performance under 20% fault rate. Moreover, reliability results indicated gains of around six orders of magnitude when the fault tolerance strategy was in place. Tolerancia : Falhas Microeletrônica Cmos Reconfigurable architectures Fault tolerance Reliability analysis Scaling
150	Avaliação de atraso, consumo e proteção de somadores tolerantes a falhas / Evaluating delay, power and protection of fault tolerant adders Franck, Helen de Souza January 2011 (has links) Nos últimos anos, os sistemas integrados em silício (SOCs - Systems-on-Chip) têm se tornado menos imunes a ruído, em decorrência dos ajustes necessários na tecnologia CMOS (Complementary Metal-Oxide-Silicon) para garantir o funcionamento dos transistores com dimensões nanométricas. Dentre tais ajustes, a redução da tensão de alimentação e da tensão de limiar (threshold) tornam os SOCs mais suscetíveis a falhas transientes, principalmente aquelas provocadas pela colisão de partículas energéticas que provêm do espaço e encontram-se presentes na atmosfera terrestre. Quando uma partícula energética de alta energia colide com o dreno de um transistor que está desligado, ela perde energia e produz pares elétron-lacuna livres, resultando em uma trilha de ionização. A ionização pode gerar um pulso transiente de tensão que pode ser interpretado como uma mudança no sinal lógico. Em um circuito combinacional, o pulso pode propagar-se até ser armazenado em um elemento de memória. Tal fenômeno é denominado Single-Event Transient (SET). Como a tendência é que as dimensões dos dispositivos fabricados com tecnologia CMOS continuem reduzindo por mais alguns anos, a ocorrência de SETs em SOCs operando na superfície terrestre tende a aumentar, exigindo a adoção de técnicas de tolerância a falhas no projeto de SOCs. O presente trabalho tem por objetivo avaliar circuitos somadores tolerantes a falhas transientes encontrados na literatura. Duas arquiteturas de somadores foram escolhidas: Ripple Carry Adder (RCA) e Binary Signed Digit Adder (BSDA). O RCA foi escolhido por ser o tipo de somador de menor custo e por isso, amplamente utilizado em SOCs. Já o BSDA foi escolhido porque utiliza o sistema numérico de dígito binário com sinal (Binary Signed Digit – BSD). Por ser um sistema de representação redundante, o uso de BSD facilita a aplicação de técnicas de tolerância a falhas baseadas em redundância de informação. Os somadores protegidos avaliados foram projetados com as seguintes técnicas: Redundância Modular Tripla (Triple Modular Redundancy - TMR) e Recomputação com Entradas e Saídas Invertidas (RESI), no caso do RCA, e codificação 1 de 3 e verificação de paridade, no caso do BSDA. As 9 arquiteturas de somadores foram simuladas no nível elétrico usando o Modelo Tecnológico Preditivo (Predictive Technology Model - PTM) de 45nm e considerando quatro comprimentos de operandos: 4, 8, 16 e 32 bits. Os resultados obtidos permitiram quantificar o número de transistores, o atraso crítico e a potência média consumida por cada arquitetura protegida. Também foram realizadas campanhas de injeção de falhas, por meio de simulações no nível elétrico, para estimar o grau de proteção de cada arquitetura. Os resultados obtidos servem para guiar os projetistas de SOCs na escolha da arquitetura de somador tolerante a falhas mais adequada aos requisitos de cada projeto. / In the past recent years, integrated systems on a chip (Systems-on-chip - SOCs) became less immune to noise due to the adjusts in CMOS technology needed to assure the operation of nanometric transistors. Among such adjusts, the reductions in supply voltage and threshold voltage make SOSs more susceptible to transient faults, mainly those provoked by the collision of charged particles coming from the outer space that are present in the atmosphere. When a heavily energy charged particle hits the drain region of a transistor that is at the off state it produces free electron-hole pairs, resulting in an ionizing track. The ionization may generate a transient voltage pulse that can be interpreted as a change in the logic signal. In a combinational circuit, the pulse may propagate up to the primary outputs and may be captured by the output storage element. Such phenomenon is referred to as Single-Event Transient (SET). Since it is expected that transistor dimensions will continue to reduce in the next technological nodes, the occurrence of SETs at Earth surface will increase and therefore, fault tolerance techniques will become a must in the design of SOSs. The present work targets the evaluation of transient fault-tolerant adders found in the literature. Two adder architectures were chosen: the Ripple-Carry Adder (RCA) and the Binary Signed Digit Adder (BSDA). The RCA was chosen because it is the least expensive and therefore, the most used architecture for SOS design. The BSDA, in turn, was chosen because it uses the Binary Signed Digit (BSD) system. As a redundant number system, the BSD paves the way to the implementation of fault-tolerant adders using information redundancy. The evaluated fault-tolerant adders were implemented by using the following techniques: Triple Module Redundancy (TMR) and Recomputing with Inverted Inputs and Outputs (RESI), in the case of the RCA, and 1 out of 3 coding and parity verification, in the case of the BSDA. A total of 9 adder architectures were simulated at the electric-level using the Predictive Technology Model (PTM) for 45nm in four different bitwidths: 4, 8, 16 and 32. The obtained results allowed for quantifying the number of transistors, critical delay and average power consumption for each fault-tolerant architecture. Fault injection campaigns were also accomplished by means of electric-level simulations to estimate the degree of protection of each architecture. The results obtained in the present work may be used to guide SOS designers in the choice of the fault-tolerant adder architecture that is most likely to satisfy the design requirements. Microeletrônica Tolerancia : Falhas Fault tolerance SET (single-event transient) Binary-signed digit number system Adder architectures

Search results