Global ETD Search

41	Flexible Fault Tolerance for the Robot Operating System Marok, Sukhman S. 01 June 2020 (has links) The introduction of autonomous vehicles has the potential to reduce the number of accidents and save countless lives. These benefits can only be realized if autonomous vehicles can prove to be safer than human drivers. There is a large amount of active research around developing robust algorithms for all parts of the autonomous vehicle stack including sensing, localization, mapping, perception, prediction, planning, and control. Additionally, some of these research projects have involved the use of the Robot Operating System (ROS). However, another key aspect of realizing an autonomous vehicle is a fault-tolerant design that can ensure the safe operation of the vehicle under unfavorable conditions. The goal of this thesis is to evaluate the feasibility of adding a dedicated fault tolerance module into a ROS based architecture. The fault tolerance module is used to implement a safety controller that can take over safety-critical operations of the system when a fault is detected in the main computer. A Xilinx Zynq-7000 SoC with a dual-core ARM Cortex-A9 and an FPGA programmable logic region is chosen as the platform. The platform works in the Asymmetric Multiprocessing (AMP) configuration with a Linux based operating system on one core and a real-time operating system (RTOS) on the other. Results are gathered from an implementation done on a ROS based mobile robot platform. Robotics Fault Tolerance ROS RTOS FPGA Robotics
42	Scalable Byzantine State Machine Replication: Designs, Techniques, and Implementations Arun, Balaji 02 July 2021 (has links) State machine replication (SMR) is one of the most widely studied and used methodologies for building highly available distributed applications and services. SMR replicates a service across a set of computing hosts, and executes client operations on the replicas in an agreed-upon total order, ensuring linearizability of the replicated shared state. The problem of determining a total order reduces to one of computing consensus. State-of-the-art consensus protocols are inadequate for newer classes of applications such as Blockchains and for geographically distributed infrastructures. The widely used Crash Fault Tolerance (CFT) fault model of consensus protocols is prone to malicious and adversarial behaviors as well as non-crash faults such as software bugs. The Byzantine fault-tolerance (BFT) model and its trust-based variant, the hybrid model, permit stronger failure adversaries. However, state-of-the-art Byzantine and hybrid consensus protocols have performance limitations in geographically distributed environments: they designate a primary replica for proposing total-orders, which becomes a bottleneck and yields sub-optimal latencies for faraway clients. Additionally, they do not scale to hundreds of replicas or provide consistent performance as the system size grows. To overcome these limitations and develop highly scalable SMR solutions, this dissertation presents two leaderless consensus protocols, namely ezBFT and Dester, for the Byzantine and hybrid models, respectively. These protocols enable every replica to receive and order client commands. Additionally, they exchange command dependencies to collectively order commands without relying on a primary.
Our experimental evaluations in a 7-node geographically distributed setup reveal that ezBFT improves client-side latency by as much as 40% over state-of-the-art BFT protocols including PBFT, FaB, and Zyzzyva. Dester, for the hybrid model, reduces latency by as much as 30% over ezBFT. Next, the dissertation presents a new paradigm called DQBFT for designing consensus protocols that can scale to hundreds of nodes in geographically distributed environments. Since leaderless protocols exchange command dependencies, they do not scale to hundreds of nodes. DQBFT overcomes this scalability limitation by decentralizing only the heavy task of replicating commands and centralizing the process of ordering the commands. While DQBFT can be used to enhance existing primary-based protocols, Destiny is a hybrid instantiation of the DQBFT paradigm using linear communication for better scalability than naive instantiations. Experimental evaluations in a 193-node geographically distributed setup reveal that Destiny achieves ≈3× better throughput and ≈50% better latency than state-of-the-art BFT protocols including Hotstuff, SBFT, and Hybster. Lastly, the dissertation presents two techniques for designing and implementing BFT protocols with reduced development costs. The dissertation presents Bumblebee, a methodology for manually transforming CFT protocols to tolerate Byzantine faults using trusted execution environments that are increasingly available in commodity hardware. Bumblebee is based on the observation that CFT protocols are incapable of tolerating non-malicious non-crash faults, but they are nevertheless deployed in many production systems. Bumblebee provides a Generic Algorithm that can represent protocols in both CFT and hybrid fault models, thus allowing easy construction of hybrid protocols using CFT protocols as baselines. The dissertation constructs hybrid instantiations of CFT protocols including Paxos, Raft, and M2Paxos.
Experimental evaluations of the hybrid variants reveal that they perform at par with native hybrid protocols, but incur a 30% overhead over their CFT counterparts. Hybrid protocols rely on the integrity of trusted execution environments, which are increasingly subject to security exploits. To withstand exploits, the dissertation presents DuoBFT, a protocol that exposes both the BFT and hybrid fault models within a single consensus protocol. This enables consensus under both fault models within the same protocol and without additional redundancy, allowing DuoBFT to achieve the performance of hybrid protocols and the security of BFT protocols. Experimental evaluations reveal that DuoBFT achieves the best of both hybrid and BFT fault models with less than 10% overhead. / Doctor of Philosophy / Computers are ubiquitous; they perform some of the most complex and safety-critical tasks such as controlling aircraft, managing the financial markets, and maintaining sensitive medical records. The undeniable fact is that computers are faulty. They are prone to crash and can behave arbitrarily. Even the most robust computers, such as those that are sent to outer space, eventually fail. External phenomena such as power outages and network disruptions affect their operation. To make computing systems reliable, researchers and practitioners have long focused on interconnecting many individual computers and programming them to effectively be duplicates of one another. This way, when one computer fails in a system, the rest of the computers still ensure that the system as a whole is operational. Duplication requires that multiple computers effectively perform the same task. In order for multiple computers to perform the same task together, they should first agree on the task. More generally, since computing systems perform multiple tasks, they should agree on the sequence of tasks that they will individually perform and follow the agreement.
This is what is known as the State Machine Replication technique. State Machine Replication (SMR) is a powerful technique that is applicable to numerous computing applications. Blockchain systems, the technology behind cryptocurrencies such as Bitcoin and Ethereum, use the SMR technique. In the context of Blockchain, the added challenge is that some of the computers involved in SMR can be programmed by adversarial parties and could act in a way to jeopardize the integrity of the whole system. For Bitcoin and Ethereum, this could mean embezzlement of hundreds or even millions of dollars' worth of cryptocurrencies. Certain SMR systems are capable of tolerating such intrusions and ensuring system integrity. Such systems are deemed to be Byzantine tolerant. This dissertation presents designs, techniques, and implementations of Byzantine State Machine Replication systems. The problems addressed in this dissertation are those that plague existing Byzantine SMR systems, making them suboptimal for newer applications such as Blockchains. First, when computers that participate in SMR are spread around the world, their performance is dependent on the communication latencies between any pair of computers. Second, the number of computers required is proportional to the number of adversarial computers that need to be tolerated. Consequently, certain SMR systems for Blockchains require hundreds of computers to tolerate heavy adversarial behavior. Many existing SMR techniques perform poorly under these scenarios. The techniques presented in this dissertation address various permutations of these challenges. Byzantine Fault Tolerance State Machine Replication
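The core idea of state machine replication described in the abstract above (every replica applies the same commands in the same agreed order) can be illustrated with a minimal sketch. Everything here is hypothetical and chosen for illustration: the `Replica` class, the key/delta command format, and the `replicate` helper are not taken from the dissertation, and the agreed order is simply assumed to be given rather than produced by a consensus protocol such as ezBFT or Destiny.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A deterministic key-value state machine (hypothetical example service)."""
    state: dict = field(default_factory=dict)

    def apply(self, command):
        # A command is a (key, delta) pair; applying it is fully deterministic.
        key, delta = command
        self.state[key] = self.state.get(key, 0) + delta


def replicate(log, replica_count):
    """Apply the same command sequence, in the same order, on every replica."""
    replicas = [Replica() for _ in range(replica_count)]
    for command in log:  # the total order is assumed to come from consensus
        for replica in replicas:
            replica.apply(command)
    return replicas


# Assumption: consensus has already fixed this order of client commands.
agreed_log = [("alice", 5), ("bob", 3), ("alice", -2)]
replicas = replicate(agreed_log, replica_count=4)
```

Because the state machine is deterministic and every replica sees the same order, all replicas end in the same state; the hard part that the dissertation addresses, agreeing on that order when some replicas are faulty or adversarial, is deliberately left out of this sketch.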
43	Sensitivity of Feedforward Neural Networks to Harsh Computing Environments Arechiga, Austin Podoll 08 August 2018 (has links) Neural Networks have proven themselves very adept at solving a wide variety of problems; in particular, they excel at image processing. However, it remains unknown how well they perform under memory errors. This thesis focuses on the robustness of neural networks under memory errors, specifically single event upset style errors where single bits flip in a network's trained parameters. The main goal of these experiments is to determine if different neural network architectures are more robust than others. Initial experiments show that MLPs are more robust than CNNs. Within MLPs, deeper MLPs are more robust and for CNNs larger kernels are more robust. Additionally, the CNNs displayed bimodal failure behavior, where memory errors would either not affect the performance of the network, or they would degrade its performance to be on par with random guessing. VGG16, ResNet50, and InceptionV3 were also tested for their robustness. ResNet50 and InceptionV3 were both more robust than VGG16. This could be due to their use of Batch Normalization or the fact that ResNet50 and InceptionV3 both use shortcut connections in their hidden layers. After determining which networks were most robust, some estimated error rates from neutrons were calculated for space environments to determine if these architectures were robust enough to survive. It was determined that large MLPs, ResNet50, and InceptionV3 could survive in Low Earth Orbit on commercial memory technology and only use software error correction. / Master of Science / Neural networks are a new kind of algorithm that are revolutionizing the field of computer vision. Neural networks can be used to detect and classify objects in pictures or videos with accuracy on par with human performance.
Neural networks achieve such good performance after a long training process during which many parameters are adjusted until the network can correctly identify objects such as cats, dogs, trucks, and more. These trained parameters are then stored in a computer's memory and recalled whenever the neural network is used for a computer vision task. Some computer vision tasks are safety critical, such as a self-driving car’s pedestrian detector. An error in that detector could lead to loss of life, so neural networks must be robust against a wide variety of errors. This thesis will focus on a specific kind of error: bit flips in the parameters of a neural network stored in a computer’s memory. The main goal of these bit flip experiments is to determine if certain kinds of neural networks are more robust than others. Initial experiments show that MLP (Multilayer Perceptron) style networks are more robust than CNNs (Convolutional Neural Networks). For MLP style networks, making the network deeper with more layers increases the accuracy and the robustness of the network. However, for the CNNs increasing the depth only increased the accuracy, not the robustness. The robustness of the CNNs displayed an interesting trend of bimodal failure behavior, where memory errors would either not affect the performance of the network, or they would degrade its performance to be on par with random guessing. A second set of experiments was run to focus more on CNN robustness because CNNs are much more capable than MLPs. The second set of experiments focused on the robustness of VGG16, ResNet50, and InceptionV3. These CNNs are all very large and have very good performance on real world datasets such as ImageNet. Bit flip experiments showed that ResNet50 and InceptionV3 were both more robust than VGG16. This could be due to their use of Batch Normalization or the fact that ResNet50 and InceptionV3 both use shortcut connections within their network architecture.
However, all three networks still displayed the bimodal failure mode seen previously. After determining which networks were most robust, some estimated error rates were calculated for a real world environment. The chosen environment was the space environment because it naturally causes a high amount of bit flips in memory, so if NASA were to use neural networks on any rovers they would need to make sure the neural networks are robust enough to survive. It was determined that large MLPs, ResNet50, and InceptionV3 could survive in Low Earth Orbit on commercial memory technology and only use software error correction. Using only software error correction will allow satellite makers to build more advanced satellites without paying extra money for radiation-hardened electronics. Machine Learning Fault Tolerance Single Event Upsets
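To make the notion of a single event upset in a stored parameter concrete, the following minimal sketch flips one bit in the encoding of a single weight. It assumes parameters are stored as 32-bit IEEE 754 floats, which the abstract does not state; the helper name `flip_bit` and the example weight are hypothetical and are not the thesis's experimental code.

```python
import struct


def flip_bit(value, bit):
    """Return `value`, stored as a float32, with one bit of its encoding flipped."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range [0, 32)")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (encoded,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the chosen bit and reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", encoded ^ (1 << bit)))
    return flipped


weight = 0.5  # hypothetical trained parameter
sign_flip = flip_bit(weight, 31)      # bit 31 is the sign bit
exponent_flip = flip_bit(weight, 30)  # bit 30 is the top exponent bit
mantissa_flip = flip_bit(weight, 0)   # bit 0 is the lowest mantissa bit
```

Under this assumed encoding, a flip in the lowest mantissa bit changes the weight by a tiny amount, while a flip in the top exponent bit changes its magnitude by many orders of magnitude, so the position of the upset matters a great deal.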
44	SIMD-Swift: Improving Performance of Swift Fault Detection Oleksenko, Oleksii 20 January 2016 (has links) (PDF) The general tendency in modern hardware is an increase in fault rates, which is caused by the decreased operation voltages and feature sizes. Previously, the issue of hardware faults was mainly approached only in high-availability enterprise servers and in safety-critical applications, such as transport or aerospace domains. These fields generally have very tight requirements, but also higher budgets. However, as fault rates are increasing, fault tolerance solutions are starting to be also required in applications that have much smaller profit margins. This brings to the fore the idea of software-implemented hardware fault tolerance, that is, the ability to detect and tolerate hardware faults using software-based techniques in commodity CPUs, which provides resilience almost for free. Current solutions, however, are lacking in performance, even though they show quite good fault tolerance results. This thesis explores the idea of using the Single Instruction Multiple Data (SIMD) technology for executing all of a program's operations on two copies of the same data. This idea is based on the observation that SIMD is ubiquitous in modern CPUs and is usually an underutilized resource. It allows us to detect bit-flips in hardware by a simple comparison of two copies under the assumption that only one copy is affected by a fault. We implemented this idea as a source-to-source compiler which performs hardening of a program on the source code level. The evaluation of our several implementations shows that it is beneficial to use it for applications that are dominated by arithmetic or logical operations, but those that have more control-flow or memory operations actually perform better with the regular instruction replication.
For example, we managed to get only 15% performance overhead on the Fast Fourier Transformation benchmark, which is dominated by arithmetic instructions, but the memory-access-dominated Dijkstra algorithm has shown a high overhead of 200%. SIMD SSE Fault-Tolerance ddc:004 rvk:ST 150 rvk:ST 170
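The detection principle in the abstract above (run every operation on two copies of the data and compare them) can be sketched as follows. This is only an illustration under loud assumptions: plain Python tuples stand in for the two SIMD lanes, and the names `duplicate`, `hardened_add`, `check`, and `inject_bit_flip` are hypothetical. It is not the thesis's source-to-source compiler and it executes no real SIMD instructions.

```python
class FaultDetected(Exception):
    """Raised when the two copies of a value disagree."""


def duplicate(value):
    # Keep two copies of the value, standing in for two SIMD lanes.
    return (value, value)


def hardened_add(a, b):
    # Perform the same addition independently on both lanes.
    return (a[0] + b[0], a[1] + b[1])


def check(pair):
    # Compare the lanes; a mismatch means at least one copy was corrupted.
    if pair[0] != pair[1]:
        raise FaultDetected(f"lanes disagree: {pair[0]} != {pair[1]}")
    return pair[0]


def inject_bit_flip(pair, lane, bit):
    # Simulate a hardware fault that flips one bit in a single lane.
    lanes = list(pair)
    lanes[lane] ^= 1 << bit
    return tuple(lanes)


total = hardened_add(duplicate(2), duplicate(3))
clean_result = check(total)  # both lanes agree, so the value is released

try:
    check(inject_bit_flip(total, lane=1, bit=4))
    fault_seen = False
except FaultDetected:
    fault_seen = True
```

As in the abstract, detection relies on the assumption that a fault affects only one copy; if both lanes were corrupted identically, the comparison would pass.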
45	UpRight fault tolerance Clement, Allen Grogan 13 November 2012 (has links) Experiences with computer systems indicate an inconvenient truth: computers fail and they fail in interesting ways. Although using redundancy to protect against fail-stop failures is common practice, non-fail-stop computer and network failures occur for a variety of reasons including power outage, disk or memory corruption, NIC malfunction, user error, operating system and application bugs or misconfiguration, and many others. The impact of these failures can be dramatic, ranging from service unavailability to stranding airplane passengers on the runway to companies closing. While high-stakes embedded systems have embraced Byzantine fault tolerant techniques, general purpose computing continues to rely on techniques that are fundamentally crash tolerant. In a general purpose environment, the current best practices response to non-fail-stop failures can charitably be described as pragmatic: identify a root cause and add checksums to prevent that error from happening again in the future. Pragmatic responses have proven effective for patching holes and protecting against faults once they have occurred; unfortunately the initial damage has already been done, and it is difficult to say if the patches made to address previous faults will protect against future failures. We posit that an end-to-end solution based on Byzantine fault tolerant (BFT) state machine replication is an efficient and deployable alternative to current ad hoc approaches favored in general purpose computing. The replicated state machine approach ensures that multiple copies of the same deterministic application execute requests in the same order and provides end-to-end assurance that independent transient failures will not lead to unavailability or incorrect responses. 
An efficient and effective end-to-end solution covers faults that have already been observed as well as failures that have not yet occurred, and it provides structural confidence that developers won't have to track down yet another failure caused by some unpredicted memory, disk, or network behavior. While the promise of end-to-end failure protection is intriguing, significant technical and practical challenges currently prevent adoption in general purpose computing environments. On the technical side, it is important that end-to-end solutions maintain the performance characteristics of deployed systems: if end-to-end solutions dramatically increase computing requirements, dramatically reduce throughput, or dramatically increase latency during normal operation then end-to-end techniques are a non-starter. On the practical side, it is important that end-to-end approaches be both comprehensible and easy to incorporate: if the cost of end-to-end solutions is rewriting an application or trusting intricate and arcane protocols, then end-to-end solutions will not be adopted. In this thesis we show that BFT state machine replication can be used in deployed systems. Reaching this goal requires us to address both the technical and practical challenges previously mentioned. We revisit disparate research results from the last decade and tweak, refine, and revise the core ideas to fit together into a coherent whole. Addressing the practical concerns requires us to simplify the process of incorporating BFT techniques into legacy applications. / text Fault tolerance Dependability Byzantine fault tolerance UpRight fault tolerance Replicated state machine
46	SIMD-Swift: Improving Performance of Swift Fault Detection Oleksenko, Oleksii 02 December 2015 (has links) The general tendency in modern hardware is an increase in fault rates, which is caused by the decreased operation voltages and feature sizes. Previously, the issue of hardware faults was mainly approached only in high-availability enterprise servers and in safety-critical applications, such as transport or aerospace domains. These fields generally have very tight requirements, but also higher budgets. However, as fault rates are increasing, fault tolerance solutions are starting to be also required in applications that have much smaller profit margins. This brings to the fore the idea of software-implemented hardware fault tolerance, that is, the ability to detect and tolerate hardware faults using software-based techniques in commodity CPUs, which provides resilience almost for free. Current solutions, however, are lacking in performance, even though they show quite good fault tolerance results. This thesis explores the idea of using the Single Instruction Multiple Data (SIMD) technology for executing all of a program's operations on two copies of the same data. This idea is based on the observation that SIMD is ubiquitous in modern CPUs and is usually an underutilized resource. It allows us to detect bit-flips in hardware by a simple comparison of two copies under the assumption that only one copy is affected by a fault. We implemented this idea as a source-to-source compiler which performs hardening of a program on the source code level. The evaluation of our several implementations shows that it is beneficial to use it for applications that are dominated by arithmetic or logical operations, but those that have more control-flow or memory operations actually perform better with the regular instruction replication.
For example, we managed to get only 15% performance overhead on the Fast Fourier Transformation benchmark, which is dominated by arithmetic instructions, but the memory-access-dominated Dijkstra algorithm has shown a high overhead of 200%. info:eu-repo/classification/ddc/004 ddc:004 SIMD SSE Fault-Tolerance
47	Generalized Consensus for Practical Fault-Tolerance Garg, Mohit 07 September 2018 (has links) Despite extensive research on Byzantine Fault Tolerant (BFT) systems, overheads associated with such solutions preclude widespread adoption. Past efforts such as the Cross Fault Tolerance (XFT) model address this problem by making a weaker assumption that a majority of processes are correct and communicate synchronously. Although XPaxos of Liu et al. (using the XFT model) achieves similar performance as Paxos, it does not scale with the number of faults. Also, its reliance on a single leader introduces considerable downtime in case of failures. This thesis presents Elpis, the first multi-leader XFT consensus protocol. By adopting the Generalized Consensus specification from the Crash Fault Tolerance model, we were able to devise a multi-leader protocol that exploits the commutativity property inherent in the commands ordered by the system. Elpis maps accessed objects to non-faulty processes during periods of synchrony. Subsequently, these processes order all commands which access these objects. Experimental evaluation confirms the effectiveness of this approach: Elpis achieves up to 2x speedup over XPaxos and up to 3.5x speedup over state-of-the-art Byzantine Fault-Tolerant Consensus Protocols. / Master of Science / Online services like Facebook, Twitter, Netflix and Spotify to cloud services like Google and Amazon serve millions of users which include individuals as well as organizations. They use many distributed technologies to deliver a rich experience. The distributed nature of these technologies has removed geographical barriers to accessing data, services, software, and hardware. An essential aspect of these technologies is the concept of the shared state. Distributed databases with multiple replicated data nodes are an example of this shared state. 
Maintaining replicated data nodes provides several advantages such as (1) availability, so that in case one node goes down the data can still be accessed from other nodes, (2) quick response times, since placing data nodes closer to the user allows the data to be obtained quickly, and (3) scalability, by enabling multiple users to access different nodes so that a single node does not cause bottlenecks. To maintain this shared state some mechanism is required to maintain consistency, that is, the copies of this shared state must be identical on all the data nodes. This mechanism is called Consensus, and several such mechanisms exist in practice today which use the Crash Fault Tolerance (CFT) model. The CFT model implies that these mechanisms provide consistency in the presence of nodes crashing. While the state-of-the-art for security has moved from assuming a trusted environment inside a firewall to a perimeter-less and semi-trusted environment with every service living on the internet, only the application layer is required to be secured while the core is built just with an idea of crashes in mind. While there exists comprehensive research on secure Consensus mechanisms which utilize what is called the Byzantine Fault Tolerance (BFT) model, the extra costs required to implement these mechanisms and comparatively lower performance in a geographically distributed setting have impeded widespread adoption. A recently proposed model, called Cross Fault Tolerance (XFT), tries to find a cross between these models, that is, achieving security while paying no extra costs. This thesis presents Elpis, a consensus mechanism which uses precisely this model and will secure the shared state from its core without modifications to the existing setups while delivering high performance and lower response times. We perform a comprehensive evaluation on AWS and demonstrate that Elpis achieves up to a 3.5x speedup over the state of the art while improving response times by as much as 50%.
Distributed Systems Fault Tolerance Byzantine Fault Tolerance State Machine Replication Multi-Leader Consensus Blockchain
48	Exploring scaling limits and computational paradigms for next generation embedded systems Zykov, Andrey V. 01 June 2010 (has links) It is widely recognized that device and interconnect fabrics at the nanoscale will be characterized by a higher density of permanent defects and increased susceptibility to transient faults. This appears to be intrinsic to nanoscale regimes and fundamentally limits the eventual benefits of the increased device density, i.e., the overheads associated with achieving fault-tolerance may counter the benefits of increased device density -- density-reliability tradeoff. At the same time, as devices scale down one can expect a higher proportion of area to be associated with interconnection, i.e., area is wire dominated. In this work we theoretically explore density-reliability tradeoffs in wire dominated integrated systems. We derive an area scaling model based on simple assumptions capturing the salient features of hierarchical design for high performance systems, along with first order assumptions on reliability, wire area, and wire length across hierarchical levels. We then evaluate overheads associated with using basic fault-tolerance techniques at different levels of the design hierarchy. This model, albeit simplified, allows us to tackle several interesting theoretical questions: (1) When does it make sense to use smaller less reliable devices? (2) At what scale of the design hierarchy should fault tolerance be applied in high performance integrated systems? In the second part of this thesis we explore perturbation-based computational models as a promising choice for implementing next generation ubiquitous information technology on unreliable nanotechnologies. We show the inherent robustness of such computational models to high defect densities and performance uncertainty which, when combined with low manufacturing precision requirements, makes them particularly suitable for emerging nanoelectronics.
We propose a hybrid eNano-CMOS perturbation-based computing platform relying on a new style of configurability that exploits the computational model's unique form of unstructured redundancy. We consider the practicality and scalability of perturbation-based computational models by developing and assessing initial foundations for engineering such systems. Specifically, new design and decomposition principles exploiting task specific contextual and temporal scales are proposed and shown to substantially reduce complexity for several benchmark tasks. Our results provide strong evidence for the relevance and potential of this class of computational models when targeted at emerging unreliable nanoelectronics. / text Integrated systems Reliability Fault tolerance Nanoscale devices Nanotechnology Computational models
49	Determine network survivability using heuristic models Chua, Eng Hong 03 1900 (has links) Approved for public release; distribution is unlimited. / Contemporary large-scale networked systems have improved the efficiency and effectiveness of our way of life. However, such benefits are accompanied by elevated risks of intrusion and compromise. Incorporating survivability capabilities into systems is one of the ways to mitigate these risks. The Server Agent-based Active network Management (SAAM) project was initiated as part of the next generation Internet project to address the increasing multi-media Internet service demands. Its objective is to provide a consistent and dedicated quality of service to the users. SAAM monitors the network traffic conditions in a region and responds to routing requests from the routers in that region with optimal routes. Mobility has been incorporated into the SAAM server to prevent a single point of failure from bringing down the entire SAAM server and its service. With mobility, it is very important to select a good SAAM server locality from the client's point of view. The choice of the server must be a node where connection to the client is most survivable. In order to do that, a general metric is defined to measure the connection survivability of each of the potential server hosts. However, due to the complexity of the network, the computation of the metric becomes very complex too. This thesis develops heuristic solutions of polynomial complexity to find the hosting server node. In doing so, it minimizes the time and computer power required. / Defence Science & Technology Agency (Singapore) Computer networks Reliability Fault Tolerance Network Reliability Survivability Server Placement
50	Radiation robustness of XOR and majority voter circuits at finFET technology under variability Aguiar, Ygor Quadros de January 2017 (has links) Advances in microelectronics have contributed to the size reduction of the technological node, lowering the threshold voltage and increasing the operating frequency of the systems. Although this has positive outcomes related to the performance and power consumption of VLSI circuits, it also has a strong negative impact in terms of the reliability of designs. As technology scales down, circuits are becoming more susceptible to numerous effects due to the reduction of robustness to external noise as well as the increased degree of uncertainty related to the many sources of variability. Fault-tolerant techniques are usually used to improve the robustness of safety critical applications. However, the implications of technology scaling have reduced the effectiveness of fault-tolerant approaches in providing the desired fault coverage. For this reason, this work has evaluated the radiation robustness of different circuits designed in FinFET technology under variability effects. In order to determine the best design options to implement fault-tolerant techniques such as the Triple Modular Redundancy (TMR) and/or Duplication with Comparison (DWC) schemes, the set of analyzed circuits is composed of ten different exclusive-OR (XOR) logic gate topologies and two majority voter (MJV) circuits. To investigate the effect of gate configuration of FinFET devices, the XOR circuits are analyzed using double-gate configuration (DG FinFET) and tri-gate configuration (TG FinFET). Environmental variability, such as temperature and voltage variability, is evaluated in the set of analyzed circuits.
Additionally, the process-related variability effect Work-Function Fluctuation (WFF) is also evaluated. In order to provide a more precise study, the layout design of the MJV circuits using a 7nm FinFET PDK is evaluated by the predictive MUSCA SEP3 tool to estimate the Soft-Error Rate (SER) of the circuits considering the layout constraints and Back-End-Of-Line (BEOL) and Front-End-Of-Line (FEOL) layers of an advanced technology node. Microelectronics Digital circuits FinFET Variability Radiation Effects Fault Tolerance
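The roles that the majority voter and the XOR gate play in the TMR and DWC schemes named in the abstract above can be sketched at the logic level. This is a bitwise software illustration only; the function names `majority_vote` and `dwc_mismatch` are hypothetical, and the sketch says nothing about the transistor-level FinFET topologies or the radiation analysis that the work actually evaluates.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority, as computed by a TMR voter."""
    # Each output bit is 1 when at least two of the three inputs have a 1 there.
    return (a & b) | (b & c) | (a & c)


def dwc_mismatch(a, b):
    """Bitwise XOR comparison, as used in duplication with comparison."""
    # A nonzero result marks the bit positions where the two copies differ.
    return a ^ b


correct = 0b1010
upset = correct ^ 0b1000  # one module output with a single flipped bit

voted = majority_vote(correct, correct, upset)  # TMR masks the single fault
mismatch = dwc_mismatch(correct, upset)         # DWC only detects the fault
```

The sketch shows the usual trade-off between the two schemes: three copies with a voter can mask a single faulty output, while two copies with a comparator can signal that a fault occurred but cannot tell which copy is correct.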
