Global ETD Search

211	A process variation tolerant self compensation sense amplifier design Choudhary, Aarti, January 2008 (has links) Thesis (M.S.E.C.E. )--University of Massachusetts Amherst, 2008. / Includes bibliographical references (p. 84-88).
212	Architecture pour la reconfiguration en temps réel des systèmes complexes / An architecture for real time reconfiguration of complex systems Guadri, Ahmed 15 December 2009 (has links) Nous proposons une méthodologie de conception pour les systèmes de commande tolérants aux fautes en partant d’un modèle de base exhaustif pour le système complexe à superviser. En pratique, la modélisation exhaustive est réalisée grâce à un automate hybride enrichi par des paramètres quantifiant les défaillances possibles. Ceci permet de modéliser les défaillances partielles. Dans la phase hors ligne, ce système complexe est transformé en un système discret abstrait et exploitable selon des techniques dédiées. Un superviseur est alors construit selon les objectifs de fonctionnement.Lors du fonctionnement du système, l’occurrence d’une défaillance se traduit par l’invalidation de plusieurs comportements dans le modèle abstrait et l’introduction d’incertitudes. Par la suite, les modules de diagnostic et d’identification (qui ne rentrent pas dans l’objet de notre thèse) réduisent de façon progressive le modèle hybride au cours du temps. Afin de pouvoir mettre à jour le modèle discret abstrait, on a développé des algorithmes de calcul d’atteignabilité, de vérification et de génération de régions stabilisées.Pour pouvoir superviser un tel système, l’utilisation de méthodologies d’abstraction est nécessaire afin de transformer le modèle bas niveau exhaustif en un modèle discret approprié. Nous réalisons cette abstraction en proposant des algorithmes qui tiennent compte du contexte d’utilisation (objectifs, contraintes…). Lorsqu’une défaillance est détectée, la reconfiguration est déclenchée en essayant, au fur et à mesure de l’enrichissement du modèle abstrait, de réduire le fonctionnement du système défaillant dans un des schémas prédéfinis / We propose a methodology for the design of fault tolerant control systems for the supervision of complex systems. First, the system is exhaustively described through a hybrid automaton which is enriched with parameters that quantifies the possible complete or partial faults. In the offline stage, the complex system is abstracted into a useful discrete event system with dedicated approaches. After, a supervisor is designed according to the standard objectives.In the online stage, fault detection can be interpreted by prohibiting some behaviours and introducing uncertainties. Thereafter, the diagnosis and identification modules (that are not treated in this work) reduce progressively the hybrid system. In order to update the abstract model, we developed some algorithms for reachability calculus, verification and generation of stabilized regions.In order to supervise the faulty system, the use of abstraction methodologies is necessary in order to transform the low level exhaustive model into an abstract and appropriate model. This abstraction is realized through algorithms that takes into account the abstraction context (objectives, constraints…). When a fault is detected, the reconfiguration is triggered by trying, as the abstract model is enriched, to reduce the behaviour of the faulty system into a preconceived model Tolérance aux fautes Reconfiguration Sûreté de fonctionnement Abstraction Systèmes hybrides Fault tolerance Reconfiguration Safety Abstraction Hybrid systems
213	Etude de la migration de tâches dans une architecture multi-tuile. Génération automatique d'une solution basée sur des agents / Study of task migration in a multi-tiled architecture. Automatic generation of an agent based solution Elantably, Ashraf 16 December 2015 (has links) Les systèmes multiprocesseurs sur puce (MPSoC) mis en oeuvre dans les architecturesmulti-tuiles fournissent des solutions prometteuses pour exécuter des applicationssophistiquées et modernes. Une tuile contient au moins un processeur, unemémoire principale privée et des périphériques nécessaires associés à un dispositifchargé de la communication inter-tuile. Cependant, la fiabilité de ces systèmesest toujours un problème. Une réponse possible à ce problème est la migrationde tâches. Le transfert de l’exécution d’une tâche d’une tuile à l’autre permet degarder une fiabilité acceptable de ces systèmes. Nous proposons dans ce travail unetechnique de migration de tâches basée sur des agents. Cette technique vise lesapplications de flot de données en cours d’exécution sur des architectures multituiles.Une couche logicielle “middleware” est conçue pour supporter les agentsde migration. Cette couche rend la solution transparente pour les programmeursd’applications et facilite sa portabilité sur architectures multi-tuiles différentes. Afinque cette solution soit évolutive, une chaîne d’outils de génération automatique estconçue pour générer les agents de migration. Grâce à ces outils, ces informationssont extraites automatiquement des graphes de tâches et du placement optimisésur les tuiles du système. L’algorithme de migration est aussi détaillé, en montrantles phases successives et les transferts d’information nécessaires. La chaîne d’outilsest capable de générer du code pour les architectures ARM et x86. Cette techniquede migration de tâche peut être déployée sur les systèmes d’exploitation quine supportent ni chargement dynamique ni unité de gestion mémoire MMU. Lesrésultats expérimentaux sur une plateforme x86 matérielle et une plateforme ARMde simulation montrent peu de surcoût en terme de mémoire et de performance, cequi rend cette solution efficace. / Fully distributed memory multi-processors (MPSoC) implemented in multi-tiled architectures are promising solutions to support modern sophisticated applications, however, reliability of such systems is always an issue. As a result, a system-level solution like task migration keeps its importance. Transferring the execution of a task from one tile to another helps keep acceptable reliability of such systems. A tile contains at least one processor, private main memory and associated peripherals with a communication device responsible for inter-tile communications. We propose in this work an agent based task migration technique that targets data-flow applications running on multi-tiled architectures. This technique uses a middleware layer that makes it transparent to application programmers and eases its portability over different multi-tiled architectures. In order for this solution to be scalable to systems with more tiles, an automatic generation tool-chain is designed to generate migration agents and provide them with necessary information enabling them to execute migration processes properly. Such information is extracted automatically from application(s) task graphs and mapping on the system tiles. We show how agents are placed with applications and how such necessary information is generated and linked with them. The tool-chain is capable of generating code for ARM and x86 architectures. This task migration technique can be deployed on small operating systems that support neither MMU nor dynamic loading for task code. We show that this technique is operational on x86 based real hardware platform as well as on an ARM based simulation platform. Experimental results show low overhead both in memory and performance. Performance overhead due to migration of a task in a typical small application where it has one predecessor and one successor is 18.25%. Migration de tâches Architectures de mémoire distribuées La tolérance aux fautes Task migration Distributed architectures Fault tolerance 620
214	Performance et fiabilité des protocoles de tolérance aux fautes / Towards Performance and Dependability Benchmarking of Distributed Fault Tolerance Protocols Gupta, Divya 18 March 2016 (has links) A l'ère de l’informatique omniprésente et à la demande, où les applications et les services sont déployés sur des infrastructures bien gérées et approvisionnées par des grands groupes de fournisseurs d’informatique en nuage (Cloud Computing), tels Amazon,Google,Microsoft,Oracle, etc, la performance et la fiabilité de ces systèmes sont devenues des objectifs primordiaux. Cette informatique a rendu particulièrement nécessaire la prise en compte des facteurs de la Qualité de Service (QoS), telles que la disponibilité, la fiabilité, la vivacité, la sureté et la sécurité,dans la définition complète d’un système. En effet, les systèmes informatiques doivent être résistants aussi bien aux défaillances qu’aux attaques et ce, afin d'éviter qu'ils ne deviennent inaccessibles, entrainent des couts de maintenance importants et la perte de parts de marché. L'augmentation de la taille et la complexité des systèmes en nuage rend de plus en plus commun les défauts, augmentant la fréquence des pannes, et n’offrant donc plus la Garantie de Service visée. Les fournisseurs d’informatique en nuage font ainsi face épisodiquement à des fautes arbitraires, dites Byzantines, durant lesquelles les systèmes ont des comportements imprévisibles.Ce constat a amené les chercheurs à s’intéresser de plus en plus à la tolérance aux fautes byzantines (BFT) et à proposer de nombreux prototypes de protocoles et logiciels. Ces solutions de BFT visent non seulement à fournir des services cohérents et continus malgré des défaillances arbitraires, mais cherchent aussi à réduire le coût et l’impact sur les performances des systèmes sous-jacents. Néanmoins les prototypes BFT ont été évalués le plus souvent dans des contextes ad hoc, soit dans des conditions idéales, soit en limitant les scénarios de fautes. C’est pourquoi ces protocoles de BFT n’ont pas réussi à convaincre les professionnels des systèmes distribués de les adopter. Cette thèse entend répondre à ce problème en proposant un environnement complet de banc d’essai dont le but est de faciliter la création de scénarios d'exécution utilisables pour aussi bien analyser que comparer l'efficacité et la robustesse des propositions BFT existantes. Les contributions de cette thèse sont les suivantes :Nous introduisons une architecture générique pour analyser des protocoles distribués. Cette architecture comprend des composants réutilisables permettant la mise en œuvre d’outils de mesure des performances et d’analyse de la fiabilité des protocoles distribués. Cette architecture permet de définir la charge de travail, de défaillance, et l’injection de ces dernières. Elle fournit aussi des statistiques de performance, de fiabilité du système de bas niveau et du réseau. En outre, cette thèse présente les bénéfices d’une architecture générale.Nous présentons BFT-Bench, le premier système de banc d’essai de la BFT, pour l'analyse et la comparaison d’un panel de protocoles BFT utilisés dans des situations identiques. BFT-Bench permet aux utilisateurs d'évaluer des implémentations différentes pour lesquels ils définissent des comportements défaillants avec différentes charges de travail.Il permet de déployer automatiquement les protocoles BFT étudiés dans un environnement distribué et offre la possibilité de suivre et de rendre compte des aspects performance et fiabilité. Parmi nos résultats, nous présentons une comparaison de certains protocoles BFT actuels, réalisée avec BFT-Bench, en définissant différentes charges de travail et différents scénarii de fautes. Cette réelle application de BFT-Bench en démontre l’efficacité.Le logiciel BFT-Bench a été conçu en ce sens pour aider les utilisateurs à comparer efficacement différentes implémentations de BFT et apporter des solutions effectives aux lacunes identifiées des prototypes BFT. De plus, cette thèse défend l’idée que les techniques BFT sont nécessaires pour assurer un fonctionnement continu et correct des systèmes distribués confrontés à des situations critiques. / In the modern era of on-demand ubiquitous computing, where applications and services are deployed in well-provisioned, well-managed infrastructures, administered by large groups of cloud providers such as Amazon, Google, Microsoft, Oracle, etc., performance and dependability of the systems have become primary objectives.Cloud computing has evolved from questioning the Quality-of-Service (QoS) making factors such as availability, reliability, liveness, safety and security, extremely necessary in the complete definition of a system. Indeed, computing systems must be resilient in the presence of failures and attacks to prevent their inaccessibility which can lead to expensive maintenance costs and loss of business. With the growing components in cloud systems, faults occur more commonly resulting in frequent cloud outages and failing to guarantee the QoS. Cloud providers have seen episodic incidents of arbitrary (i.e., Byzantine) faults where systems demonstrate unpredictable conducts, which includes incorrect response of a client's request, sending corrupt messages, intentional delaying of messages, disobeying the ordering of the requests, etc.This has led researchers to extensively study Byzantine Fault Tolerance (BFT) and propose numerous protocols and software prototypes. These BFT solutions not only provide consistent and available services despite arbitrary failures, they also intend to reduce the cost and performance overhead incurred by the underlying systems. However, BFT prototypes have been evaluated in ad-hoc settings, considering either ideal conditions or very limited faulty scenarios. This fails to convince the practitioners for the adoption of BFT protocols in a distributed system. Some argue on the applicability of expensive and complex BFT to tolerate arbitrary faults while others are skeptical on the adeptness of BFT techniques. This thesis precisely addresses this problem and presents a comprehensive benchmarking environment which eases the setup of execution scenarios to analyze and compare the effectiveness and robustness of these existing BFT proposals.Specifically, contributions of this dissertation are as follows.First, we introduce a generic architecture for benchmarking distributed protocols. This architecture, comprises reusable components for building a benchmark for performance and dependability analysis of distributed protocols. The architecture allows defining workload and faultload, and their injection. It also produces performance, dependability, and low-level system and network statistics. Furthermore, the thesis presents the benefits of a general architecture.Second, we present BFT-Bench, the first BFT benchmark, for analyzing and comparing representative BFT protocols under identical scenarios. BFT-Bench allows end-users evaluate different BFT implementations under user-defined faulty behaviors and varying workloads. It allows automatic deploying these BFT protocols in a distributed setting with ability to perform monitoring and reporting of performance and dependability aspects. In our results, we empirically compare some existing state-of-the-art BFT protocols, in various workloads and fault scenarios with BFT-Bench, demonstrating its effectiveness in practice.Overall, this thesis aims to make BFT benchmarking easy to adopt by developers and end-users of BFT protocols.BFT-Bench framework intends to help users to perform efficient comparisons of competing BFT implementations, and incorporating effective solutions to the detected loopholes in the BFT prototypes. Furthermore, this dissertation strengthens the belief in the need of BFT techniques for ensuring correct and continued progress of distributed systems during critical fault occurrence. Fiabilité Performance Protocoles de tolérance Fautes Dependability Performance Fault tolerance protocols Fault 004
215	Designing single event upset mitigation techniques for large SRAM-Based FPGA components / Desenvolvimento de técnicas de tolerância a falhas transientes em componentes programáveis por SRAM Kastensmidt, Fernanda Gusmão de Lima January 2003 (has links) Esse trabalho consiste no estudo e desenvolvimento de técnicas de proteção a falhas transientes, também chamadas single event upset (SEU), em circuitos programáveis customizáveis por células SRAM. Os projetistas de circuitos eletrônicos estão cada vez mais predispostos a utilizar circuitos programáveis, conhecidos como Field Programmable Gate Array (FPGA), para aplicações espaciais devido a sua alta flexibilidade lógica, alto desempenho, baixo custo no desenvolvimento, rapidez na prototipação e principalmente pela reconfigurabilidade. Em particular, FPGAs customizados por SRAM são muito importantes para missões espaciais pois podem ser rapidamente reprogramados à distância quantas vezes for necessário. A técnica de proteção baseada em redundância tripla, conhecida como TMR, é comumente utilizada em circuitos integrados de aplicações específicas e pode também ser aplicada em circuitos programáveis como FPGAs. A técnica TMR foi testada no FPGA Virtex® da Xilinx em aplicações como contadores e micro-controladores. Falhas foram injetadas em todos as partes sensíveis da arquitetura e seus efeitos foram detalhadamente analisados. Os resultados de injeção de falhas e dos experimentos sob radiação em laboratório comprovaram a eficácia do TMR em proteger circuitos sintetizados em FPGAs customizados por SRAM. Todavia, essa técnica possui algumas limitações como aumento em área, uso de três vezes mais pinos de entrada e saída (E/S) e conseqüentemente, aumento na dissipação de potência. Com o objetivo de reduzir custos no TMR e melhorar a confiabilidade, uma técnica inovadora de tolerância a falhas para FPGAs customizados por SRAM foi desenvolvida para ser implementada em alto nível, sem modificações na arquitetura do componente. Essa técnica combina redundância espacial e temporal para reduzir custos e assegurar confiabilidade. Ela é baseada em duplicação com um circuito comparador e um bloco de detecção concorrente de falhas. Esta nova técnica proposta neste trabalho foi especificamente projetada para tratar o efeito de falhas transientes em blocos combinacionais e seqüenciais na arquitetura reconfigurável, reduzir o uso de pinos de E/S, área e dissipação de potência. A metodologia foi validada por injeção de falhas emuladas em uma placa de prototipação. O trabalho mostra uma comparação nos resultados de cobertura de falhas, área e desempenho entre as técnicas apresentadas. / This thesis presents the study and development of fault-tolerant techniques for programmable architectures, the well-known Field Programmable Gate Arrays (FPGAs), customizable by SRAM. FPGAs are becoming more valuable for space applications because of the high density, high performance, reduced development cost and re-programmability. In particular, SRAM-based FPGAs are very valuable for remote missions because of the possibility of being reprogrammed by the user as many times as necessary in a very short period. SRAM-based FPGA and micro-controllers represent a wide range of components in space applications, and as a result will be the focus of this work, more specifically the Virtex® family from Xilinx and the architecture of the 8051 micro-controller from Intel. The Triple Modular Redundancy (TMR) with voters is a common high-level technique to protect ASICs against single event upset (SEU) and it can also be applied to FPGAs. The TMR technique was first tested in the Virtex® FPGA architecture by using a small design based on counters. Faults were injected in all sensitive parts of the FPGA and a detailed analysis of the effect of a fault in a TMR design synthesized in the Virtex® platform was performed. Results from fault injection and from a radiation ground test facility showed the efficiency of the TMR for the related case study circuit. Although TMR has showed a high reliability, this technique presents some limitations, such as area overhead, three times more input and output pins and, consequently, a significant increase in power dissipation. Aiming to reduce TMR costs and improve reliability, an innovative high-level technique for designing fault-tolerant systems in SRAM-based FPGAs was developed, without modification in the FPGA architecture. This technique combines time and hardware redundancy to reduce overhead and to ensure reliability. It is based on duplication with comparison and concurrent error detection. The new technique proposed in this work was specifically developed for FPGAs to cope with transient faults in the user combinational and sequential logic, while also reducing pin count, area and power dissipation. The methodology was validated by fault injection experiments in an emulation board. The thesis presents comparison results in fault coverage, area and performance between the discussed techniques. Microeletrônica Fpga Tolerancia : Falhas Fault tolerance Single event upset Fault injection Time and hardware redundancy
216	Uma estratégia baseada em programação orientada a aspectos para injeção de falhas de comunicação / A fault injection communication tool based on aspect oriented programming Silveira, Karina Kohl January 2005 (has links) A injeção de falhas permite acelerar a ocorrência de erros em um sistema para que seja possível a validação de seu comportamento sob falhas, assim como a avaliação do impacto dos mecanismos de detecção e remoção de erros no desempenho do sistema. Abordagens que facilitem o desenvolvimento de injetores vêm sendo buscadas com empenho, variando desde a inserção de injetores no kernel do sistema operacional até o uso de reflexão computacional para aplicações orientadas a objetos. Este trabalho explora os recursos da Programação Orientada a Aspectos como estratégia para a criação de ferramentas de injeção de falhas. A Programação Orientada a Aspectos tem como objetivo a modularização de interesses transversais, isto é, interesses que atravessam as unidades naturais de modularização. A injeção de falhas possui um comportamento que abrange os diversos módulos da aplicação alvo, afetando métodos que são executados em diversas classes em diversos pontos da aplicação. Desta forma, a injeção de falhas pode ser encapsulada sob a forma de aspectos. Para demonstrar a validade da proposta apresentada foi desenvolvida a ferramenta FICTA – Fault Injection Communication Tool based on Aspects. O objetivo é a validação de aplicações Java distribuídas, construídas sobre o protocolo UDP e que implementem mecanismos de tolerância a falhas em protocolos de camadas superiores. A importância de instrumentar um protocolo de base é justificada pelo fato da necessidade de validar aplicações, toolkits e middlewares que implementem tolerância a falhas em camadas superiores, logo, esses protocolos devem lidar corretamente com as falhas de mais baixo nível. A ferramenta abrange falha de colapso e omissão de mensagens do protocolo UDP. O uso de Programação Orientada a Aspectos na construção de FICTA resultou em uma ferramenta altamente modular, reusável e flexível, que pode ser facilmente inserida e removida da aplicação alvo, sem causar intrusividade espacial no código fonte da aplicação. / The fault injection allows us to accelerate the occurrence of failures in a system so that it is possible to validate its behavior under faults, as well as the evaluation of the impact on the mechanisms of detection and removal of failures in the performance of the system. The approaches that may facilitate the development of injectors have been searched with effort, varying from the insertion of injectors in the kernel of the operational system up to the computational reflection for object oriented applications. This work explores the resources of the Aspect Oriented Programming as a strategy to create tools of fault injection. The Aspect Oriented Programming has as its goal the modularization of the crosscutting concerns, that is to say the interests that cross the natural units of modularization. The fault injection has a behavior that covers the various modules of the target application, affecting methods that are executed in several classes of several areas of the application. Thus, the Fault Injection may be encapsulated under the form of aspects. To demonstrate the worthiness of the presented proposal, a tool called FICTA - Fault Injection Communication Tool based on Aspects, has been developed. The aim is to validate Java distributed applications built under the UDP protocol so that the fault tolerance mechanisms can be implemented in upper layers. The importance of instrumentate a protocol of base is justified by the necessity of validating applications, toolkits and middlewares that implement fault tolerance in upper layers, then, these protocols must deal correctly with the lower level faults. The tool covers crash and message omission faults of the UDP protocol. The use of Aspect Oriented Programming in the construction of FICTA resulted in a tool highly modular, reusable and flexible that may be easily inserted and removed from the target application, without causing spatial intrusiveness in the source code of the application. Tolerancia : Falhas Arquivos distribuidos Fault tolerance Fault injection Aspect oriented programming
217	A reliability analysis approach to assist the design of aggressively scaled reconfigurable architectures Pereira, Mônica Magalhães January 2012 (has links) As computer systems are built with aggressively scaled and unreliable technologies, some implementations rely on function specialization with reconfigurable computing to increase performance by exploiting parallelism, with possible energy gains. However, the use of reconfigurable devices in general purpose computing also brings extra reliability challenges at the system level. Solutions to cope with that are generally accompanied with the addition of excessive area, performance and power overheads to the overall system. These overheads could be reduced if a more extensive analysis was performed to evaluate the best fault tolerance strategy to balance the tradeoff between reliability and the mentioned aspects. In this context, this work present a comprehensive analysis of architectural design that includes the use of reliability modeling and takes into consideration aspects such as area, performance, and power. The analysis aims to assist the design of reliability-aware reconfigurable architectures by giving some indications about what kind of redundancy should be used in order to increase reliability. In the proposed analysis, we show that communication among functional units is critical to the overall reliability of reconfigurable architectures. Therefore, where most of the reliability investments should be made. Moreover, the analysis also demonstrate that there is a threshold in the amount of redundancy that can be added in order to increase reliability. This limit is determined by the fact that adding redundancy increases area overhead. This overhead influences reliability until overcomes the reliability gains. Therefore, even disregarding area cost, the gains in reliability will cease or even decrease. To provide a more extended evaluation, a fault tolerance approach was proposed to cope with permanent faults. The LOwER-FaT strategy is a mechanism embedded in a run-time reconfiguration mechanism that automatically selects the fault-free resources without adding extra time overhead to the configuration generation mechanism. The fault-tolerant strategy takes advantage of the on-line transparent configuration generation mechanism to transparently avoid faulty functional units and interconnects. Moreover, the strategy does not require the addition of spare resources. All the resources are used to accelerate execution, and only in case of fault, a resource is replaced by a working one, with a performance penalty caused by the reduction in the amount of resources. In spite of that, experimental results showed a mean performance degradation of 14% on overall performance under 20% fault rate. Moreover, reliability results indicated gains of around six orders of magnitude when the fault tolerance strategy was in place. Tolerancia : Falhas Microeletrônica Cmos Reconfigurable architectures Fault tolerance Reliability analysis Scaling
218	Avaliação de atraso, consumo e proteção de somadores tolerantes a falhas / Evaluating delay, power and protection of fault tolerant adders Franck, Helen de Souza January 2011 (has links) Nos últimos anos, os sistemas integrados em silício (SOCs - Systems-on-Chip) têm se tornado menos imunes a ruído, em decorrência dos ajustes necessários na tecnologia CMOS (Complementary Metal-Oxide-Silicon) para garantir o funcionamento dos transistores com dimensões nanométricas. Dentre tais ajustes, a redução da tensão de alimentação e da tensão de limiar (threshold) tornam os SOCs mais suscetíveis a falhas transientes, principalmente aquelas provocadas pela colisão de partículas energéticas que provêm do espaço e encontram-se presentes na atmosfera terrestre. Quando uma partícula energética de alta energia colide com o dreno de um transistor que está desligado, ela perde energia e produz pares elétron-lacuna livres, resultando em uma trilha de ionização. A ionização pode gerar um pulso transiente de tensão que pode ser interpretado como uma mudança no sinal lógico. Em um circuito combinacional, o pulso pode propagar-se até ser armazenado em um elemento de memória. Tal fenômeno é denominado Single-Event Transient (SET). Como a tendência é que as dimensões dos dispositivos fabricados com tecnologia CMOS continuem reduzindo por mais alguns anos, a ocorrência de SETs em SOCs operando na superfície terrestre tende a aumentar, exigindo a adoção de técnicas de tolerância a falhas no projeto de SOCs. O presente trabalho tem por objetivo avaliar circuitos somadores tolerantes a falhas transientes encontrados na literatura. Duas arquiteturas de somadores foram escolhidas: Ripple Carry Adder (RCA) e Binary Signed Digit Adder (BSDA). O RCA foi escolhido por ser o tipo de somador de menor custo e por isso, amplamente utilizado em SOCs. Já o BSDA foi escolhido porque utiliza o sistema numérico de dígito binário com sinal (Binary Signed Digit – BSD). Por ser um sistema de representação redundante, o uso de BSD facilita a aplicação de técnicas de tolerância a falhas baseadas em redundância de informação. Os somadores protegidos avaliados foram projetados com as seguintes técnicas: Redundância Modular Tripla (Triple Modular Redundancy - TMR) e Recomputação com Entradas e Saídas Invertidas (RESI), no caso do RCA, e codificação 1 de 3 e verificação de paridade, no caso do BSDA. As 9 arquiteturas de somadores foram simuladas no nível elétrico usando o Modelo Tecnológico Preditivo (Predictive Technology Model - PTM) de 45nm e considerando quatro comprimentos de operandos: 4, 8, 16 e 32 bits. Os resultados obtidos permitiram quantificar o número de transistores, o atraso crítico e a potência média consumida por cada arquitetura protegida. Também foram realizadas campanhas de injeção de falhas, por meio de simulações no nível elétrico, para estimar o grau de proteção de cada arquitetura. Os resultados obtidos servem para guiar os projetistas de SOCs na escolha da arquitetura de somador tolerante a falhas mais adequada aos requisitos de cada projeto. / In the past recent years, integrated systems on a chip (Systems-on-chip - SOCs) became less immune to noise due to the adjusts in CMOS technology needed to assure the operation of nanometric transistors. Among such adjusts, the reductions in supply voltage and threshold voltage make SOSs more susceptible to transient faults, mainly those provoked by the collision of charged particles coming from the outer space that are present in the atmosphere. When a heavily energy charged particle hits the drain region of a transistor that is at the off state it produces free electron-hole pairs, resulting in an ionizing track. The ionization may generate a transient voltage pulse that can be interpreted as a change in the logic signal. In a combinational circuit, the pulse may propagate up to the primary outputs and may be captured by the output storage element. Such phenomenon is referred to as Single-Event Transient (SET). Since it is expected that transistor dimensions will continue to reduce in the next technological nodes, the occurrence of SETs at Earth surface will increase and therefore, fault tolerance techniques will become a must in the design of SOSs. The present work targets the evaluation of transient fault-tolerant adders found in the literature. Two adder architectures were chosen: the Ripple-Carry Adder (RCA) and the Binary Signed Digit Adder (BSDA). The RCA was chosen because it is the least expensive and therefore, the most used architecture for SOS design. The BSDA, in turn, was chosen because it uses the Binary Signed Digit (BSD) system. As a redundant number system, the BSD paves the way to the implementation of fault-tolerant adders using information redundancy. The evaluated fault-tolerant adders were implemented by using the following techniques: Triple Module Redundancy (TMR) and Recomputing with Inverted Inputs and Outputs (RESI), in the case of the RCA, and 1 out of 3 coding and parity verification, in the case of the BSDA. A total of 9 adder architectures were simulated at the electric-level using the Predictive Technology Model (PTM) for 45nm in four different bitwidths: 4, 8, 16 and 32. The obtained results allowed for quantifying the number of transistors, critical delay and average power consumption for each fault-tolerant architecture. Fault injection campaigns were also accomplished by means of electric-level simulations to estimate the degree of protection of each architecture. The results obtained in the present work may be used to guide SOS designers in the choice of the fault-tolerant adder architecture that is most likely to satisfy the design requirements. Microeletrônica Tolerancia : Falhas Fault tolerance SET (single-event transient) Binary-signed digit number system Adder architectures
219	CFT-tool : ferramenta configurável para aplicação de técnicas de detecção de falhas em processadores por software / CFT-tool: configurable tool to application of faults detection techniques in processors by software Chielle, Eduardo January 2012 (has links) Este trabalho apresenta uma ferramenta configurável, denominada de CFT-tool, capaz de aplicar automaticamente técnicas de detecção de erros em software com o objetivo de proteger processadores com diferentes arquiteturas e organizações contra falhas transientes no hardware. As técnicas baseadas em redundância e comparação são aplicadas pela CFT-tool no código assembly de um programa desprotegido, compilado para a arquitetura alvo. A ferramenta desenvolvida foi validada utilizando dois processadores distintos: miniMIPS e LEON3. O processador miniMIPS foi utilizado para verificar a eficiência, em termos de taxa de detecção de erros, tempo de execução e ocupação de memória, das técnicas de detecção em software aplicadas pela CFT-tool, comparando os resultados obtidos com os presentes na literatura. O processador LEON3 foi selecionado por ser amplamente utilizado em aplicações espaciais e por ser baseado em uma arquitetura diferente da arquitetura do processador miniMIPS. Com o processador LEON3 é verificada a configurabilidade da CFT-tool, isto é, a capacidade dela de aplicar técnicas de detecção em software em um código compilado para um diferente processador, o mantendo funcional e sendo capaz de detectar erros. A CFT-tool pode ser utilizada para proteger programas para outras arquiteturas e organizações através da modificação dos arquivos de configuração da ferramenta. A configuração das técnicas é definida segundo as especificações da aplicação, recursos do processador e seleções do usuário. Programas foram protegidos e falhas foram injetadas em nível lógico em ambos os processadores. Para o processador miniMIPS, as taxas de detecção de erros, os tempos de execução e as ocupações de memórias dos programas protegidos se mostraram compatíveis com os resultados presentes na literatura. Resultados semelhantes foram encontrados para o processador LEON3. Diferenças entre os resultados ocorrem devido às características da arquitetura. A ferramenta CFT-tool por ser configurável pode proteger o código na integralidade ou selecionar partes do código e registradores que serão redundantes e protegidos. A vantagem de proteger parte do código é reduzir o custo final em termos de tempo de processamento e ocupação de memória. Uma análise do impacto da seleção seletiva de registradores na taxa de detecção de erros é apresentada. E diretivas de alcançar um comprometimento ótimo entre quantidade de registradores protegidos, taxa de detecção de erros e custo são discutidas. / This work presents a configurable tool, called CFT-tool, capable of automatically applying software-based error detection techniques aiming to protect processors with different architectures and organizations against transient faults in the hardware. The techniques are based on redundancy and comparison. They are applied by CFT-tool in the assembly code of an unprotected program, compiled to the target architecture. The developed tool was validated using two distinct processors: miniMIPS and LEON3. The miniMIPS processor has been utilized to verify the efficiency of the software-based techniques applied by CFT-tool in the assembly code of unprotected programs in terms of error detection rate, runtime and memory occupation, comparing the obtained results with those presented in the literature. The LEON3 processor was selected because it is largely adopted in space applications and because it is based on a different architecture that miniMIPS processor. The configurability of the CFT-tool is verified with the LEON3 processor, that is, the capability of the tool at applying software-based detection techniques in a code compiled to a different processor, maintaining it functional and capable of detecting errors. The CFT-tool can be utilized to protect programs compiled to other architectures and organizations by modifying the configuration files of the tool. The configuration of the techniques is defined by the specifications of the application, processor resources and selections of the user. Programs were protected and faults were injected in logical level in both processors. When using the miniMIPS processor, the error detection rates, runtimes and memory occupations of the protected programs are comparable to the results presents in the literature. Similar results are reached with the LEON3 processor. Differences between the results are due to architecture features. The CFT-tool can be configurable to protect the entire code or to select portions of the code or registers that will be redundant and protected. The advantage of protecting portions of the code is to reduce the final cost in terms of runtime and memory occupation. An analysis of the impact of selective selection of registers in the error detection rate is also presented. And policies to reach an optimum committal between amount of protected registers, error detection rate and cost are discussed. Microeletrônica Tolerancia : Falhas Fault tolerance Transient faults SEU SET Software-based detection techniques
220	Extensão do suporte para simulação de defeitos em algoritmos distribuídos utilizando o Neko / Extension to support failures in distributed algorithm simulation using Neko Rodrigues, Luiz Antonio January 2006 (has links) O estudo e desenvolvimento de sistemas distribuídos é uma tarefa que demanda grande esforço e recursos. Por este motivo, a pesquisa em sistemas deste tipo pode ser auxiliada com o uso de simuladores, bem como por meio da emulação. A vantagem de se usar simuladores é que eles permitem obter resultados bastante satisfatórios sem causar impactos indesejados no mundo real e, conseqüentemente, evitando desperdícios de recursos. Além disto, testes em larga escala podem ser controlados e reproduzidos. Neste sentido, vem sendo desenvolvido desde 2000 um framework para simulação de algoritmos distribuídos denominado Neko. Por meio deste framework, algoritmos podem ser simulados em uma única máquina ou executados em uma rede real utilizando-se o mesmo código nos dois casos. Entretanto, através de um estudo realizado sobre os modelos de defeitos mais utilizados na literatura, verificou-se que o Neko é ainda bastante restrito nesta área. A única classe de defeito abordada, lá referida como colapso, permite apenas o bloqueio temporário de mensagens do processo. Assim, foram definidos mecanismos para a simulação das seguintes classes de defeitos: omissão de mensagens, colapso de processo, e alguns defeitos de rede tais como quebra de enlace, perda de mensagens e particionamento. A implementação foi feita em Java e as alterações necessárias no Neko estão documentadas no texto. Para dar suporte aos mecanismos de simulação de defeitos, foram feitas alterações no código fonte de algumas classes do framework, o que exige que a versão original seja alterada para utilizar as soluções. No entanto, qualquer aplicação desenvolvida anteriormente para a versão original poderá ser executada normalmente independente das modificações efetuadas. Para testar e validar as propostas e soluções desenvolvidas foram utilizados estudos de caso. Por fim, para facilitar o uso do Neko foi gerado um documento contendo informações sobre instalação, configuração e principais mecanismos disponíveis no simulador, incluindo o suporte a simulação de defeitos desenvolvido neste trabalho. / The study and development of distributed systems is a task that demands great effort and resources. For this reason, the research in systems of this type can be assisted by the use of simulators, as well as by means of the emulation. The advantage of using simulators is that, in general, they allow to get acceptable results without causing harming impacts in the real world and, consequently, preventing wastefulness of resources. Moreover, tests on a large scale can be controlled and reproduced. In this way, since 2000, a framework for the simulation of distributed algorithms called Neko has been developed. By means of this framework, algorithms can be simulated in a single machine or executed in a real network, using the same code in both cases. However, studying the most known and used failure models developed having in mind distributed systems, we realized that the support offered by Neko for failure simulation was too restrictive. The only developed failure class, originally named crash, allowed only a temporary blocking of process’ messages. Thus, mechanisms for the simulation of the following failure classes were defined in the present work: omission of messages, crash of processes, and some network failures such as link crash, message drop and partitioning. The implementation was developed in Java and the necessary modifications in Neko are registered in this text. To give support to the mechanisms for failure simulation, some changes were carried out in the source code of some classes of the framework, what means that the original version should be modified to use the proposed solutions. However, all legacy applications, developed for the original Neko version, keep whole compatibility and can be executed without being affected by the new changes. In this research, some case studies were used to test and validate the new failure classes. Finally, with the aim to facilitate the use of Neko, a document about the simulator, with information on how to install, to configure, the main available mechanisms and also on the developed support for failure simulation, was produced. Tolerancia : Falhas Sistemas distribuidos Simulação computacional Fault tolerance Neko Distributed systems Simulation

Search results