Global ETD Search

241	Design, Analysis and Resource Allocations in Networks In Presence of Region-Based Faults January 2013 (has links) abstract: Communication networks, both wired and wireless, are expected to have a certain level of fault-tolerance capability.These networks are also expected to ensure a graceful degradation in performance when some of the network components fail. Traditional studies on fault tolerance in communication networks, for the most part, make no assumptions regarding the location of node/link faults, i.e., the faulty nodes and links may be close to each other or far from each other. However, in many real life scenarios, there exists a strong spatial correlation among the faulty nodes and links. Such failures are often encountered in disaster situations, e.g., natural calamities or enemy attacks. In presence of such region-based faults, many of traditional network analysis and fault-tolerant metrics, that are valid under non-spatially correlated faults, are no longer applicable. To this effect, the main thrust of this research is design and analysis of robust networks in presence of such region-based faults. One important finding of this research is that if some prior knowledge is available on the maximum size of the region that might be affected due to a region-based fault, this piece of knowledge can be effectively utilized for resource efficient design of networks. It has been shown in this dissertation that in some scenarios, effective utilization of this knowledge may result in substantial saving is transmission power in wireless networks. In this dissertation, the impact of region-based faults on the connectivity of wireless networks has been studied and a new metric, region-based connectivity, is proposed to measure the fault-tolerance capability of a network. In addition, novel metrics, such as the region-based component decomposition number(RBCDN) and region-based largest component size(RBLCS) have been proposed to capture the network state, when a region-based fault disconnects the network. Finally, this dissertation presents efficient resource allocation techniques that ensure tolerance against region-based faults, in distributed file storage networks and data center networks. / Dissertation/Thesis / Ph.D. Computer Science 2013 Computer science Analysis Design Fault-tolerance Networks Region-based faults Resource allocation
242	A tool for automatic formal analysis of fault tolerance Nilsson, Markus January 2005 (has links) The use of computer-based systems is rapidly increasing and such systems can now be found in a wide range of applications, including safety-critical applications such as cars and aircrafts. To make the development of such systems more efficient, there is a need for tools for automatic safety analysis, such as analysis of fault tolerance. In this thesis, a tool for automatic formal analysis of fault tolerance was developed. The tool is built on top of the existing development environment for the synchronous language Esterel, and provides an output that can be visualised in the Item toolkit for fault tree analysis (FTA). The development of the tool demonstrates how fault tolerance analysis based on formal verification can be automated. The generated output from the fault tolerance analysis can be represented as a fault tree that is familiar to engineers from the traditional FTA analysis. The work also demonstrates that interesting attributes of the relationship between a critical fault combination and the input signals can be generated automatically. Two case studies were used to test and demonstrate the functionality of the developed tool. A fault tolerance analysis was performed on a hydraulic leakage detection system, which is a real industrial system, but also on a synthetic system, which was modeled for this purpose. Dependability Fault Tolerance Esterel Formal Verification System Safety Computer Sciences Datavetenskap (datalogi)
243	Desenvolvimento de um sistema microcomputador tolerante a falha com arquitetura em anel / Development of a fault tolerant microcomputer system with ring architecture Deisy Piedade Munhoz Fischer 26 October 1990 (has links) Neste trabalho é apresentado um sistema microcomputador tolerante a falhas, com redundância modular tripla (TMR). Este sistema é caracterizado por uma Arquitetura em Anel implementada com três módulos processadores. A estrutura em Anel é uma arquitetura onde os módulos adjacentes são conectados por um canal de comunicação, formando um laço. Os módulos recebem dados de uma ou mais fontes (dependendo se as fontes são replicadas ou não). Esta informação é então processada e um dado é preparado para votação. O dado é transmitido aos módulos adjacentes, através do canal de comunicação. A tolerância à falhas é obtida, pela capacidade que os três processadores têm de examinar os resultados do processamento de seus vzinhos. Assim, cada processador recebe duas versões de cada processamento: o seu próprio resultado e o resultado do seu vizinho. Cada módulo, então executa a votação por programação, através da estratégia de votação sobre um número parcial de dados. Se nenhuma falha ocorreu, os três módulos irão produzir o mesmo resultado. O resultado da votação (comparação) é indicado em cada módulo por um sinalizador de erro. Quando ocorre uma falha em um módulo, o esquema de votação por programação identifica a ocorrência desta falha, mas o sistema irá continuar a operar corretamente, apesar da falha e um módulo. O sistema em Anel com redundância tripla, pode tolerar uma falha em um dos módulos. Estes cálculos não são executados de uma maneira fortemente sincronizada, mas os processadores são sincronizados de uma forma mais flexível, através de programação. O sistema foi implementado usando três módulos microcomputadores. Cada microcomputador tem um controlador de disco. O sistema acessa um único terminal de vídeo. O programa monitor é composto de três módulos idênticos, para os três microcomputadores. Cada módulo reside na memória local de cada microcomputador. O sistema executa o Sistema Operacional CP/M. Os programas para este sistema operacional serão executados de uma forma tolerante à falhas sem necessidade de modificações. O objetivo deste trabalho foi desenvolver um sistema de uso geral com alta disponibilidade. / A fault-tolerant, tri-module redundant (TMR), microcomputer System is presented. This system is characterized by a Ring Architecture implemented with three processor modules. The ring structure is a loop type architecture in which adjacent modules are connected by single communication links. The modules receive data from one or more sources (depending on whether these sources are replicated or not). This information is then processed and made ready for voting. The data is passed between the adjacent modules over the connecting links. Fault-tolerance is achieved by each of the three processors being able to examine computational results from its neighbour. Thus, each module process receives two versions of each calculation: one from its own calculation and one received from the other processor. Each module then performs the voting by software, with voting on parcial data estrategy. If no fault has ocorred, it can be expected that all the three modules will produce the same result. The result of the voting (comparision) is indicated in each module by na error condition flag. In the evento f a fault in one of the module/processors, then this will be recognized by the software voting and an error will be reported, the system will continue proper operation in spite of the failure of a single module. Triple Modular Redundant Ring System can tolerate a single fault in one of the modules. The calculations are not carried out in a tightly synchronized manner, but the processors are loosely synchronized by software. The system was implemented using three Z-80 based microcomputer boards. Each microcomputer board has it own disk-controller board. The system access a single vídeo terminal. The software monitor is comprised of three identical modules, one for each three microcomputer. Each software monitor module resides in the respective local memory of its microcomputers. The application software performs under CP/M Operational System. Programs from non-redundant versions will be executed in a fault tolerant manner without modification. Through this, our objective was to develop a system of general application, with high availability. Arquitetura em anel Microcomputador Tolerância a falhas Fault tolerance Microcomputer Ring architecture
244	Tratamento de exceções no desenvolvimento de sistemas tolerantes a falhas baseadas em componentes / Exception handling in the development of fault-tolerant component-based systems Lima Filho, Fernando Jose Castor de 30 November 2006 (has links) Orientador: Cecilia Mary Fischer Rubira / Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-10T07:04:57Z (GMT). No. of bitstreams: 1 LimaFilho_FernandoJoseCastorde_D.pdf: 5063998 bytes, checksum: 8bfec9185fab14cb08c2a8b2ce7391a9 (MD5) Previous issue date: 2006 / Resumo: Mecanismos de tratamento de exceções foram concebidos com o intuito de facilitar o gerenciamento da complexidade de sistemas de software tolerantes a falhas. Eles promovem uma separação textual explícita entre o código normal e o código que lida com situações anormais, afim de dar suporte a construção de programas que são mais concisos fáceis de evoluir e confáveis. Diversas linguagens de programação modernas e a maioria dos modelos de componentes implementam mecanismos de tratamento de exceções. Apesar de seus muitos benefícios, tratamento de exceções pode ser a fonte de diversas falhas de projeto se usado de maneira indisciplinada. Estudos recentes mostram que desenvolvedores de sistemas de grande escala baseados em infra-estruturas de componentes têm hábitos, no tocante ao uso de tratamento de exceções, que tornam suas aplicações vulneráveis a falhas e difíceis de se manter. Componentes de software criam novos desafios com os quais mecanismos de tratamento de exceções tradicionais não lidam, o que aumenta a probabilidade de que problemas ocorram. Alguns exemplos são indisponibilidade de código fonte e incompatibilidades arquiteturais. Neste trabalho propomos duas técnicas complementares centradas em tratamento de exceções para a construção de sistemas tolerantes a falhas baseados em componentes. Ambas têm ênfase na estrutura do sistema como um meio para se reduzir o impacto de mecanismos de tolerância a falhas em sua complexidade total e o número de falhas de projeto decorrentes dessa complexidade. A primeira é uma abordagem para o projeto arquitetural dos mecanismos de recuperação de erros de um sistema. Ela trata do problema de verificar se uma arquitetura de software satisfaz certas propriedades relativas ao fluxo de exceções entre componentes arquiteturais, por exemplo, se todas as exceções lançadas no nível arquitetural são tratadas. A abordagem proposta lança de diversas ferramentas existentes para automatizar ao máximo esse processo. A segunda consiste em aplicar programação orientada a aspectos (AOP) afim de melhorar a modularização de código de tratamento de exceções. Conduzimos um estudo aprofundado com o objetivo de melhorar o entendimento geral sobre o efeitos de AOP no código de tratamento de exceções e identificar as situações onde seu uso é vantajoso e onde não é / Abstract: Exception handling mechanisms were conceived as a means to help managing the complexity of fault-tolerant software. They promote an explicit textual separation between normal code and the code that deals with abnormal situations, in order to support the construction of programs that are more concise, evolvable, and reliable. Several mainstream programming languages and most of the existing component models implement exception handling mechanisms. In spite of its many bene?ts, exception handling can be a source of many design faults if used in an ad hoc fashion. Recent studies show that developers of large-scale software systems based on component infrastructures have habits concerning the use of exception handling that make applications vulnerable to faults and hard to maintain. Software components introduce new challenges which are not addressed by traditional exception handling mechanisms and increase the chances of problems occurring. Examples include unavailability of source code and architectural mismatches. In this work, we propose two complementary techniques centered on exception handling for the construction of fault-tolerant component-based systems. Both of them emphasize system structure as a means to reduce the impactof fault tolerance mechanisms on the overall complexity of a software system and the number of design faults that stem from complexity. The ?rst one is an approach for the architectural design of a system?s error handling capabilities. It addresses the problem of verifying whether a software architecture satis?es certain properties of interest pertaining the ?ow of exceptions between architectural components, e.g., if all the exceptions signaled at the architectural level are eventually handled. The proposed approach is based on a set of existing tools that automate this process as much as possible. The second one consists in applying aspect-oriented programming (AOP) to better modularize exception handling code. We have conducted a through study aimed at improving our understanding of the efects of AOP on exception handling code and identifying the situations where its use is advantageous and the ones where it is not / Doutorado / Doutor em Ciência da Computação Tolerância à falha (Computação) Software - Arquitetura Componente de software Fault tolerance (Computing) Software components Software - Architecture
245	Uma estratégia baseada em programação orientada a aspectos para injeção de falhas de comunicação / A fault injection communication tool based on aspect oriented programming Silveira, Karina Kohl January 2005 (has links) A injeção de falhas permite acelerar a ocorrência de erros em um sistema para que seja possível a validação de seu comportamento sob falhas, assim como a avaliação do impacto dos mecanismos de detecção e remoção de erros no desempenho do sistema. Abordagens que facilitem o desenvolvimento de injetores vêm sendo buscadas com empenho, variando desde a inserção de injetores no kernel do sistema operacional até o uso de reflexão computacional para aplicações orientadas a objetos. Este trabalho explora os recursos da Programação Orientada a Aspectos como estratégia para a criação de ferramentas de injeção de falhas. A Programação Orientada a Aspectos tem como objetivo a modularização de interesses transversais, isto é, interesses que atravessam as unidades naturais de modularização. A injeção de falhas possui um comportamento que abrange os diversos módulos da aplicação alvo, afetando métodos que são executados em diversas classes em diversos pontos da aplicação. Desta forma, a injeção de falhas pode ser encapsulada sob a forma de aspectos. Para demonstrar a validade da proposta apresentada foi desenvolvida a ferramenta FICTA – Fault Injection Communication Tool based on Aspects. O objetivo é a validação de aplicações Java distribuídas, construídas sobre o protocolo UDP e que implementem mecanismos de tolerância a falhas em protocolos de camadas superiores. A importância de instrumentar um protocolo de base é justificada pelo fato da necessidade de validar aplicações, toolkits e middlewares que implementem tolerância a falhas em camadas superiores, logo, esses protocolos devem lidar corretamente com as falhas de mais baixo nível. A ferramenta abrange falha de colapso e omissão de mensagens do protocolo UDP. O uso de Programação Orientada a Aspectos na construção de FICTA resultou em uma ferramenta altamente modular, reusável e flexível, que pode ser facilmente inserida e removida da aplicação alvo, sem causar intrusividade espacial no código fonte da aplicação. / The fault injection allows us to accelerate the occurrence of failures in a system so that it is possible to validate its behavior under faults, as well as the evaluation of the impact on the mechanisms of detection and removal of failures in the performance of the system. The approaches that may facilitate the development of injectors have been searched with effort, varying from the insertion of injectors in the kernel of the operational system up to the computational reflection for object oriented applications. This work explores the resources of the Aspect Oriented Programming as a strategy to create tools of fault injection. The Aspect Oriented Programming has as its goal the modularization of the crosscutting concerns, that is to say the interests that cross the natural units of modularization. The fault injection has a behavior that covers the various modules of the target application, affecting methods that are executed in several classes of several areas of the application. Thus, the Fault Injection may be encapsulated under the form of aspects. To demonstrate the worthiness of the presented proposal, a tool called FICTA - Fault Injection Communication Tool based on Aspects, has been developed. The aim is to validate Java distributed applications built under the UDP protocol so that the fault tolerance mechanisms can be implemented in upper layers. The importance of instrumentate a protocol of base is justified by the necessity of validating applications, toolkits and middlewares that implement fault tolerance in upper layers, then, these protocols must deal correctly with the lower level faults. The tool covers crash and message omission faults of the UDP protocol. The use of Aspect Oriented Programming in the construction of FICTA resulted in a tool highly modular, reusable and flexible that may be easily inserted and removed from the target application, without causing spatial intrusiveness in the source code of the application. Tolerancia : Falhas Arquivos distribuidos Fault tolerance Fault injection Aspect oriented programming
246	Detecção e proteção de blocos básicos suscetíveis através da análise sistemática de single bit-flip / Detection and protection of susceptible basic blocks through systematic bit-flip analysis Rodrigues, Diego Gonçalves January 2015 (has links) Partículas radioativas, ao atingirem o hardware dos sistemas computacionais, podem resultar em comportamentos inesperados durante a execução de um software. Tais comportamentos inesperados podem persistir por toda a vida útil do sistema ou podem ter uma duração limitada. Nesse último caso, temos o que chamamos de falhas transientes. Falhas transientes podem fazer com que as instruções do programa executem em uma sequência incorreta, o que chamamos de erros de fluxo de controle (Control-flow errors - CFEs). Estudos mostram que entre 33% e 77% das falhas transientes que afetam o hardware se manifestam como erros de fluxo de controle, dependendo do tipo do processador. Se o sistema não realizar nenhuma verificação em tempo de execução, um erro de fluxo de controle pode não ser detectado, o que pode resultar em uma execução incorreta do programa. Sistemas projetados para aplicações de baixo custo voltados para sistemas embarcados, onde os custos e desempenho são os fatores principais, utilizam técnicas baseadas em software para aumentar a confiabilidade do sistema. As técnicas baseadas em software para detecção de CFEs são conhecidas como signature monitoring ou signature checking. Essas técnicas introduzem código extra em todos os blocos básicos do programa com a finalidade de detectar os CFEs. Esse código extra implica em overhead, que pode ter uma grande variação dependendo da técnica utilizada. Na tentativa de minimizar o overhead imposto pelas técnicas de detecção de CFEs, neste trabalho foi desenvolvida a técnica de detecção e proteção de blocos básicos suscetíveis através da análise sistemática de single bit-flip. O objetivo da técnica é detectar os blocos básicos suscetíveis do programa através da análise sistemática de single bit-flip e proteger apenas esses blocos básicos. A técnica foi avaliada em termos de sua taxa de cobertura de falhas e desempenho. Para avaliar a taxa de cobertura falhas foram realizadas várias campanhas de injeção de falhas nos programas da suíte de benchmarks Mibench. A avaliação de desempenho foi feita com base na quantidade de instruções de máquina executadas pelos benchmarks, comparando quantidade de instruções antes e depois da utilização da técnica detecção e proteção de blocos básicos suscetíveis. Os resultados dos experimentos mostram que é possível reduzir em até 27,93% a quantidade de blocos básicos protegidos e ao mesmo tempo manter uma alta taxa de cobertura de falhas. Porém, em termos de desempenho, o ganho não ficou na mesma proporção da quantidade de blocos básicos não protegidos, ficando abaixo do esperado. / Radioactive particles hitting the hardware of computer systems may result in unexpected behavior during software execution. Such unexpected behavior may persist for the lifetime of the system or may have a limited duration. In the latter case, we have what is called a transient fault. Transient faults may cause the program instructions to execute in an incorrect sequence. This incorrect sequence is called a control flow error (CFE). Research show that between 33% to 77% of transient faults manifest themselves as CFEs, depending on the type of the processor. If the system does not perform any verification at runtime, a control flow error may be not detected, which can result in incorrect program execution. Systems projected to low-cost embedded applications, where cost and performance are the main factors, use software based techniques to improve system reliability. Software based techniques to detect CFEs are known as signature monitoring or signature checking. These techniques insert extra code in each basic block of the program in order to detect CFEs. This extra code add an undesirable overhead in the program, which can have large variation depending on the technique used. In the attempt to minimize the overhead added by CFEs detection techniques, this work developed a technique of detection and protection of susceptible basic blocks through systematic bit-flip analysis. The purpose of this technique is to detect the susceptible basic blocks of the program through the systematic bit-flip analysis and to protect only these basic blocks. The technique was evaluated based on its fault coverage rate and performance. To evaluate the fault coverage rate a fault injection campaing was performed in the programs of the Mibench benchmark suite. The performance evaluation was based in the number of instruction executed by each benchmark, comparing the number of instructions before and after the use of the proposed technique. The experimental results show that is possible to reduce up to 27,93% the amount of protected basic blocks, while keeping a high faults coverage hate. However, in terms of performance, the gain was not in the same proportion, being lower than expected. Sistemas embarcados Tolerancia : Falhas Embedded systems Transient faults Fault tolerance Signature checking
247	A reliability analysis approach to assist the design of aggressively scaled reconfigurable architectures Pereira, Mônica Magalhães January 2012 (has links) As computer systems are built with aggressively scaled and unreliable technologies, some implementations rely on function specialization with reconfigurable computing to increase performance by exploiting parallelism, with possible energy gains. However, the use of reconfigurable devices in general purpose computing also brings extra reliability challenges at the system level. Solutions to cope with that are generally accompanied with the addition of excessive area, performance and power overheads to the overall system. These overheads could be reduced if a more extensive analysis was performed to evaluate the best fault tolerance strategy to balance the tradeoff between reliability and the mentioned aspects. In this context, this work present a comprehensive analysis of architectural design that includes the use of reliability modeling and takes into consideration aspects such as area, performance, and power. The analysis aims to assist the design of reliability-aware reconfigurable architectures by giving some indications about what kind of redundancy should be used in order to increase reliability. In the proposed analysis, we show that communication among functional units is critical to the overall reliability of reconfigurable architectures. Therefore, where most of the reliability investments should be made. Moreover, the analysis also demonstrate that there is a threshold in the amount of redundancy that can be added in order to increase reliability. This limit is determined by the fact that adding redundancy increases area overhead. This overhead influences reliability until overcomes the reliability gains. Therefore, even disregarding area cost, the gains in reliability will cease or even decrease. To provide a more extended evaluation, a fault tolerance approach was proposed to cope with permanent faults. The LOwER-FaT strategy is a mechanism embedded in a run-time reconfiguration mechanism that automatically selects the fault-free resources without adding extra time overhead to the configuration generation mechanism. The fault-tolerant strategy takes advantage of the on-line transparent configuration generation mechanism to transparently avoid faulty functional units and interconnects. Moreover, the strategy does not require the addition of spare resources. All the resources are used to accelerate execution, and only in case of fault, a resource is replaced by a working one, with a performance penalty caused by the reduction in the amount of resources. In spite of that, experimental results showed a mean performance degradation of 14% on overall performance under 20% fault rate. Moreover, reliability results indicated gains of around six orders of magnitude when the fault tolerance strategy was in place. Tolerancia : Falhas Microeletrônica Cmos Reconfigurable architectures Fault tolerance Reliability analysis Scaling
248	Avaliação de atraso, consumo e proteção de somadores tolerantes a falhas / Evaluating delay, power and protection of fault tolerant adders Franck, Helen de Souza January 2011 (has links) Nos últimos anos, os sistemas integrados em silício (SOCs - Systems-on-Chip) têm se tornado menos imunes a ruído, em decorrência dos ajustes necessários na tecnologia CMOS (Complementary Metal-Oxide-Silicon) para garantir o funcionamento dos transistores com dimensões nanométricas. Dentre tais ajustes, a redução da tensão de alimentação e da tensão de limiar (threshold) tornam os SOCs mais suscetíveis a falhas transientes, principalmente aquelas provocadas pela colisão de partículas energéticas que provêm do espaço e encontram-se presentes na atmosfera terrestre. Quando uma partícula energética de alta energia colide com o dreno de um transistor que está desligado, ela perde energia e produz pares elétron-lacuna livres, resultando em uma trilha de ionização. A ionização pode gerar um pulso transiente de tensão que pode ser interpretado como uma mudança no sinal lógico. Em um circuito combinacional, o pulso pode propagar-se até ser armazenado em um elemento de memória. Tal fenômeno é denominado Single-Event Transient (SET). Como a tendência é que as dimensões dos dispositivos fabricados com tecnologia CMOS continuem reduzindo por mais alguns anos, a ocorrência de SETs em SOCs operando na superfície terrestre tende a aumentar, exigindo a adoção de técnicas de tolerância a falhas no projeto de SOCs. O presente trabalho tem por objetivo avaliar circuitos somadores tolerantes a falhas transientes encontrados na literatura. Duas arquiteturas de somadores foram escolhidas: Ripple Carry Adder (RCA) e Binary Signed Digit Adder (BSDA). O RCA foi escolhido por ser o tipo de somador de menor custo e por isso, amplamente utilizado em SOCs. Já o BSDA foi escolhido porque utiliza o sistema numérico de dígito binário com sinal (Binary Signed Digit – BSD). Por ser um sistema de representação redundante, o uso de BSD facilita a aplicação de técnicas de tolerância a falhas baseadas em redundância de informação. Os somadores protegidos avaliados foram projetados com as seguintes técnicas: Redundância Modular Tripla (Triple Modular Redundancy - TMR) e Recomputação com Entradas e Saídas Invertidas (RESI), no caso do RCA, e codificação 1 de 3 e verificação de paridade, no caso do BSDA. As 9 arquiteturas de somadores foram simuladas no nível elétrico usando o Modelo Tecnológico Preditivo (Predictive Technology Model - PTM) de 45nm e considerando quatro comprimentos de operandos: 4, 8, 16 e 32 bits. Os resultados obtidos permitiram quantificar o número de transistores, o atraso crítico e a potência média consumida por cada arquitetura protegida. Também foram realizadas campanhas de injeção de falhas, por meio de simulações no nível elétrico, para estimar o grau de proteção de cada arquitetura. Os resultados obtidos servem para guiar os projetistas de SOCs na escolha da arquitetura de somador tolerante a falhas mais adequada aos requisitos de cada projeto. / In the past recent years, integrated systems on a chip (Systems-on-chip - SOCs) became less immune to noise due to the adjusts in CMOS technology needed to assure the operation of nanometric transistors. Among such adjusts, the reductions in supply voltage and threshold voltage make SOSs more susceptible to transient faults, mainly those provoked by the collision of charged particles coming from the outer space that are present in the atmosphere. When a heavily energy charged particle hits the drain region of a transistor that is at the off state it produces free electron-hole pairs, resulting in an ionizing track. The ionization may generate a transient voltage pulse that can be interpreted as a change in the logic signal. In a combinational circuit, the pulse may propagate up to the primary outputs and may be captured by the output storage element. Such phenomenon is referred to as Single-Event Transient (SET). Since it is expected that transistor dimensions will continue to reduce in the next technological nodes, the occurrence of SETs at Earth surface will increase and therefore, fault tolerance techniques will become a must in the design of SOSs. The present work targets the evaluation of transient fault-tolerant adders found in the literature. Two adder architectures were chosen: the Ripple-Carry Adder (RCA) and the Binary Signed Digit Adder (BSDA). The RCA was chosen because it is the least expensive and therefore, the most used architecture for SOS design. The BSDA, in turn, was chosen because it uses the Binary Signed Digit (BSD) system. As a redundant number system, the BSD paves the way to the implementation of fault-tolerant adders using information redundancy. The evaluated fault-tolerant adders were implemented by using the following techniques: Triple Module Redundancy (TMR) and Recomputing with Inverted Inputs and Outputs (RESI), in the case of the RCA, and 1 out of 3 coding and parity verification, in the case of the BSDA. A total of 9 adder architectures were simulated at the electric-level using the Predictive Technology Model (PTM) for 45nm in four different bitwidths: 4, 8, 16 and 32. The obtained results allowed for quantifying the number of transistors, critical delay and average power consumption for each fault-tolerant architecture. Fault injection campaigns were also accomplished by means of electric-level simulations to estimate the degree of protection of each architecture. The results obtained in the present work may be used to guide SOS designers in the choice of the fault-tolerant adder architecture that is most likely to satisfy the design requirements. Microeletrônica Tolerancia : Falhas Fault tolerance SET (single-event transient) Binary-signed digit number system Adder architectures
249	CFT-tool : ferramenta configurável para aplicação de técnicas de detecção de falhas em processadores por software / CFT-tool: configurable tool to application of faults detection techniques in processors by software Chielle, Eduardo January 2012 (has links) Este trabalho apresenta uma ferramenta configurável, denominada de CFT-tool, capaz de aplicar automaticamente técnicas de detecção de erros em software com o objetivo de proteger processadores com diferentes arquiteturas e organizações contra falhas transientes no hardware. As técnicas baseadas em redundância e comparação são aplicadas pela CFT-tool no código assembly de um programa desprotegido, compilado para a arquitetura alvo. A ferramenta desenvolvida foi validada utilizando dois processadores distintos: miniMIPS e LEON3. O processador miniMIPS foi utilizado para verificar a eficiência, em termos de taxa de detecção de erros, tempo de execução e ocupação de memória, das técnicas de detecção em software aplicadas pela CFT-tool, comparando os resultados obtidos com os presentes na literatura. O processador LEON3 foi selecionado por ser amplamente utilizado em aplicações espaciais e por ser baseado em uma arquitetura diferente da arquitetura do processador miniMIPS. Com o processador LEON3 é verificada a configurabilidade da CFT-tool, isto é, a capacidade dela de aplicar técnicas de detecção em software em um código compilado para um diferente processador, o mantendo funcional e sendo capaz de detectar erros. A CFT-tool pode ser utilizada para proteger programas para outras arquiteturas e organizações através da modificação dos arquivos de configuração da ferramenta. A configuração das técnicas é definida segundo as especificações da aplicação, recursos do processador e seleções do usuário. Programas foram protegidos e falhas foram injetadas em nível lógico em ambos os processadores. Para o processador miniMIPS, as taxas de detecção de erros, os tempos de execução e as ocupações de memórias dos programas protegidos se mostraram compatíveis com os resultados presentes na literatura. Resultados semelhantes foram encontrados para o processador LEON3. Diferenças entre os resultados ocorrem devido às características da arquitetura. A ferramenta CFT-tool por ser configurável pode proteger o código na integralidade ou selecionar partes do código e registradores que serão redundantes e protegidos. A vantagem de proteger parte do código é reduzir o custo final em termos de tempo de processamento e ocupação de memória. Uma análise do impacto da seleção seletiva de registradores na taxa de detecção de erros é apresentada. E diretivas de alcançar um comprometimento ótimo entre quantidade de registradores protegidos, taxa de detecção de erros e custo são discutidas. / This work presents a configurable tool, called CFT-tool, capable of automatically applying software-based error detection techniques aiming to protect processors with different architectures and organizations against transient faults in the hardware. The techniques are based on redundancy and comparison. They are applied by CFT-tool in the assembly code of an unprotected program, compiled to the target architecture. The developed tool was validated using two distinct processors: miniMIPS and LEON3. The miniMIPS processor has been utilized to verify the efficiency of the software-based techniques applied by CFT-tool in the assembly code of unprotected programs in terms of error detection rate, runtime and memory occupation, comparing the obtained results with those presented in the literature. The LEON3 processor was selected because it is largely adopted in space applications and because it is based on a different architecture that miniMIPS processor. The configurability of the CFT-tool is verified with the LEON3 processor, that is, the capability of the tool at applying software-based detection techniques in a code compiled to a different processor, maintaining it functional and capable of detecting errors. The CFT-tool can be utilized to protect programs compiled to other architectures and organizations by modifying the configuration files of the tool. The configuration of the techniques is defined by the specifications of the application, processor resources and selections of the user. Programs were protected and faults were injected in logical level in both processors. When using the miniMIPS processor, the error detection rates, runtimes and memory occupations of the protected programs are comparable to the results presents in the literature. Similar results are reached with the LEON3 processor. Differences between the results are due to architecture features. The CFT-tool can be configurable to protect the entire code or to select portions of the code or registers that will be redundant and protected. The advantage of protecting portions of the code is to reduce the final cost in terms of runtime and memory occupation. An analysis of the impact of selective selection of registers in the error detection rate is also presented. And policies to reach an optimum committal between amount of protected registers, error detection rate and cost are discussed. Microeletrônica Tolerancia : Falhas Fault tolerance Transient faults SEU SET Software-based detection techniques
250	Extensão do suporte para simulação de defeitos em algoritmos distribuídos utilizando o Neko / Extension to support failures in distributed algorithm simulation using Neko Rodrigues, Luiz Antonio January 2006 (has links) O estudo e desenvolvimento de sistemas distribuídos é uma tarefa que demanda grande esforço e recursos. Por este motivo, a pesquisa em sistemas deste tipo pode ser auxiliada com o uso de simuladores, bem como por meio da emulação. A vantagem de se usar simuladores é que eles permitem obter resultados bastante satisfatórios sem causar impactos indesejados no mundo real e, conseqüentemente, evitando desperdícios de recursos. Além disto, testes em larga escala podem ser controlados e reproduzidos. Neste sentido, vem sendo desenvolvido desde 2000 um framework para simulação de algoritmos distribuídos denominado Neko. Por meio deste framework, algoritmos podem ser simulados em uma única máquina ou executados em uma rede real utilizando-se o mesmo código nos dois casos. Entretanto, através de um estudo realizado sobre os modelos de defeitos mais utilizados na literatura, verificou-se que o Neko é ainda bastante restrito nesta área. A única classe de defeito abordada, lá referida como colapso, permite apenas o bloqueio temporário de mensagens do processo. Assim, foram definidos mecanismos para a simulação das seguintes classes de defeitos: omissão de mensagens, colapso de processo, e alguns defeitos de rede tais como quebra de enlace, perda de mensagens e particionamento. A implementação foi feita em Java e as alterações necessárias no Neko estão documentadas no texto. Para dar suporte aos mecanismos de simulação de defeitos, foram feitas alterações no código fonte de algumas classes do framework, o que exige que a versão original seja alterada para utilizar as soluções. No entanto, qualquer aplicação desenvolvida anteriormente para a versão original poderá ser executada normalmente independente das modificações efetuadas. Para testar e validar as propostas e soluções desenvolvidas foram utilizados estudos de caso. Por fim, para facilitar o uso do Neko foi gerado um documento contendo informações sobre instalação, configuração e principais mecanismos disponíveis no simulador, incluindo o suporte a simulação de defeitos desenvolvido neste trabalho. / The study and development of distributed systems is a task that demands great effort and resources. For this reason, the research in systems of this type can be assisted by the use of simulators, as well as by means of the emulation. The advantage of using simulators is that, in general, they allow to get acceptable results without causing harming impacts in the real world and, consequently, preventing wastefulness of resources. Moreover, tests on a large scale can be controlled and reproduced. In this way, since 2000, a framework for the simulation of distributed algorithms called Neko has been developed. By means of this framework, algorithms can be simulated in a single machine or executed in a real network, using the same code in both cases. However, studying the most known and used failure models developed having in mind distributed systems, we realized that the support offered by Neko for failure simulation was too restrictive. The only developed failure class, originally named crash, allowed only a temporary blocking of process’ messages. Thus, mechanisms for the simulation of the following failure classes were defined in the present work: omission of messages, crash of processes, and some network failures such as link crash, message drop and partitioning. The implementation was developed in Java and the necessary modifications in Neko are registered in this text. To give support to the mechanisms for failure simulation, some changes were carried out in the source code of some classes of the framework, what means that the original version should be modified to use the proposed solutions. However, all legacy applications, developed for the original Neko version, keep whole compatibility and can be executed without being affected by the new changes. In this research, some case studies were used to test and validate the new failure classes. Finally, with the aim to facilitate the use of Neko, a document about the simulator, with information on how to install, to configure, the main available mechanisms and also on the developed support for failure simulation, was produced. Tolerancia : Falhas Sistemas distribuidos Simulação computacional Fault tolerance Neko Distributed systems Simulation

Search results