Global ETD Search

221	Motf : meta-objetos para tolerância a falhas / Moft-metaobjects for fault-tolerance Lisboa, Maria Lucia Blanck January 1995 (has links) As técnicas de programação e os mecanismos de linguagens de programação destinados ao desenvolvimento de aplicações de alta confiabilidade são agrupadas sob a denominação de tolerância a falhas em software. A área de tolerância a falhas abrange uma serie de técnicas com funcionalidades e aplicabilidade bem definidas, permitindo que seja considerada um domínio próprio - o domínio de tolerância a falhas. O conteúdo de informação desse domínio não é auto-suficiente, uma vez que atua sobre outros domínios. Seu objetivo é garantir as funcionalidades das aplicações desenvolvidas em outros domínios. Ao conjugar o domínio de tolerância a falhas a um outro domínio, ou seja, ao domínio de uma aplicação, o primeiro passa a se responsável pelos requisitos não-funcionais da aplicação. Os requisitos não funcionais de uma aplicação, a exemplo de confiabilidade e segurança, são cruciais em muitas aplicações e exigem métodos e conhecimentos que são distintos do domínio da aplicação. O modelo de orientação a objetos incentiva o desenvolvimento de aplicações através da composição de objetos, cada qual com a sua estrutura e comportamento específicos. Cada particular composição de objetos forma um conjunto que deve observar um comportamento que atenda aos requisitos da aplicação, de forma confiável. Com o objetivo de aumentar a confiabilidade da aplicação e de minimizar o efeito de possíveis falhas do sistema, são propostos objetos tolerantes a falhas. Objetos tolerantes a falhas são objetos responsáveis por serviços críticos da aplicação e que possuem mecanismos que garantem a confiabilidade e disponibilidade destes serviços. Comportamentos tolerantes a falhas de objetos são obtidos por redundância de componentes, incluindo replicacão e diversidade. O gerenciamento da redundância é executado de forma independente do domínio da aplicação e exercido em um meta-nível, através de técnicas de reflexão computacional. A adoção de reflexão computacional no modelo de orientação a objetos permite organizar as atividades de tolerância a falhas sem interferir no aspecto estrutural dos objetos do domínio da aplicação. Os controles que devem ser exercidos pelos meta-objetos sobre os objetos da aplicação são realizados em um meta-nível, de forma a separar as funcionalidades especificas da aplicação daquelas pertinentes ao domínio de tolerância a falhas. Estes meta-objetos, são organizados na forma de um framework, denominado MOTF - Meta-objetos para Tolerância a Falhas. O projeto de MOTF é um framework que apóia o desenvolvimento de aplicações tolerantes a falhas, compreendendo múltiplas classes que definem as funcionalidades exigidas por diversas técnicas de tolerância a falhas. Adota uma arquitetura reflexiva, na qual o meta-nível é dedicado as atividades de detecção e recuperação de erros através da monitoração de objetos da aplicação, localizados no nível base. Características de tolerância a falhas podem ser adicionadas a objetos considerados críticos pela aplicação, assim distribuindo, e não centralizando, a propriedade de tolerar falhas entre objetos da aplicação. Incorporando os princípios de reflexão computacional ao modelo de orientação a objetos dois benefícios principais se salientam: promover a reutilização de objetos tolerantes a falhas e garantir a invulnerabilidade do objeto do domínio da aplicação, ao separar as ações pertinentes ao domínio da aplicação das específicas do sistema tolerante a falhas. / Software fault-tolerance encompasses all techniques and programming languages' mechanisms intended to support the development of high reliability software. We can consider the faulttolerance area a proper domain of knowledge composed by well-defined techniques used to guarantee the reliability of applications related to other domains. Therefore, the fault-tolerance domain acts over other domains. When the fault-tolerance domain is merged into an application domain it becomes responsible for the non-functional requirements of the application. Among those requirements, reliability and safety are crucial ones and they use methods and concerns not related to the application domain. The object-oriented approach to software development allows a software to be decomposed into a set of components - the objects. Each object has its own structure and behavior. The view of a system as composed by interacting objects can be quite convenient in expressing different degrees of fault tolerance. We can distinguish between critical and non-critical objects and we may even distinguish between critical and non-critical operations within a single object. The objective of this research is the exploitation of object-oriented approach to increase reliability and decrease the effects of failures based on the provision of fault-tolerant objects. Fault-tolerant objects are abstractions of high reliability components and are designed to support several fault-tolerance strategies. Furthermore, computational reflection is adopted to organize faulttolerant activities at a meta-level and to provide transparent interfacing among fault-tolerant and non-fault-tolerant objects. A fault-tolerant object can be defined as an object that represents a single interface to redundant services and whose behavior is controlled by a metaobject. Possible behaviors of fault-tolerant objects include replication or diversity and the associated metaobject adds a specific fault-tolerant behavior to its referent object without interfering in its internal structure. MOTF - Metaobjects for Fault Tolerance is a framework intended to support the development of fault-tolerant applications. This framework consists of reusable meta-level classes. Each meta-level class implements a fault-tolerant service, and metaobjects are used as monitoring agents of fault-tolerant objects. The reflective object-oriented architecture promotes reusability and hides the programming of fault-tolerant mechanisms from the application. Programação Tolerancia : Falhas Orientacao : Objetos Software fault-tolerance Object-oriented Computational reflection
222	Sistema embarcado para a manutenção inteligente de atuadores elétricos / Embedded systems for intelligent maintenance of electrical actuators Bosa, Jefferson Luiz January 2009 (has links) O elevado custo de manutenção nos ambientes industriais motivou pesquisas de novas técnicas para melhorar as ações de reparos. Com a evolução tecnológica, principalmente da eletrônica, que proporcionou o uso de sistemas embarcados para melhorar as atividades de manutenção, estas agregaram inteligência e evoluíram para uma manutenção pró-ativa. Através de ferramentas de processamento de sinais, inteligência artificial e tolerância a falhas, surgiram novas abordagens para os sistemas de monitoramento a serviço da equipe de manutenção. Os ditos sistemas de manutenção inteligente, cuja tarefa é realizar testes em funcionamento (on-line) nos equipamentos industriais, promovem novos modelos de confiabilidade e disponibilidade. Tais sistemas são baseados nos conceitos de tolerância a falhas, e visam detectar, diagnosticar e predizer a ocorrência de falhas. Deste modo, fornece-se aos engenheiros de manutenção a informação antecipada do estado de comportamento do equipamento antes mesmo deste manifestar uma falha, reduzindo custos, aumentando a vida útil e tornando previsível o reparo. Para o desenvolvimento do sistema de manutenção inteligente objeto deste trabalho, foram estudadas técnicas de inteligência artificial (redes neurais artificiais), técnicas de projeto de sistemas embarcados e de prototipação em plataformas de hardware. No presente trabalho, a rede neural Mapas Auto-Organizáveis foi adotada como ferramenta base para detecção e diagnóstico de falhas. Esta foi prototipada numa plataforma de sistema embarcado baseada na tecnologia FPGA (Field Programmable Gate Array). Como estudo de caso, uma válvula elétrica utilizada em dutos para transporte de petróleo foi definida como aplicação alvo dos experimentos. Através de um modelo matemático, um conjunto de dados representativo do comportamento da válvula foi simulado e utilizado como entrada do sistema proposto. Estes dados visam o treinamento da rede neural e visam fornecer casos de teste para experimentação no sistema. Os experimentos executados em software validaram o uso da rede neural como técnica para detecção e diagnóstico de falhas em válvulas elétricas. Por fim, também realizou-se experimentos a fim de validar o projeto do sistema embarcado, comparando-se os resultado obtidos com este aos resultados obtidos a partir de testes em software. Os resultados revelam a escolha correta do uso da rede neural e o correto projeto do sistema embarcado para desempenhar as tarefas de detecção e diagnóstico de falhas em válvulas elétricas. / The high costs of maintenance in industrial environments have motivated research for new techniques to improve repair activities. The technological progress, especially in the electronics field, has provided for the use of embedded systems to improve repair, by adding intelligence to the system and turning the maintenance a proactive activity. Through tools like signal processing, artificial intelligence and fault-tolerance, new approaches to monitoring systems have emerged to serve the maintenance staff, leading to new models of reliability and availability. The main goal of these systems, also called intelligent maintenance systems, is to perform in-operation (on-line) test of industrial equipments. These systems are built based on fault-tolerance concepts, and used for the detection, the diagnosis and the prognosis of faults. They provide the maintenance engineers with information on the equipment behavior, prior to the occurrence of failures, reducing maintenance costs, increasing the system lifetime and making it possible to schedule repairing stops. To develop the intelligent maintenance system addressed in this dissertation, artificial intelligence (neural networks), embedded systems design and hardware prototyping techniques were studied. In this work, the neural network Self-Organizing Maps (SOM) was defined as the basic tool for the detection and the diagnosis of faults. The SOM was prototyped in an embedded system platform based on the FPGA technology (Field Programmable Gate Array). As a case study, the experiments were performed on an electric valve used in a pipe network for oil transportation. Through a mathematical model, a data set representative of the valve behavior was obtained and used as input to the proposed maintenance system. These data were used for neural network training and also provided test cases for system monitoring. The experiments were performed in software to validate the chosen neural network as the technique for the detection and diagnosis of faults in the electrical valve. Finally, experiments to validate the embedded system design were also performed, so as to compare the obtained results to those resulting from the software tests. The results show the correct choice of the neural network and the correct embedded systems design to perform the activities for the detection and diagnosis of faults in the electrical valve. Microeletrônica Sistemas embarcados Tolerancia : Falhas On-line test Fault-tolerance Self-organizing maps Embedded systems FPGA Maintenance
223	Designing single event upset mitigation techniques for large SRAM-Based FPGA components / Desenvolvimento de técnicas de tolerância a falhas transientes em componentes programáveis por SRAM Kastensmidt, Fernanda Gusmão de Lima January 2003 (has links) Esse trabalho consiste no estudo e desenvolvimento de técnicas de proteção a falhas transientes, também chamadas single event upset (SEU), em circuitos programáveis customizáveis por células SRAM. Os projetistas de circuitos eletrônicos estão cada vez mais predispostos a utilizar circuitos programáveis, conhecidos como Field Programmable Gate Array (FPGA), para aplicações espaciais devido a sua alta flexibilidade lógica, alto desempenho, baixo custo no desenvolvimento, rapidez na prototipação e principalmente pela reconfigurabilidade. Em particular, FPGAs customizados por SRAM são muito importantes para missões espaciais pois podem ser rapidamente reprogramados à distância quantas vezes for necessário. A técnica de proteção baseada em redundância tripla, conhecida como TMR, é comumente utilizada em circuitos integrados de aplicações específicas e pode também ser aplicada em circuitos programáveis como FPGAs. A técnica TMR foi testada no FPGA Virtex® da Xilinx em aplicações como contadores e micro-controladores. Falhas foram injetadas em todos as partes sensíveis da arquitetura e seus efeitos foram detalhadamente analisados. Os resultados de injeção de falhas e dos experimentos sob radiação em laboratório comprovaram a eficácia do TMR em proteger circuitos sintetizados em FPGAs customizados por SRAM. Todavia, essa técnica possui algumas limitações como aumento em área, uso de três vezes mais pinos de entrada e saída (E/S) e conseqüentemente, aumento na dissipação de potência. Com o objetivo de reduzir custos no TMR e melhorar a confiabilidade, uma técnica inovadora de tolerância a falhas para FPGAs customizados por SRAM foi desenvolvida para ser implementada em alto nível, sem modificações na arquitetura do componente. Essa técnica combina redundância espacial e temporal para reduzir custos e assegurar confiabilidade. Ela é baseada em duplicação com um circuito comparador e um bloco de detecção concorrente de falhas. Esta nova técnica proposta neste trabalho foi especificamente projetada para tratar o efeito de falhas transientes em blocos combinacionais e seqüenciais na arquitetura reconfigurável, reduzir o uso de pinos de E/S, área e dissipação de potência. A metodologia foi validada por injeção de falhas emuladas em uma placa de prototipação. O trabalho mostra uma comparação nos resultados de cobertura de falhas, área e desempenho entre as técnicas apresentadas. / This thesis presents the study and development of fault-tolerant techniques for programmable architectures, the well-known Field Programmable Gate Arrays (FPGAs), customizable by SRAM. FPGAs are becoming more valuable for space applications because of the high density, high performance, reduced development cost and re-programmability. In particular, SRAM-based FPGAs are very valuable for remote missions because of the possibility of being reprogrammed by the user as many times as necessary in a very short period. SRAM-based FPGA and micro-controllers represent a wide range of components in space applications, and as a result will be the focus of this work, more specifically the Virtex® family from Xilinx and the architecture of the 8051 micro-controller from Intel. The Triple Modular Redundancy (TMR) with voters is a common high-level technique to protect ASICs against single event upset (SEU) and it can also be applied to FPGAs. The TMR technique was first tested in the Virtex® FPGA architecture by using a small design based on counters. Faults were injected in all sensitive parts of the FPGA and a detailed analysis of the effect of a fault in a TMR design synthesized in the Virtex® platform was performed. Results from fault injection and from a radiation ground test facility showed the efficiency of the TMR for the related case study circuit. Although TMR has showed a high reliability, this technique presents some limitations, such as area overhead, three times more input and output pins and, consequently, a significant increase in power dissipation. Aiming to reduce TMR costs and improve reliability, an innovative high-level technique for designing fault-tolerant systems in SRAM-based FPGAs was developed, without modification in the FPGA architecture. This technique combines time and hardware redundancy to reduce overhead and to ensure reliability. It is based on duplication with comparison and concurrent error detection. The new technique proposed in this work was specifically developed for FPGAs to cope with transient faults in the user combinational and sequential logic, while also reducing pin count, area and power dissipation. The methodology was validated by fault injection experiments in an emulation board. The thesis presents comparison results in fault coverage, area and performance between the discussed techniques. Microeletrônica Fpga Tolerancia : Falhas Fault tolerance Single event upset Fault injection Time and hardware redundancy
224	Uma estratégia baseada em programação orientada a aspectos para injeção de falhas de comunicação / A fault injection communication tool based on aspect oriented programming Silveira, Karina Kohl January 2005 (has links) A injeção de falhas permite acelerar a ocorrência de erros em um sistema para que seja possível a validação de seu comportamento sob falhas, assim como a avaliação do impacto dos mecanismos de detecção e remoção de erros no desempenho do sistema. Abordagens que facilitem o desenvolvimento de injetores vêm sendo buscadas com empenho, variando desde a inserção de injetores no kernel do sistema operacional até o uso de reflexão computacional para aplicações orientadas a objetos. Este trabalho explora os recursos da Programação Orientada a Aspectos como estratégia para a criação de ferramentas de injeção de falhas. A Programação Orientada a Aspectos tem como objetivo a modularização de interesses transversais, isto é, interesses que atravessam as unidades naturais de modularização. A injeção de falhas possui um comportamento que abrange os diversos módulos da aplicação alvo, afetando métodos que são executados em diversas classes em diversos pontos da aplicação. Desta forma, a injeção de falhas pode ser encapsulada sob a forma de aspectos. Para demonstrar a validade da proposta apresentada foi desenvolvida a ferramenta FICTA – Fault Injection Communication Tool based on Aspects. O objetivo é a validação de aplicações Java distribuídas, construídas sobre o protocolo UDP e que implementem mecanismos de tolerância a falhas em protocolos de camadas superiores. A importância de instrumentar um protocolo de base é justificada pelo fato da necessidade de validar aplicações, toolkits e middlewares que implementem tolerância a falhas em camadas superiores, logo, esses protocolos devem lidar corretamente com as falhas de mais baixo nível. A ferramenta abrange falha de colapso e omissão de mensagens do protocolo UDP. O uso de Programação Orientada a Aspectos na construção de FICTA resultou em uma ferramenta altamente modular, reusável e flexível, que pode ser facilmente inserida e removida da aplicação alvo, sem causar intrusividade espacial no código fonte da aplicação. / The fault injection allows us to accelerate the occurrence of failures in a system so that it is possible to validate its behavior under faults, as well as the evaluation of the impact on the mechanisms of detection and removal of failures in the performance of the system. The approaches that may facilitate the development of injectors have been searched with effort, varying from the insertion of injectors in the kernel of the operational system up to the computational reflection for object oriented applications. This work explores the resources of the Aspect Oriented Programming as a strategy to create tools of fault injection. The Aspect Oriented Programming has as its goal the modularization of the crosscutting concerns, that is to say the interests that cross the natural units of modularization. The fault injection has a behavior that covers the various modules of the target application, affecting methods that are executed in several classes of several areas of the application. Thus, the Fault Injection may be encapsulated under the form of aspects. To demonstrate the worthiness of the presented proposal, a tool called FICTA - Fault Injection Communication Tool based on Aspects, has been developed. The aim is to validate Java distributed applications built under the UDP protocol so that the fault tolerance mechanisms can be implemented in upper layers. The importance of instrumentate a protocol of base is justified by the necessity of validating applications, toolkits and middlewares that implement fault tolerance in upper layers, then, these protocols must deal correctly with the lower level faults. The tool covers crash and message omission faults of the UDP protocol. The use of Aspect Oriented Programming in the construction of FICTA resulted in a tool highly modular, reusable and flexible that may be easily inserted and removed from the target application, without causing spatial intrusiveness in the source code of the application. Tolerancia : Falhas Arquivos distribuidos Fault tolerance Fault injection Aspect oriented programming
225	Vers des protocoles de tolérance aux fautes Byzantines efficaces et robustes / Efficient and Robust Byzantine Fault Tolerant Replication Protocols Aublin, Pierre-Louis 21 January 2014 (has links) Les systèmes d'information deviennent de plus en plus complexes et il est difficile de les garantir exempts de fautes. La réplication de machines à états est une technique permettant de tolérer les fautes, quelque soit leur nature, qu'elles soient logicielles ou matérielles. Cette thèse traite des protocoles de réplication de machines à états tolérant les fautes arbitraires, également appelées Byzantines. Ces protocoles doivent relever deux défis : (i) ils doivent être efficaces, c'est-à-dire que leurs performances doivent être les meilleurs possibles, afin de masquer le coût supplémentaire dû à la réplication et (ii) ils doivent être robustes, c'est-à-dire qu'une attaque ne doit pas faire baisser leurs performances de manière importante. Dans cette thèse nous observons qu'aucun protocole ne relève ces deux défis en même temps : les protocoles que nous connaissons aujourd'hui sont soit conçus pour être efficaces au détriment de leur robustesse, soit conçus pour être robustes au détriment de leurs performances. Une première contribution de cette thèse est la conception d'un nouveau protocole qui réunit le meilleur des deux mondes. Ce protocole, R-Aliph, combine un protocole efficace mais peu robuste avec un protocole robuste afin de fournir un protocole à la fois efficace et robuste. Nous évaluons ce protocole de manière expérimentale et montrons que ses performances en cas d'attaque sont égales aux performances du protocole robuste sous-jacent. De plus, ses performances dans le cas sans faute sont très proches des performances du protocole connu le plus efficace : la différence maximale de débit est inférieure à 6%. Dans la seconde partie de cette thèse nous observons que les protocoles conçus pour être robustes sont peu robustes en réalité. En effet, il est possible de concevoir une attaque dans laquelle leur perte de débit est supérieure à 78%. Nous identifions le problème de ces protocoles et nous concevons un nouveau protocole plus robuste que les précédents : RBFT. L'idée de base de ce protocole est d'exécuter en parallèle plusieurs instances d'un même protocole. Les performances de ces différentes instances sont surveillées de près afin de détecter tout comportement malicieux. Nous évaluons RBFT dans le cas sans faute et en cas d'attaque. Nous montrons que ses performances dans le cas sans faute sont comparables aux performances des protocoles considérés comme robustes. De plus, nous observons que la dégradation maximale de performance qu'un attaquant peut causer sur le système est inférieure à 3%, même dans le cas de la pire attaque possible. / Information systems become more and more complex and it is difficult to guarantee that they are bug-free. State Machine Replication is a technique for tolerating faults, regardless their nature, whether they are software or hardware faults. This thesis studies Fault Tolerant State Machine Replication protocols that tolerate arbitrary, also called Byzantine, faults. These protocols face two challenges: (i) they must be efficient, i.e., their performance have to be the best ones, in order to mask the cost of the replication and (ii) they must be robust, i.e., an attack should not cause an important performance degradation. In this thesis, we observe that no protocol addresses both of these challenges: current protocols are either designed to be efficient but fail to be robust, or designed to be robust but exhibit poor performance. A first contribution of this thesis is the design of a new protocol which achieves the best of both worlds. This protocol, R-Aliph, combines an efficient but not robust protocol with a protocol designed to be robust. The result is a protocol that is both robust and efficient. We evaluate this protocol experimentally and show that its performance under attack equals the performance of the underlying robust protocol. Moreover, its performance in the fault-free case is close to the performance of the best known efficient protocol: the maximal throughput difference is less than 6%. In the second part of this thesis we analyze the state-of-the-art robust protocols and demonstrate that they are not effectively robust. Indeed, one can run an attack on each of these protocols such that the throughput loss is at least equal to 78%. We identify the problem of these protocols and design a new, effectively robust, protocol called RBFT. The main idea of this protocol is to execute several instances of a robust protocol in parallel and closely monitor their performance, in order to detect a malicious behaviour. We evaluate RBFT in the fault-free case and under attack. We observe that its performance in the fault-free case is equivalent to the performance of the other so-called robust BFT protocols. Moreover, we show that the maximal throughput degradation, under the worst possible attack, is less than 3%. Tolérance aux fautes Système distribué Performance Robustesse Fault tolerance Distributed system Performance Robustness 004
226	Estabilização de um sistema com histerese e sujeito a falhas aleatorias Huamaccto, Elmer Lévano 24 May 2014 (has links) Este trabalho apresenta condições suficientes para garantir a estabilidade em probabilidade para um sistema com histereses, modelado pelas equações de Bouc-Wen, mediante um controlador proporcional integral sujeito a falhas aleatórias. Quando ocorre uma falha de forma aleatória na linha transmissão, o sistema desliga o controlador e fica assim por um tempo. Após esse tempo, o sistema liga novamente o controle e permanece ativo até a próxima falha que ocorre de forma aleatória. As falhas ocorrem de acordo com o processo de distribuição de Poisson. Uma aplicação real considerando o controle de velocidade de um motor DC é apresentado. / This note presents conditions to assure the stability in probability for a hysteresis Bouc-Wen model controlled by a proportional-integral controller subject to random failures in the transmission line. When a failure happens, the controller turns off and remains off for a while. After that, the controller turns on and keeps working until the occurrence of the next failure. The failures occur according to a Poisson distributed process. A numerical example illustrates the result. A real application considering the speed control of a DC motor is presented. Processo estocástico Histerese Controle automático Tolerância a falha (Engenharia) Stochastic processes Hysteresis Automatic control Fault tolerance (Engineering)
227	Estabilização de um sistema com histerese e sujeito a falhas aleatorias Huamaccto, Elmer Lévano 24 May 2014 (has links) Este trabalho apresenta condições suficientes para garantir a estabilidade em probabilidade para um sistema com histereses, modelado pelas equações de Bouc-Wen, mediante um controlador proporcional integral sujeito a falhas aleatórias. Quando ocorre uma falha de forma aleatória na linha transmissão, o sistema desliga o controlador e fica assim por um tempo. Após esse tempo, o sistema liga novamente o controle e permanece ativo até a próxima falha que ocorre de forma aleatória. As falhas ocorrem de acordo com o processo de distribuição de Poisson. Uma aplicação real considerando o controle de velocidade de um motor DC é apresentado. / This note presents conditions to assure the stability in probability for a hysteresis Bouc-Wen model controlled by a proportional-integral controller subject to random failures in the transmission line. When a failure happens, the controller turns off and remains off for a while. After that, the controller turns on and keeps working until the occurrence of the next failure. The failures occur according to a Poisson distributed process. A numerical example illustrates the result. A real application considering the speed control of a DC motor is presented. Processo estocástico Histerese Controle automático Tolerância a falha (Engenharia) Stochastic processes Hysteresis Automatic control Fault tolerance (Engineering)
228	Reliability evaluation and error mitigation in pedestrian detection algorithms for embedded GPUs / Validação da confiabilidade e tolerância a falhas em algoritmos de detecção de pedestres para GPUs embarcadas Santos, Fernando Fernandes dos January 2017 (has links) A confiabilidade de algoritmos para detecção de pedestres é um problema fundamental para carros auto dirigíveis ou com auxílio de direção. Métodos que utilizam algoritmos de detecção de objetos como Histograma de Gradientes Orientados (HOG - Histogram of Oriented Gradients) ou Redes Neurais de Convolução (CNN – Convolutional Neural Network) são muito populares em aplicações automotivas. Unidades de Processamento Gráfico (GPU – Graphics Processing Unit) são exploradas para executar detecção de objetos de uma maneira eficiente. Infelizmente, as arquiteturas das atuais GPUs tem se mostrado particularmente vulneráveis a erros induzidos por radiação. Este trabalho apresenta uma validação e um estudo analítico sobre a confiabilidade de duas classes de algoritmos de detecção de objetos, HOG e CNN. Esta pesquisa almeja não somente quantificar, mas também qualificar os erros produzidos por radiação em aplicações de detecção de objetos em GPUs embarcadas. Os resultados experimentais com HOG foram obtidos usando duas arquiteturas de GPU embarcadas diferentes (Tegra e AMD APU), cada uma foi exposta por aproximadamente 100 horas em um feixe de nêutrons em Los Alamos National Lab (LANL). As métricas Precision e Recall foram usadas para validar a criticalidade do erro. Uma análise final mostrou que por um lado HOG é intrinsecamente resiliente a falhas (65% a 85% dos erros na saída tiveram um pequeno impacto na detecção), do outro lado alguns erros críticos aconteceram, tais que poderiam resultar em pedestres não detectados ou paradas desnecessárias do veículo. Este trabalho também avaliou a confiabilidade de duas Redes Neurais de Convolução para detecção de Objetos:Darknet e Faster RCNN. Três arquiteturas diferentes de GPUs foram expostas em um feixe de nêutrons controlado (Kepler, Maxwell, e Pascal), com as redes detectando objetos em dois data sets, Caltech e Visual Object Classes. Através da análise das saídas corrompidas das redes neurais, foi possível distinguir entre erros toleráveis e erros críticos, ou seja, erros que poderiam impactar na detecção de objetos. Adicionalmente, extensivas injeções de falhas no nível da aplicação (GDB) e em nível arquitetural (SASSIFI) foram feitas, para identificar partes críticas do código para o HOG e as CNNs. Os resultados mostraram que não são todos os estágios da detecção de objetos que são críticos para a confiabilidade da detecção final. Graças a injeção de falhas foi possível identificar partes do HOG e da Darknet, que se protegidas, irão com uma maior probabilidade aumentar a sua confiabilidade, sem adicionar um overhead desnecessário. A estratégia de tolerância a falhas proposta para o HOG foi capaz de detectar até 70% dos erros com 12% de overhead de tempo. / Pedestrian detection reliability is a fundamental problem for autonomous or aided driving. Methods that use object detection algorithms such as Histogram of Oriented Gradients (HOG) or Convolutional Neural Networks (CNN) are today very popular in automotive applications. Embedded Graphics Processing Units (GPUs) are exploited to make object detection in a very efficient manner. Unfortunately, GPUs architecture has been shown to be particularly vulnerable to radiation-induced failures. This work presents an experimental evaluation and analytical study of the reliability of two types of object detection algorithms: HOG and CNNs. This research aim is not just to quantify but also to qualify the radiation-induced errors on object detection applications executed in embedded GPUs. HOG experimental results were obtained using two different architectures of embedded GPUs (Tegra and AMD APU), each exposed for about 100 hours to a controlled neutron beam at Los Alamos National Lab (LANL). Precision and Recall metrics are considered to evaluate the error criticality. The reported analysis shows that, while being intrinsically resilient (65% to 85% of output errors only slightly impact detection), HOG experienced some particularly critical errors that could result in undetected pedestrians or unnecessary vehicle stops. This works also evaluates the reliability of two Convolutional Neural Networks for object detection: You Only Look Once (YOLO) and Faster RCNN. Three different GPU architectures were exposed to controlled neutron beams (Kepler, Maxwell, and Pascal) detecting objects in both Caltech and Visual Object Classes data sets. By analyzing the neural network corrupted output, it is possible to distinguish between tolerable errors and critical errors, i.e., errors that could impact detection. Additionally, extensive GDB-level and architectural-level fault-injection campaigns were performed to identify HOG and YOLO critical procedures. Results show that not all stages of object detection algorithms are critical to the final classification reliability. Thanks to the fault injection analysis it is possible to identify HOG and Darknet portions that, if hardened, are more likely to increase reliability without introducing unnecessary overhead. The proposed HOG hardening strategy is able to detect up to 70% of errors with a 12% execution time overhead. Sistemas embarcados Tolerancia : Falhas : Software Pedestre Fault tolerance Soft error Pedestrian detection Hardening
229	Implementação e avaliação da técnica ACCE para detecção e correção de erros de fluxo de controle no LLVM / Implementation and evaluation of the ACCE technique to detection and correction of control flow errors in the LLVM Parizi, Rafael Baldiati January 2013 (has links) Técnicas de prevenção de falhas como testes e verificação de software não são suficientes para prover dependabilidade a sistemas, visto que não são capazes de tratar falhas ocasionadas por eventos externos tais como falhas transientes. Nessas situações faz-se necessária a aplicação de técnicas capazes de tratar e tolerar falhas que ocorram durante a execução do software. Grande parte das técnicas de tolerância a falhas transientes está focada na detecção e correção de erros de fluxo de controle, que podem corresponder a até 70% de erros causados por esse tipo de falha. Essas técnicas tratam as falhas em nível de software, alterando o programa com a inserção de novas instruções que devem capturar e corrigir desvios inesperados ocorridos durante a execução do software, sendo ACCE uma das técnicas mais conhecidas. Neste trabalho foi feita uma implementação da técnica ACCE através da criação de um passo de transformação de programas para a infraestrutura de compilação LLVM. ACCE atua sobre a linguagem intermediária dos programas compilados com o LLVM, resultando em portabilidade de linguagem de programação e de arquitetura de máquina. Além da implementação da técnica como um passo de transformação, o LLVM foi utilizado para a realização dos experimentos para avaliar o impacto na eficácia de ACCE quando aplicada em programas previamente otimizados por outras transformações. Esse tipo de avaliação é fundamental uma vez que dificilmente a compilação de programas é feita sem a ativação de otimizações, e, até onde sabemos, nunca havia sido feito anteriormente. Os experimentos deste trabalho foram realizados através de baterias de injeção de falhas em programas da suíte de benchmarks Mibench, divididas em diferentes cenários, que avaliaram ACCE em termos de correção de falhas, quando aplicada em programas otimizados por transformações individuais e também por combinações de transformações. Os resultados dos experimentos realizados mostram que a técnica ACCE é eficaz na correção de falhas, porém, para alguns programas otimizados por determinadas transformações, houve redução na correção de falhas. Esse trabalho analisa os experimentos nos quais houve redução da eficácia de ACCE e aponta possíveis causas. / Computer-based systems are used in several eletronic devices that are, in many cases, responsable by the execution of critical tasks. There are situations where techniques of prevention against faults such as software validation and verification, can not be sufficient for ensuring acceptable rates of confiability, because they are not capable of treating faults that occur in execution time, such as transient faults. Most of the fault tolerance techniques for transient faults are focused in detection and correction of control flow errors, that can correspond to 70% errors caused by this kind of faults. These techniques treat the faults at software level, changing the program with the insertion of new instructions that must to capture and to correct illegal jumps occurred during the software execution, being ACCE the most known technique today. In this work an implementation of the ACCE technique was developed as a program transformation pass in the LLVM compiler infraestructurre. ACCE acts over the intermediate language of LLVM, which results in both programming language and machine architecture language portabilities. Besides the implemetation of the technique like a transformation pass, the LLVM was also used in the experiments for the avaliation of impact in the ACCE eficacy when it is applied into programs previously optimized by others compiler transformations. This evaluation is essential since hardly the compilation of programs is made without the activation of other optimizations. As far as we know this kind of evaluation has never beem made before. The experimental results show that the ACCE techinque is effective in the fault correction, but for some programs optimized by specific transformations, there was a reduction in the correction rate. This work analyses these experiments and gives an explanation for what causes a reduction in the effectiveness of ACCE. Tolerancia : Falhas Testes : Software Fault tolerance Control-flow errors ACCE LLVM
230	Investigating techniques to reduce soft error rate under single-event-induced charge sharing / Investigando técnicas para reduzir a taxa de erro de soft sob evento único induzido de carga compartilhada Almeida, Antonio Felipe Costa de January 2014 (has links) The interaction of radiation with integrated circuits can provoke transient faults due to the deposit of charge in sensitive nodes of transistors. Because of the decrease the size in the process technology, charge sharing between transistors placed close to each other has been more and more observed. This phenomenon can lead to multiple transient faults. Therefore, it is important to analyze the effect of multiple transient faults in integrated circuits and investigate mitigation techniques able to cope with multiple faults. This work investigates the effect known as single-event-induced charge sharing in integrated circuits. Two main techniques are analyzed to cope with this effect. First, a placement constraint methodology is proposed. This technique uses placement constraints in standard cell based circuits. The objective is to achieve a layout for which the Soft-Error Rate (SER) due charge shared at adjacent cell is reduced. A set of fault injection was performed and the results show that the SER can be minimized due to single-event-induced charge sharing in according to the layout structure. Results show that by using placement constraint, it is possible to reduce the error rate from 12.85% to 10.63% due double faults. Second, Triple Modular Redundancy (TMR) schemes with different levels of granularities limited by majority voters are analyzed under multiple faults. The TMR versions are implemented using a standard design flow based on a traditional commercial standard cell library. An extensive fault injection campaign is then performed in order to verify the softerror rate due to single-event-induced charge sharing in multiple nodes. Results show that the proposed methodology becomes crucial to find the best trade-off in area, performance and soft-error rate when TMR designs are considered under multiple upsets. Results have been evaluated in a case-study circuit Advanced Encryption Standard (AES), synthesized to 90nm Application Specific Integrated Circuit (ASIC) library, and they show that combining the two techniques, the error rate resulted from multiple faults can be minimized or masked. By using TMR with different granularities and placement constraint methodology, it is possible to reduce the error rate from 11.06% to 0.00% for double faults. A detailed study of triple, four and five multiple faults combining both techniques are also described. We also tested the TMR with different granularities in SRAM-based FPGA platform. Results show that the versions with a fine grain scheme (FGTMR) were more effectiveness in masking multiple faults, similarly to results observed in the ASICs. In summary, the main contribution of this master thesis is the investigation of charge sharing effects in ASICs and the use of a combination of techniques based on TMR redundancy and placement to improve the tolerance under multiple faults. Microeletrônica Tolerancia : Falhas Fault tolerance Placement Constraining Single-Event-Induced Charge Sharing Triple Modular Redundancy

Search results