Global ETD Search

201	Motf : meta-objetos para tolerância a falhas / Moft-metaobjects for fault-tolerance Lisboa, Maria Lucia Blanck January 1995 (has links) As técnicas de programação e os mecanismos de linguagens de programação destinados ao desenvolvimento de aplicações de alta confiabilidade são agrupadas sob a denominação de tolerância a falhas em software. A área de tolerância a falhas abrange uma serie de técnicas com funcionalidades e aplicabilidade bem definidas, permitindo que seja considerada um domínio próprio - o domínio de tolerância a falhas. O conteúdo de informação desse domínio não é auto-suficiente, uma vez que atua sobre outros domínios. Seu objetivo é garantir as funcionalidades das aplicações desenvolvidas em outros domínios. Ao conjugar o domínio de tolerância a falhas a um outro domínio, ou seja, ao domínio de uma aplicação, o primeiro passa a se responsável pelos requisitos não-funcionais da aplicação. Os requisitos não funcionais de uma aplicação, a exemplo de confiabilidade e segurança, são cruciais em muitas aplicações e exigem métodos e conhecimentos que são distintos do domínio da aplicação. O modelo de orientação a objetos incentiva o desenvolvimento de aplicações através da composição de objetos, cada qual com a sua estrutura e comportamento específicos. Cada particular composição de objetos forma um conjunto que deve observar um comportamento que atenda aos requisitos da aplicação, de forma confiável. Com o objetivo de aumentar a confiabilidade da aplicação e de minimizar o efeito de possíveis falhas do sistema, são propostos objetos tolerantes a falhas. Objetos tolerantes a falhas são objetos responsáveis por serviços críticos da aplicação e que possuem mecanismos que garantem a confiabilidade e disponibilidade destes serviços. Comportamentos tolerantes a falhas de objetos são obtidos por redundância de componentes, incluindo replicacão e diversidade. O gerenciamento da redundância é executado de forma independente do domínio da aplicação e exercido em um meta-nível, através de técnicas de reflexão computacional. A adoção de reflexão computacional no modelo de orientação a objetos permite organizar as atividades de tolerância a falhas sem interferir no aspecto estrutural dos objetos do domínio da aplicação. Os controles que devem ser exercidos pelos meta-objetos sobre os objetos da aplicação são realizados em um meta-nível, de forma a separar as funcionalidades especificas da aplicação daquelas pertinentes ao domínio de tolerância a falhas. Estes meta-objetos, são organizados na forma de um framework, denominado MOTF - Meta-objetos para Tolerância a Falhas. O projeto de MOTF é um framework que apóia o desenvolvimento de aplicações tolerantes a falhas, compreendendo múltiplas classes que definem as funcionalidades exigidas por diversas técnicas de tolerância a falhas. Adota uma arquitetura reflexiva, na qual o meta-nível é dedicado as atividades de detecção e recuperação de erros através da monitoração de objetos da aplicação, localizados no nível base. Características de tolerância a falhas podem ser adicionadas a objetos considerados críticos pela aplicação, assim distribuindo, e não centralizando, a propriedade de tolerar falhas entre objetos da aplicação. Incorporando os princípios de reflexão computacional ao modelo de orientação a objetos dois benefícios principais se salientam: promover a reutilização de objetos tolerantes a falhas e garantir a invulnerabilidade do objeto do domínio da aplicação, ao separar as ações pertinentes ao domínio da aplicação das específicas do sistema tolerante a falhas. / Software fault-tolerance encompasses all techniques and programming languages' mechanisms intended to support the development of high reliability software. We can consider the faulttolerance area a proper domain of knowledge composed by well-defined techniques used to guarantee the reliability of applications related to other domains. Therefore, the fault-tolerance domain acts over other domains. When the fault-tolerance domain is merged into an application domain it becomes responsible for the non-functional requirements of the application. Among those requirements, reliability and safety are crucial ones and they use methods and concerns not related to the application domain. The object-oriented approach to software development allows a software to be decomposed into a set of components - the objects. Each object has its own structure and behavior. The view of a system as composed by interacting objects can be quite convenient in expressing different degrees of fault tolerance. We can distinguish between critical and non-critical objects and we may even distinguish between critical and non-critical operations within a single object. The objective of this research is the exploitation of object-oriented approach to increase reliability and decrease the effects of failures based on the provision of fault-tolerant objects. Fault-tolerant objects are abstractions of high reliability components and are designed to support several fault-tolerance strategies. Furthermore, computational reflection is adopted to organize faulttolerant activities at a meta-level and to provide transparent interfacing among fault-tolerant and non-fault-tolerant objects. A fault-tolerant object can be defined as an object that represents a single interface to redundant services and whose behavior is controlled by a metaobject. Possible behaviors of fault-tolerant objects include replication or diversity and the associated metaobject adds a specific fault-tolerant behavior to its referent object without interfering in its internal structure. MOTF - Metaobjects for Fault Tolerance is a framework intended to support the development of fault-tolerant applications. This framework consists of reusable meta-level classes. Each meta-level class implements a fault-tolerant service, and metaobjects are used as monitoring agents of fault-tolerant objects. The reflective object-oriented architecture promotes reusability and hides the programming of fault-tolerant mechanisms from the application. Programação Tolerancia : Falhas Orientacao : Objetos Software fault-tolerance Object-oriented Computational reflection
202	Detecção e proteção de blocos básicos suscetíveis através da análise sistemática de single bit-flip / Detection and protection of susceptible basic blocks through systematic bit-flip analysis Rodrigues, Diego Gonçalves January 2015 (has links) Partículas radioativas, ao atingirem o hardware dos sistemas computacionais, podem resultar em comportamentos inesperados durante a execução de um software. Tais comportamentos inesperados podem persistir por toda a vida útil do sistema ou podem ter uma duração limitada. Nesse último caso, temos o que chamamos de falhas transientes. Falhas transientes podem fazer com que as instruções do programa executem em uma sequência incorreta, o que chamamos de erros de fluxo de controle (Control-flow errors - CFEs). Estudos mostram que entre 33% e 77% das falhas transientes que afetam o hardware se manifestam como erros de fluxo de controle, dependendo do tipo do processador. Se o sistema não realizar nenhuma verificação em tempo de execução, um erro de fluxo de controle pode não ser detectado, o que pode resultar em uma execução incorreta do programa. Sistemas projetados para aplicações de baixo custo voltados para sistemas embarcados, onde os custos e desempenho são os fatores principais, utilizam técnicas baseadas em software para aumentar a confiabilidade do sistema. As técnicas baseadas em software para detecção de CFEs são conhecidas como signature monitoring ou signature checking. Essas técnicas introduzem código extra em todos os blocos básicos do programa com a finalidade de detectar os CFEs. Esse código extra implica em overhead, que pode ter uma grande variação dependendo da técnica utilizada. Na tentativa de minimizar o overhead imposto pelas técnicas de detecção de CFEs, neste trabalho foi desenvolvida a técnica de detecção e proteção de blocos básicos suscetíveis através da análise sistemática de single bit-flip. O objetivo da técnica é detectar os blocos básicos suscetíveis do programa através da análise sistemática de single bit-flip e proteger apenas esses blocos básicos. A técnica foi avaliada em termos de sua taxa de cobertura de falhas e desempenho. Para avaliar a taxa de cobertura falhas foram realizadas várias campanhas de injeção de falhas nos programas da suíte de benchmarks Mibench. A avaliação de desempenho foi feita com base na quantidade de instruções de máquina executadas pelos benchmarks, comparando quantidade de instruções antes e depois da utilização da técnica detecção e proteção de blocos básicos suscetíveis. Os resultados dos experimentos mostram que é possível reduzir em até 27,93% a quantidade de blocos básicos protegidos e ao mesmo tempo manter uma alta taxa de cobertura de falhas. Porém, em termos de desempenho, o ganho não ficou na mesma proporção da quantidade de blocos básicos não protegidos, ficando abaixo do esperado. / Radioactive particles hitting the hardware of computer systems may result in unexpected behavior during software execution. Such unexpected behavior may persist for the lifetime of the system or may have a limited duration. In the latter case, we have what is called a transient fault. Transient faults may cause the program instructions to execute in an incorrect sequence. This incorrect sequence is called a control flow error (CFE). Research show that between 33% to 77% of transient faults manifest themselves as CFEs, depending on the type of the processor. If the system does not perform any verification at runtime, a control flow error may be not detected, which can result in incorrect program execution. Systems projected to low-cost embedded applications, where cost and performance are the main factors, use software based techniques to improve system reliability. Software based techniques to detect CFEs are known as signature monitoring or signature checking. These techniques insert extra code in each basic block of the program in order to detect CFEs. This extra code add an undesirable overhead in the program, which can have large variation depending on the technique used. In the attempt to minimize the overhead added by CFEs detection techniques, this work developed a technique of detection and protection of susceptible basic blocks through systematic bit-flip analysis. The purpose of this technique is to detect the susceptible basic blocks of the program through the systematic bit-flip analysis and to protect only these basic blocks. The technique was evaluated based on its fault coverage rate and performance. To evaluate the fault coverage rate a fault injection campaing was performed in the programs of the Mibench benchmark suite. The performance evaluation was based in the number of instruction executed by each benchmark, comparing the number of instructions before and after the use of the proposed technique. The experimental results show that is possible to reduce up to 27,93% the amount of protected basic blocks, while keeping a high faults coverage hate. However, in terms of performance, the gain was not in the same proportion, being lower than expected. Sistemas embarcados Tolerancia : Falhas Embedded systems Transient faults Fault tolerance Signature checking
203	Cost-effective dynamic repair for FPGAs in real-time systems / Reparo dinâmico de baixo custo para FPGAs em sistemas tempo-real Santos, Leonardo Pereira January 2016 (has links) Field-Programmable Gate Arrays (FPGAs) são largamente utilizadas em sistemas digitais por características como flexibilidade, baixo custo e alta densidade. Estas características advém do uso de células de SRAM na memória de configuração, o que torna estes dispositivos suscetíveis a erros induzidos por radiação, tais como SEUs. TMR é o método de mitigação mais utilizado, no entanto, possui um elevado custo tanto em área como em energia, restringindo seu uso em aplicações de baixo custo e/ou baixo consumo. Como alternativa a TMR, propõe-se utilizar DMR associado a um mecanismo de reparo da memória de configuração da FPGA chamado scrubbing. O reparo de FPGAs em sistemas em tempo real apresenta desafios específicos. Além da garantia da computação correta dos dados, esta computação deve se dar completamente dentro do tempo disponível (time-slot), devendo ser finalizada antes do tempo limite (deadline). A diferença entre o tempo de computação dos dados e a deadline é chamado de slack e é o tempo disponível para reparo do sistema. Este trabalho faz uso de scrubbing deslocado dinâmico, que busca maximizar a probabilidade de reparo da memória de configuração de FPGAs dentro do slack disponível, baseado em um diagnóstico do erro. O scrubbing deslocado já foi utilizado com técnicas de diagnóstico de grão fino (NAZAR, 2015). Este trabalho propõe o uso de técnicas de diagnóstico de grão grosso para o scrubbing deslocado, evitando as penalidades de desempenho e custos em área associados a técnicas de grão fino. Circuitos do conjunto MCNC foram protegidos com as técnicas propostas e submetidos a seções de injeção de erros (NAZAR; CARRO, 2012a). Os dados obtidos foram analisados e foram calculadas as melhores posição iniciais do scrubbing para cada um dos circuitos. Calculou-se a taxa de Failure-in-Time (FIT) para comparação entre as diferentes técnicas de diagnóstico propostas. Os resultados obtidos confirmaram a hipótese inicial deste trabalho que a redução do número de bits sensíveis e uma baixa degradação do período do ciclo de relógio permitiram reduzir a taxa de FIT quando comparadas com técnicas de grão fino. Por fim, uma comparação entre as três técnicas propostas é feita, analisando o desempenho e custos em área associados a cada uma. / Field-Programmable Gate Arrays (FPGAs) are widely used in digital systems due to characteristics such as flexibility, low cost and high density. These characteristics are due to the use of SRAM memory cells in the configuration memory, which make these devices susceptible to radiation-induced errors, such as SEUs. TMR is the most used mitigation technique, but it has an elevated cost both in area as well as in energy, restricting its use in low cost/low energy applications. As an alternative to TMR, we propose the use of DMR associated with a repair mechanism of the FPGA configuration memory called scrubbing. The repair of FPGA in real-time systems present a specific set of challenges. Besides guaranteeing the correct computation of data, this computation must be completely carried out within the available time (time-slot), being finalized before a time limit (deadline). The difference between the computation time and the deadline is called the slack and is the time available to repair the system. This work uses a dynamic shifted scrubbing that aims to maximize the repair probability of the configuration memory of the FPGA within the available slack based on error diagnostic. The shifted scrubbing was already proposed with fine-grained diagnostic techniques (NAZAR, 2015). This work proposes the use of coarse-grained diagnostic technique as a way to avoid the performance penalties and area costs associated to fine-grained techniques. Circuits of the MCNC suite were protected by the proposed techniques and subject to error-injection campaigns (NAZAR; CARRO, 2012a). The obtained data was analyzed and the best scrubbing starting positions for each circuit were calculated. The Failure-in-Time (FIT) rates were calculated to compare the different proposed diagnostic techniques. The obtained results validated the initial hypothesis of this work that the reduction of the number of sensitive bits and a low degradation of the clock cycle allowed a reduced FIT rate when compared with fine-grained diagnostic techniques. Finally, a comparison is made between the proposed techniques, considering performance and area costs associated to each one. Microeletrônica Sistemas : Tempo real Field-programmable gate arrays (FPGA) Scrubbing Fault diagnosis Fault tolerance Real-time
204	Sistema embarcado para a manutenção inteligente de atuadores elétricos / Embedded systems for intelligent maintenance of electrical actuators Bosa, Jefferson Luiz January 2009 (has links) O elevado custo de manutenção nos ambientes industriais motivou pesquisas de novas técnicas para melhorar as ações de reparos. Com a evolução tecnológica, principalmente da eletrônica, que proporcionou o uso de sistemas embarcados para melhorar as atividades de manutenção, estas agregaram inteligência e evoluíram para uma manutenção pró-ativa. Através de ferramentas de processamento de sinais, inteligência artificial e tolerância a falhas, surgiram novas abordagens para os sistemas de monitoramento a serviço da equipe de manutenção. Os ditos sistemas de manutenção inteligente, cuja tarefa é realizar testes em funcionamento (on-line) nos equipamentos industriais, promovem novos modelos de confiabilidade e disponibilidade. Tais sistemas são baseados nos conceitos de tolerância a falhas, e visam detectar, diagnosticar e predizer a ocorrência de falhas. Deste modo, fornece-se aos engenheiros de manutenção a informação antecipada do estado de comportamento do equipamento antes mesmo deste manifestar uma falha, reduzindo custos, aumentando a vida útil e tornando previsível o reparo. Para o desenvolvimento do sistema de manutenção inteligente objeto deste trabalho, foram estudadas técnicas de inteligência artificial (redes neurais artificiais), técnicas de projeto de sistemas embarcados e de prototipação em plataformas de hardware. No presente trabalho, a rede neural Mapas Auto-Organizáveis foi adotada como ferramenta base para detecção e diagnóstico de falhas. Esta foi prototipada numa plataforma de sistema embarcado baseada na tecnologia FPGA (Field Programmable Gate Array). Como estudo de caso, uma válvula elétrica utilizada em dutos para transporte de petróleo foi definida como aplicação alvo dos experimentos. Através de um modelo matemático, um conjunto de dados representativo do comportamento da válvula foi simulado e utilizado como entrada do sistema proposto. Estes dados visam o treinamento da rede neural e visam fornecer casos de teste para experimentação no sistema. Os experimentos executados em software validaram o uso da rede neural como técnica para detecção e diagnóstico de falhas em válvulas elétricas. Por fim, também realizou-se experimentos a fim de validar o projeto do sistema embarcado, comparando-se os resultado obtidos com este aos resultados obtidos a partir de testes em software. Os resultados revelam a escolha correta do uso da rede neural e o correto projeto do sistema embarcado para desempenhar as tarefas de detecção e diagnóstico de falhas em válvulas elétricas. / The high costs of maintenance in industrial environments have motivated research for new techniques to improve repair activities. The technological progress, especially in the electronics field, has provided for the use of embedded systems to improve repair, by adding intelligence to the system and turning the maintenance a proactive activity. Through tools like signal processing, artificial intelligence and fault-tolerance, new approaches to monitoring systems have emerged to serve the maintenance staff, leading to new models of reliability and availability. The main goal of these systems, also called intelligent maintenance systems, is to perform in-operation (on-line) test of industrial equipments. These systems are built based on fault-tolerance concepts, and used for the detection, the diagnosis and the prognosis of faults. They provide the maintenance engineers with information on the equipment behavior, prior to the occurrence of failures, reducing maintenance costs, increasing the system lifetime and making it possible to schedule repairing stops. To develop the intelligent maintenance system addressed in this dissertation, artificial intelligence (neural networks), embedded systems design and hardware prototyping techniques were studied. In this work, the neural network Self-Organizing Maps (SOM) was defined as the basic tool for the detection and the diagnosis of faults. The SOM was prototyped in an embedded system platform based on the FPGA technology (Field Programmable Gate Array). As a case study, the experiments were performed on an electric valve used in a pipe network for oil transportation. Through a mathematical model, a data set representative of the valve behavior was obtained and used as input to the proposed maintenance system. These data were used for neural network training and also provided test cases for system monitoring. The experiments were performed in software to validate the chosen neural network as the technique for the detection and diagnosis of faults in the electrical valve. Finally, experiments to validate the embedded system design were also performed, so as to compare the obtained results to those resulting from the software tests. The results show the correct choice of the neural network and the correct embedded systems design to perform the activities for the detection and diagnosis of faults in the electrical valve. Microeletrônica Sistemas embarcados Tolerancia : Falhas On-line test Fault-tolerance Self-organizing maps Embedded systems FPGA Maintenance
205	Analise dos efeitos de falhas transientes no conjunto de banco de registradores em unidades gráficas de processamento / Evaluation of transient fault effect in the register files of graphics processing units Nedel, Werner Mauricio January 2015 (has links) Unidades gráficas de processamento, mais conhecidas como GPUs (Graphics Processing Unit), são dispositivos que possuem um grande poder de processamento paralelo com respectivo baixo custo de operação. Sua capacidade de simultaneamente manipular grandes blocos de memória a credencia a ser utilizada nas mais variadas aplicações, tais como processamento de imagens, controle de tráfego aéreo, pesquisas acadêmicas, dentre outras. O termo GPGPUs (General Purpose Graphic Processing Unit) designa o uso de GPUs utilizadas na computação de aplicações de uso geral. A rápida proliferação das GPUs com ao advento de um modelo de programação amigável ao usuário fez programadores utilizarem essa tecnologia em aplicações onde confiabilidade é um requisito crítico, como aplicações espaciais, automotivas e médicas. O crescente uso de GPUs nestas aplicações faz com que novas arquiteturas deste dispositivo sejam propostas a fim de explorar seu alto poder computacional. A arquitetura FlexGrip (FLEXible GRaphIcs Processor) é um exemplo de GPGPU implementada em FPGA (Field Programmable Gate Array), sendo compatível com programas implementados especificamente para GPUs, com a vantagem de possibilitar a customização da arquitetura de acordo com a necessidade do usuário. O constante aumento da demanda por tecnologia fez com que GPUs de última geração sejam fabricadas em tecnologias com processo de fabricação de até 28nm, com frequência de relógio de até 1GHz. Esse aumento da frequência de relógio e densidade de transistores, combinados com a redução da tensão de operação, faz com que os transistores fiquem mais suscetíveis a falhas causadas por interferência de radiação. O modelo de programação utilizado pelas GPUs faz uso de constantes acessos a memórias e registradores, tornando estes dispositivos sensíveis a perturbações transientes em seus valores armazenados. Estas perturbações são denominadas Single Event Upset (SEU), ou bit-flip, e podem resultar em erros no resultado final da aplicação. Este trabalho tem por objetivo apresentar um modelo de injeção de falhas transientes do tipo SEU nos principais bancos de registradores da GPGPU Flexgrip, avaliando o comportamento da execução de diferentes algoritmos em presença de SEUs. O impacto de diferentes distribuições de recursos computacionais da GPU em sua confiabilidade também é abordado. Resultados podem indicar maneiras eficientes de obter-se confiabilidade explorando diferentes configurações de GPUs. / Graphic Process Units (GPUs) are specialized massively parallel units that are widely used due to their high computing processing capability with respective lower costs. The ability to rapidly manipulate high amounts of memory simultaneously makes them suitable for solving computer-intensive problems, such as analysis of air traffic control, academic researches, image processing and others. General-Purpose Graphic Processing Units (GPGPUs) designates the use of GPUs in applications commonly handled by Central Processing Units (CPUs). The rapid proliferation of GPUs due to the advent of significant programming support has brought programmers to use such devices in safety critical applications, like automotive, space and medical. This crescent use of GPUs pushed developers to explore its parallel architecture and proposing new implementations of such devices. The FLEXible GRaphics Processor (FlexGrip) is an example of GPGPU optimized for Field Programmable Arrays (FPGAs) implementation, fully compatible with GPU’s compiled programs. The increasing demand for computational has pushed GPUs to be built in cuttingedge technology down to 28nm fabrication process for the latest NVIDIA devices with operating clock frequencies up to 1GHz. The increases in operating frequencies and transistor density combined with the reduction of voltage supplies have made transistors more susceptible to faults caused by radiation. The program model adopted by GPUs makes constant accesses to its memories and registers, making this device sensible to transient perturbations in its stored values. These perturbations are called Single Event Upset (SEU), or just bit-flip, and might cause the system to experience an error. The main goal of this work is to study the behavior of the GPGPU FlexGrip under the presence of SEUs in a range of applications. The distribution of computational resources of the GPUs and its impact in the GPU confiability is also explored, as well as the characterization of the errors observed in the fault injection campaigns. Results can indicate efficient configurations of GPUs in order to avoid perturbations in the system under the presence of SEUs. Microeletrônica Processamento : Imagem Simulação computacional GPU Parallel processing High performance Fault tolerance
206	[en] A FAULT TOLERANT MECHANISM FOR WORKFLOW MANAGEMENT SYSTEMS / [pt] UM MECANISMO DE TOLERÂNCIA A FALHAS PARA SISTEMA DE GERENCIAMENTO DE WORKFLOW BERNARDO QUARESMA DIAS 01 February 2011 (has links) [pt] Nesse trabalho propomos um mecanismo com detecção de falhas, replicação e gerenciamento de grupos para instrumentação de sistemas de gerenciamento de workflow com tolerância a falhas. Sistemas de gerenciamento de workfow demandam alguns recursos específicos de replicação, pois realizam operações não-determinísticas e não dependem de chamadas externas para atualização do seu estado. Como estudo de caso, utilizamos um sistema de automação de procedimentos industriais e analisamos as modificações necessárias para utilização desse sistema com o mecanismo de tolerância a falhas proposto. Também avaliamos o impacto no desempenho do sistema decorrente do uso do mecanismo proposto. / [en] In this work we propose a mechanism for failure detection, group management and service replication, providing fault tolerance for workflow management systems. Workflow management systems require specific replication features, since such systems deal with non-deterministic operations and update their s internal state without any external calls. As a case study we use an industrial automation system and analyze the needed modifications to use the proposed mechanism and evaluate the impact of the mechanism in the system s performance. [pt] REPLICACAO [en] REPLICATION [pt] WORKFLOW [en] WORKFLOW [pt] TOLERANCIA A FALHAS [en] FAULT TOLERANCE
207	Combinaison des aspects temps réel et sûreté de fonctionnement pour la conception des plateformes avioniques / Combination of real-time and safety aspects for the design of avionic platforms Many, Florian 18 February 2013 (has links) La conception des plateformes aéronautiques s’effectue en tenant compte des aspects fonctionnels et dysfonctionnels prévus dans les scénarios d’emploi des aéronefs qui les embarquent. Ces plateformes aéronautiques sont composées de systèmes informatiques temps réel qui doivent à la fois être précises dans leurs calculs, exactes dans l’instant de délivrance des résultats des calculs, et robustes à tout évènement pouvant compromettre le bon fonctionnement de la plateforme.Dans ce contexte, ces travaux de thèse abordent les ordonnancements temps réel tolérants aux fautes. Partant du fait que les systèmes informatiques embarqués sont perturbés par les ondes électromagnétiques des radars, notamment dans la phase d’approche des aéronefs, ces travaux proposent une modélisation des effets des ondes, dite en rafales de fautes. Après avoir exploré le comportement de l’ordonnanceur à la détection d’erreurs au sein d’une tâche, une technique de validation, reposant sur le calcul de pire temps de réponse des tâches, est présentée. Il devient alors possible d’effectuer des analyses d’ordonnançabilité sous l’hypothèse de la présence de rafales de fautes. Ainsi, cette technique de validation permet de conclure sur la faisabilité d’un ensemble de tâches en tenant compte de la durée de la rafale de fautes et de la stratégie de gestion des erreurs détectées dans les tâches.Sur la base de ces résultats, les travaux décrits montre comment envisager l’analyse au niveau système. L’idée sous-jacente est de mettre en évidence le rôle des ordonnancements temps réel tolérants aux fautes dans la gestion des données erronées causées par des perturbations extérieures au système.Ainsi, le comportement de chaque équipement est modélisé, ainsi que les flots de données échangés et la dynamique du système. Le comportement de chaque équipement est fonction de la perturbation subie, et donne lieu à l’établissement de la perturbation résultante, véritable réponse dysfonctionnelle de l’équipement à une agression extérieure. / The design of avionic platforms takes into account the functional and dysfunctional aspects, which depend on the aircraft operation concept. These avionic platforms embed computer resources that must produce accurate results at the right time, and must be dependable whatever the disturbance.In this specific context, we address the topic of fault tolerant real time scheduling. Since the embedded computer resources are disturbed by electromagnetic waves produced by radar, especially during the aircraft approach, we suggest a model of these wave effects named fault bursts. Afterthe analysis of scheduler behaviour when an error is dectected inside a task, we present a validation technique based on the evaluation of the worst case response time. By this way, we are able to study the task set feasibility under fault burst assumption and according to the error recovery strategy.Then, based on these results, we show a way to analyse the effects of disturbances such as electromagnetic waves at system level. The underlining idea is to demonstrate the main role of fault tolerant real time scheduler in the management of erroneous data. To do that, we suggest an equipment model which integrates the behaviour of the equipement when a disturbance occurs. We also describe thedata flows in order to describe the avionics platform dynamics. Ordonnancement temps réel Tolérance aux fautes Rafales de faute Real-Time Scheduling Fault tolerance Fault burst 629.135
208	Implementação e avaliação da técnica ACCE para detecção e correção de erros de fluxo de controle no LLVM / Implementation and evaluation of the ACCE technique to detection and correction of control flow errors in the LLVM Parizi, Rafael Baldiati January 2013 (has links) Técnicas de prevenção de falhas como testes e verificação de software não são suficientes para prover dependabilidade a sistemas, visto que não são capazes de tratar falhas ocasionadas por eventos externos tais como falhas transientes. Nessas situações faz-se necessária a aplicação de técnicas capazes de tratar e tolerar falhas que ocorram durante a execução do software. Grande parte das técnicas de tolerância a falhas transientes está focada na detecção e correção de erros de fluxo de controle, que podem corresponder a até 70% de erros causados por esse tipo de falha. Essas técnicas tratam as falhas em nível de software, alterando o programa com a inserção de novas instruções que devem capturar e corrigir desvios inesperados ocorridos durante a execução do software, sendo ACCE uma das técnicas mais conhecidas. Neste trabalho foi feita uma implementação da técnica ACCE através da criação de um passo de transformação de programas para a infraestrutura de compilação LLVM. ACCE atua sobre a linguagem intermediária dos programas compilados com o LLVM, resultando em portabilidade de linguagem de programação e de arquitetura de máquina. Além da implementação da técnica como um passo de transformação, o LLVM foi utilizado para a realização dos experimentos para avaliar o impacto na eficácia de ACCE quando aplicada em programas previamente otimizados por outras transformações. Esse tipo de avaliação é fundamental uma vez que dificilmente a compilação de programas é feita sem a ativação de otimizações, e, até onde sabemos, nunca havia sido feito anteriormente. Os experimentos deste trabalho foram realizados através de baterias de injeção de falhas em programas da suíte de benchmarks Mibench, divididas em diferentes cenários, que avaliaram ACCE em termos de correção de falhas, quando aplicada em programas otimizados por transformações individuais e também por combinações de transformações. Os resultados dos experimentos realizados mostram que a técnica ACCE é eficaz na correção de falhas, porém, para alguns programas otimizados por determinadas transformações, houve redução na correção de falhas. Esse trabalho analisa os experimentos nos quais houve redução da eficácia de ACCE e aponta possíveis causas. / Computer-based systems are used in several eletronic devices that are, in many cases, responsable by the execution of critical tasks. There are situations where techniques of prevention against faults such as software validation and verification, can not be sufficient for ensuring acceptable rates of confiability, because they are not capable of treating faults that occur in execution time, such as transient faults. Most of the fault tolerance techniques for transient faults are focused in detection and correction of control flow errors, that can correspond to 70% errors caused by this kind of faults. These techniques treat the faults at software level, changing the program with the insertion of new instructions that must to capture and to correct illegal jumps occurred during the software execution, being ACCE the most known technique today. In this work an implementation of the ACCE technique was developed as a program transformation pass in the LLVM compiler infraestructurre. ACCE acts over the intermediate language of LLVM, which results in both programming language and machine architecture language portabilities. Besides the implemetation of the technique like a transformation pass, the LLVM was also used in the experiments for the avaliation of impact in the ACCE eficacy when it is applied into programs previously optimized by others compiler transformations. This evaluation is essential since hardly the compilation of programs is made without the activation of other optimizations. As far as we know this kind of evaluation has never beem made before. The experimental results show that the ACCE techinque is effective in the fault correction, but for some programs optimized by specific transformations, there was a reduction in the correction rate. This work analyses these experiments and gives an explanation for what causes a reduction in the effectiveness of ACCE. Tolerancia : Falhas Testes : Software Fault tolerance Control-flow errors ACCE LLVM
209	Reliability evaluation and error mitigation in pedestrian detection algorithms for embedded GPUs / Validação da confiabilidade e tolerância a falhas em algoritmos de detecção de pedestres para GPUs embarcadas Santos, Fernando Fernandes dos January 2017 (has links) A confiabilidade de algoritmos para detecção de pedestres é um problema fundamental para carros auto dirigíveis ou com auxílio de direção. Métodos que utilizam algoritmos de detecção de objetos como Histograma de Gradientes Orientados (HOG - Histogram of Oriented Gradients) ou Redes Neurais de Convolução (CNN – Convolutional Neural Network) são muito populares em aplicações automotivas. Unidades de Processamento Gráfico (GPU – Graphics Processing Unit) são exploradas para executar detecção de objetos de uma maneira eficiente. Infelizmente, as arquiteturas das atuais GPUs tem se mostrado particularmente vulneráveis a erros induzidos por radiação. Este trabalho apresenta uma validação e um estudo analítico sobre a confiabilidade de duas classes de algoritmos de detecção de objetos, HOG e CNN. Esta pesquisa almeja não somente quantificar, mas também qualificar os erros produzidos por radiação em aplicações de detecção de objetos em GPUs embarcadas. Os resultados experimentais com HOG foram obtidos usando duas arquiteturas de GPU embarcadas diferentes (Tegra e AMD APU), cada uma foi exposta por aproximadamente 100 horas em um feixe de nêutrons em Los Alamos National Lab (LANL). As métricas Precision e Recall foram usadas para validar a criticalidade do erro. Uma análise final mostrou que por um lado HOG é intrinsecamente resiliente a falhas (65% a 85% dos erros na saída tiveram um pequeno impacto na detecção), do outro lado alguns erros críticos aconteceram, tais que poderiam resultar em pedestres não detectados ou paradas desnecessárias do veículo. Este trabalho também avaliou a confiabilidade de duas Redes Neurais de Convolução para detecção de Objetos:Darknet e Faster RCNN. Três arquiteturas diferentes de GPUs foram expostas em um feixe de nêutrons controlado (Kepler, Maxwell, e Pascal), com as redes detectando objetos em dois data sets, Caltech e Visual Object Classes. Através da análise das saídas corrompidas das redes neurais, foi possível distinguir entre erros toleráveis e erros críticos, ou seja, erros que poderiam impactar na detecção de objetos. Adicionalmente, extensivas injeções de falhas no nível da aplicação (GDB) e em nível arquitetural (SASSIFI) foram feitas, para identificar partes críticas do código para o HOG e as CNNs. Os resultados mostraram que não são todos os estágios da detecção de objetos que são críticos para a confiabilidade da detecção final. Graças a injeção de falhas foi possível identificar partes do HOG e da Darknet, que se protegidas, irão com uma maior probabilidade aumentar a sua confiabilidade, sem adicionar um overhead desnecessário. A estratégia de tolerância a falhas proposta para o HOG foi capaz de detectar até 70% dos erros com 12% de overhead de tempo. / Pedestrian detection reliability is a fundamental problem for autonomous or aided driving. Methods that use object detection algorithms such as Histogram of Oriented Gradients (HOG) or Convolutional Neural Networks (CNN) are today very popular in automotive applications. Embedded Graphics Processing Units (GPUs) are exploited to make object detection in a very efficient manner. Unfortunately, GPUs architecture has been shown to be particularly vulnerable to radiation-induced failures. This work presents an experimental evaluation and analytical study of the reliability of two types of object detection algorithms: HOG and CNNs. This research aim is not just to quantify but also to qualify the radiation-induced errors on object detection applications executed in embedded GPUs. HOG experimental results were obtained using two different architectures of embedded GPUs (Tegra and AMD APU), each exposed for about 100 hours to a controlled neutron beam at Los Alamos National Lab (LANL). Precision and Recall metrics are considered to evaluate the error criticality. The reported analysis shows that, while being intrinsically resilient (65% to 85% of output errors only slightly impact detection), HOG experienced some particularly critical errors that could result in undetected pedestrians or unnecessary vehicle stops. This works also evaluates the reliability of two Convolutional Neural Networks for object detection: You Only Look Once (YOLO) and Faster RCNN. Three different GPU architectures were exposed to controlled neutron beams (Kepler, Maxwell, and Pascal) detecting objects in both Caltech and Visual Object Classes data sets. By analyzing the neural network corrupted output, it is possible to distinguish between tolerable errors and critical errors, i.e., errors that could impact detection. Additionally, extensive GDB-level and architectural-level fault-injection campaigns were performed to identify HOG and YOLO critical procedures. Results show that not all stages of object detection algorithms are critical to the final classification reliability. Thanks to the fault injection analysis it is possible to identify HOG and Darknet portions that, if hardened, are more likely to increase reliability without introducing unnecessary overhead. The proposed HOG hardening strategy is able to detect up to 70% of errors with a 12% execution time overhead. Sistemas embarcados Tolerancia : Falhas : Software Pedestre Fault tolerance Soft error Pedestrian detection Hardening
210	Investigating techniques to reduce soft error rate under single-event-induced charge sharing / Investigando técnicas para reduzir a taxa de erro de soft sob evento único induzido de carga compartilhada Almeida, Antonio Felipe Costa de January 2014 (has links) The interaction of radiation with integrated circuits can provoke transient faults due to the deposit of charge in sensitive nodes of transistors. Because of the decrease the size in the process technology, charge sharing between transistors placed close to each other has been more and more observed. This phenomenon can lead to multiple transient faults. Therefore, it is important to analyze the effect of multiple transient faults in integrated circuits and investigate mitigation techniques able to cope with multiple faults. This work investigates the effect known as single-event-induced charge sharing in integrated circuits. Two main techniques are analyzed to cope with this effect. First, a placement constraint methodology is proposed. This technique uses placement constraints in standard cell based circuits. The objective is to achieve a layout for which the Soft-Error Rate (SER) due charge shared at adjacent cell is reduced. A set of fault injection was performed and the results show that the SER can be minimized due to single-event-induced charge sharing in according to the layout structure. Results show that by using placement constraint, it is possible to reduce the error rate from 12.85% to 10.63% due double faults. Second, Triple Modular Redundancy (TMR) schemes with different levels of granularities limited by majority voters are analyzed under multiple faults. The TMR versions are implemented using a standard design flow based on a traditional commercial standard cell library. An extensive fault injection campaign is then performed in order to verify the softerror rate due to single-event-induced charge sharing in multiple nodes. Results show that the proposed methodology becomes crucial to find the best trade-off in area, performance and soft-error rate when TMR designs are considered under multiple upsets. Results have been evaluated in a case-study circuit Advanced Encryption Standard (AES), synthesized to 90nm Application Specific Integrated Circuit (ASIC) library, and they show that combining the two techniques, the error rate resulted from multiple faults can be minimized or masked. By using TMR with different granularities and placement constraint methodology, it is possible to reduce the error rate from 11.06% to 0.00% for double faults. A detailed study of triple, four and five multiple faults combining both techniques are also described. We also tested the TMR with different granularities in SRAM-based FPGA platform. Results show that the versions with a fine grain scheme (FGTMR) were more effectiveness in masking multiple faults, similarly to results observed in the ASICs. In summary, the main contribution of this master thesis is the investigation of charge sharing effects in ASICs and the use of a combination of techniques based on TMR redundancy and placement to improve the tolerance under multiple faults. Microeletrônica Tolerancia : Falhas Fault tolerance Placement Constraining Single-Event-Induced Charge Sharing Triple Modular Redundancy

Search results