Global ETD Search

131	Multicast-Based Interactive-Group Object-Replication For Fault Tolerance Soria-Rodriguez, Pedro 25 October 1999 (has links) "Distributed systems are clusters of computers working together on one task. The sharing of information across different architectures, and the timely and efficient use of the network resources for communication among computers are some of the problems involved in the implementation of a distributed system. In the case of a low latency system, the network utilization and the responsiveness of the communication mechanism are even more critical. This thesis introduces a new approach for the distribution of messages to computers in the system, in which, the Common Object Request Broker Architecture (CORBA) is used in conjunction with IP multicast to implement a fault-tolerant, low latency distributed system. Fault tolerance is achieved by replication of the current state of the system across several hosts. An update of the current state is initiated by a client application that contacts one of the state object replicas. The new information needs to be distributed to all the members of the distributed system (the object replicas). This state update is accomplished by using a two-phase commit protocol, which is implemented using a binary tree structure along with IP multicast to reduce the amount of network utilization, distribute the computation load associated with state propagation, and to achieve faster communication among the members of the distributed system. The use of IP multicast enhances the speed of message distribution, while the two-phase commit protocol encapsulates IP multicast to produce a reliable multicast service that is suitable for fault tolerant, distributed low latency applications. The binary tree structure, finally, is essential for the load sharing of the state commit response collection processing. " multicast CORBA commit protocol fault tolerance Electronic data processing Distributed processing Fault-tolerant computing
132	Analise dos efeitos de falhas transientes no conjunto de banco de registradores em unidades gráficas de processamento / Evaluation of transient fault effect in the register files of graphics processing units Nedel, Werner Mauricio January 2015 (has links) Unidades gráficas de processamento, mais conhecidas como GPUs (Graphics Processing Unit), são dispositivos que possuem um grande poder de processamento paralelo com respectivo baixo custo de operação. Sua capacidade de simultaneamente manipular grandes blocos de memória a credencia a ser utilizada nas mais variadas aplicações, tais como processamento de imagens, controle de tráfego aéreo, pesquisas acadêmicas, dentre outras. O termo GPGPUs (General Purpose Graphic Processing Unit) designa o uso de GPUs utilizadas na computação de aplicações de uso geral. A rápida proliferação das GPUs com ao advento de um modelo de programação amigável ao usuário fez programadores utilizarem essa tecnologia em aplicações onde confiabilidade é um requisito crítico, como aplicações espaciais, automotivas e médicas. O crescente uso de GPUs nestas aplicações faz com que novas arquiteturas deste dispositivo sejam propostas a fim de explorar seu alto poder computacional. A arquitetura FlexGrip (FLEXible GRaphIcs Processor) é um exemplo de GPGPU implementada em FPGA (Field Programmable Gate Array), sendo compatível com programas implementados especificamente para GPUs, com a vantagem de possibilitar a customização da arquitetura de acordo com a necessidade do usuário. O constante aumento da demanda por tecnologia fez com que GPUs de última geração sejam fabricadas em tecnologias com processo de fabricação de até 28nm, com frequência de relógio de até 1GHz. Esse aumento da frequência de relógio e densidade de transistores, combinados com a redução da tensão de operação, faz com que os transistores fiquem mais suscetíveis a falhas causadas por interferência de radiação. O modelo de programação utilizado pelas GPUs faz uso de constantes acessos a memórias e registradores, tornando estes dispositivos sensíveis a perturbações transientes em seus valores armazenados. Estas perturbações são denominadas Single Event Upset (SEU), ou bit-flip, e podem resultar em erros no resultado final da aplicação. Este trabalho tem por objetivo apresentar um modelo de injeção de falhas transientes do tipo SEU nos principais bancos de registradores da GPGPU Flexgrip, avaliando o comportamento da execução de diferentes algoritmos em presença de SEUs. O impacto de diferentes distribuições de recursos computacionais da GPU em sua confiabilidade também é abordado. Resultados podem indicar maneiras eficientes de obter-se confiabilidade explorando diferentes configurações de GPUs. / Graphic Process Units (GPUs) are specialized massively parallel units that are widely used due to their high computing processing capability with respective lower costs. The ability to rapidly manipulate high amounts of memory simultaneously makes them suitable for solving computer-intensive problems, such as analysis of air traffic control, academic researches, image processing and others. General-Purpose Graphic Processing Units (GPGPUs) designates the use of GPUs in applications commonly handled by Central Processing Units (CPUs). The rapid proliferation of GPUs due to the advent of significant programming support has brought programmers to use such devices in safety critical applications, like automotive, space and medical. This crescent use of GPUs pushed developers to explore its parallel architecture and proposing new implementations of such devices. The FLEXible GRaphics Processor (FlexGrip) is an example of GPGPU optimized for Field Programmable Arrays (FPGAs) implementation, fully compatible with GPU’s compiled programs. The increasing demand for computational has pushed GPUs to be built in cuttingedge technology down to 28nm fabrication process for the latest NVIDIA devices with operating clock frequencies up to 1GHz. The increases in operating frequencies and transistor density combined with the reduction of voltage supplies have made transistors more susceptible to faults caused by radiation. The program model adopted by GPUs makes constant accesses to its memories and registers, making this device sensible to transient perturbations in its stored values. These perturbations are called Single Event Upset (SEU), or just bit-flip, and might cause the system to experience an error. The main goal of this work is to study the behavior of the GPGPU FlexGrip under the presence of SEUs in a range of applications. The distribution of computational resources of the GPUs and its impact in the GPU confiability is also explored, as well as the characterization of the errors observed in the fault injection campaigns. Results can indicate efficient configurations of GPUs in order to avoid perturbations in the system under the presence of SEUs. Microeletrônica Processamento : Imagem Simulação computacional GPU Parallel processing High performance Fault tolerance
133	Investigating techniques to reduce soft error rate under single-event-induced charge sharing / Investigando técnicas para reduzir a taxa de erro de soft sob evento único induzido de carga compartilhada Almeida, Antonio Felipe Costa de January 2014 (has links) The interaction of radiation with integrated circuits can provoke transient faults due to the deposit of charge in sensitive nodes of transistors. Because of the decrease the size in the process technology, charge sharing between transistors placed close to each other has been more and more observed. This phenomenon can lead to multiple transient faults. Therefore, it is important to analyze the effect of multiple transient faults in integrated circuits and investigate mitigation techniques able to cope with multiple faults. This work investigates the effect known as single-event-induced charge sharing in integrated circuits. Two main techniques are analyzed to cope with this effect. First, a placement constraint methodology is proposed. This technique uses placement constraints in standard cell based circuits. The objective is to achieve a layout for which the Soft-Error Rate (SER) due charge shared at adjacent cell is reduced. A set of fault injection was performed and the results show that the SER can be minimized due to single-event-induced charge sharing in according to the layout structure. Results show that by using placement constraint, it is possible to reduce the error rate from 12.85% to 10.63% due double faults. Second, Triple Modular Redundancy (TMR) schemes with different levels of granularities limited by majority voters are analyzed under multiple faults. The TMR versions are implemented using a standard design flow based on a traditional commercial standard cell library. An extensive fault injection campaign is then performed in order to verify the softerror rate due to single-event-induced charge sharing in multiple nodes. Results show that the proposed methodology becomes crucial to find the best trade-off in area, performance and soft-error rate when TMR designs are considered under multiple upsets. Results have been evaluated in a case-study circuit Advanced Encryption Standard (AES), synthesized to 90nm Application Specific Integrated Circuit (ASIC) library, and they show that combining the two techniques, the error rate resulted from multiple faults can be minimized or masked. By using TMR with different granularities and placement constraint methodology, it is possible to reduce the error rate from 11.06% to 0.00% for double faults. A detailed study of triple, four and five multiple faults combining both techniques are also described. We also tested the TMR with different granularities in SRAM-based FPGA platform. Results show that the versions with a fine grain scheme (FGTMR) were more effectiveness in masking multiple faults, similarly to results observed in the ASICs. In summary, the main contribution of this master thesis is the investigation of charge sharing effects in ASICs and the use of a combination of techniques based on TMR redundancy and placement to improve the tolerance under multiple faults. Microeletrônica Tolerancia : Falhas Fault tolerance Placement Constraining Single-Event-Induced Charge Sharing Triple Modular Redundancy
134	Applying dual core lockstep in embedded processors to mitigate radiation induced soft errors / Aplicando dual core lockstep em processadores embarcados para mitigar falhas transientes induzidas por radiação Oliveira, Ádria Barros de January 2017 (has links) Os processadores embarcados operando em sistemas de segurança ou de missão crítica não podem falhar. Qualquer falha neste tipo de aplicação pode levar a consequências inaceitáveis, como risco de vida ou danos à propriedade ou ao meio ambiente. Os sistemas embarcados que operam em aplicações aeroespaciais são sucetíveis à falhas transientes induzidas por radiação. Entretanto, os efeitos de radiação também podem ser observados ao nível do solo. Falhas transientes afetam os processadores modificando os valores armazenados em elementos de memória, tais como registradores e memória de dados. Essas falhas podem levar o processador a executar incorretamente a aplicação, provocando erros na saída ou travamentos no sistema. Os avanços recentes em processadores embarcados concistem na integração de processadores hard-core e FPGAs. Tais dispositivos, comumente chamados de Sistemas-em-Chip Totalmente Programáveis (APSoCs), também são sucetíveis aos efeitos de radiação. Com objetivo de minimizar esse problema de tolerância a falhas, este trabalho apresenta um Dual-Core LockStep (DCLS) como uma técnica de tolerância para mitigar falhas induzidas por radiação que afetam processadores embarcados em APSoCs. Lockstep é um método baseado em redundância usado para detectar e corrigir falhas transientes. O DCLS proposto é implementado em um processador ARM Cortex-A9 hard-core embarcado no APSoC Zynq-7000. A eficiência da abordagem implementada foi validada tanto em aplicações executando em bare-metal como no sistema operacional FreeRTOS. Experimentos com íons pesados e emulação de falhas por injeção foram executados para analisar a sucetibilidade do sistema a inversão de bits. Os resultados obtidos mostram que a abordagem é capaz de diminuir a seção de choque do sistema com uma alta taxa de proteção. O sistema DCLS mitigou com sucesso até 78% das falhas injetadas. Otimizações de software também foram avaliadas para uma melhor compreenção dos trade-offs entre desempenho e confiabilidade. Através da análise de diferentes partições de software, observou-se que o tempo de execução de um bloco da aplicação deve ser muito maior que o tempo de verificação para que se obtenha menor impacto em desempenho. A avaliação de otimizações de compilador demonstrou que utilizar o nível O3 aumenta a vulnerabilidade da aplicação à falhas transientes. Como o O3 requer o uso de mais registradores que os otros níveis de otimização, o sistema se torna mais sucetível à falhas. Por outro lado, os resultados dos experimentos de radiação apontam que a aplicação compilada com nível O3 obtém maior Carga de Trabalho Média Entre Falhas (MWBF). Como a aplicação executa mais rápido, mais dados são computados corretamente antes da ocorrência de um erro. / The embedded processors operating in safety- or mission-critical systems are not allowed to fail. Any failure in such applications could lead to unacceptable consequences as life risk or significant damage to property or environment. Concerning faults originated by the radiation-induced soft errors, the embedded systems operating in aerospace applications are particularly susceptible. However, the radiation effects can also be observed at ground level. Soft errors affect processors by modifying values stored in memory elements, such as registers and data memory. These faults may lead the processor to execute an application incorrectly, generating output errors or leading hangs and crashes in the system. The recent advances in embedded systems concern the integration of hard-core processors and FPGAs. Such devices, called All Programmable System-on-Chip (APSoC), are also susceptible to radiation effects. Aiming to address this fault tolerance problem this work presents a Dual-Core LockStep (DCLS) as a fault tolerance technique to mitigate radiation-induced faults affecting processors embedded into APSoCs. Lockstep is a method based on redundancy used to detect and correct soft errors. The proposed DCLS is implemented in a hard-core ARM Cortex-A9 embedded into a Zynq-7000 APSoC. The approach efficiency was validated not only on applications running in baremetal but also on top of FreeRTOS systems. Heavy ions experiments and fault injection emulation were performed to analyze the system susceptibility to bit-flips. The obtained results show that the approach is able to decrease the system cross section with a high rate of protection. The DCLS system successfully mitigated up to 78% of the injected faults. Software optimizations were also evaluated to understand the trade-offs between performance and reliability better. By the analysis of different software partitions, it was observed that the execution time of an application block must to be much longer than the verification time to achieve fewer performance penalties. The compiler optimizations assessment demonstrate that using O3 level increases the application vulnerability to soft errors. Because O3 handles more registers than other optimizations, the system is more susceptible to faults. On the other hand, results from radiation experiments show that O3 level provides a higher Mean Workload Between Failures (MWBF). As the application runs faster, more data are correctly computed before an error occurrence. Microeletrônica Sistemas embarcados Embedded Processors Reliability Fault Injection Radiation Experiments Soft Errors Lockstep Fault Tolerance
135	Adaptive and polymorphic VLIW processor to dynamically balance performance, energy consumption, and fault tolerance / Processador VLIW adaptativo e polimórfico para equilibrar de forma dinâmica o desempenho, o consumo de energia e a tolerância a falhas Sartor, Anderson Luiz January 2018 (has links) Ao se projetar um novo processador, o desempenho não é mais o único objetivo de otimização. Reduzir o consumo de energia também é essencial, pois, enquanto a maior parte dos dispositivos embarcados depende fortemente de bateria, os processadores de propósito geral (GPPs) são restringidos pelos limites da energia térmica de projeto (TDP – thermal design power). Além disso, devido à evolução da tecnologia, a taxa de falhas transientes tem aumentado nos processadores modernos, o que afeta a confiabilidade de sistemas tanto no espaço quanto no nível do mar. Adicionalmente, a maioria dos processadores homogêneos e heterogêneos tem um design fixo, o que limita a adaptação em tempo de execução. Nesse cenário, nós propomos dois designs de processadores que são capazes de realizar o trade-off entre esses eixos de acordo com a aplicação alvo e os requisitos do sistema. Ambos designs baseiam-se em um mecanismo de duplicação de instruções com rollback que detecta e corrige falhas, um módulo de power gating para reduzir o consumo de energia das unidades funcionais. O primeiro é chamado de processador adaptativo e usa thresholds, definidos em tempo de projeto, para adaptar a execução da aplicação Adicionalmente, ele controla o ILP da aplicação para criar mais oportunidade de duplicação e de power gating. O segundo design é chamado processador polimórfico e ele avalia (em tempo de execução) a melhor configuração de hardware a ser usada para cada aplicação. Ele também explora o hardware disponível para maximizar o número de aplicações que são executadas em paralelo. Para a versão adaptativa usando uma configuração orientada a otimização de energia, é possível, em média, economizar 37,2% de energia com um overhead de apenas 8,2% em performance, mantendo baixos níveis de defeito, quando comparado a um design tolerante a falhas. Para a versão polimórfica, os resultados mostram que a reconfiguração dinâmica do processador é capaz de adaptar eficientemente o hardware ao comportamento da aplicação, de acordo com os requisitos especificados pelo designer, chegando a 94.88% do resultado de um processador oráculo quando o trade-off entre os três eixos é considerado. Por outro lado, a melhor configuração estática apenas atinge 28.24% do resultado do oráculo. / Performance is no longer the only optimization goal when designing a new processor. Reducing energy consumption is also mandatory: while most of the embedded devices are heavily dependent on battery power, General-Purpose Processors (GPPs) are being pulled back by the limits of Thermal Design Power (TDP). Moreover, due to technology scaling, soft error rate (i.e., transient faults) has been increasing in modern processors, which affects the reliability of both space and ground-level systems. In addition, most traditional homogeneous and heterogeneous processors have a fixed design, which limits its runtime adaptability. Therefore, they are not able to cope with the changing application behavior when one considers the axes of fault tolerance, performance, and energy consumption altogether. In this context, we propose two processor designs that are able to trade-off these three axes according to the application at hand and system requirements. Both designs rely on an instruction duplication with rollback mechanism that can detect and correct errors and a power gating module to reduce the energy consumption of the functional units The former design, called adaptive processor, uses thresholds defined at design time to allow runtime adaptation of the application’s execution and controls the application’s Instruction-Level Parallelism (ILP) to create more slots for duplication or power gating. The latter design (polymorphic processor) takes the former one step further by dynamically reconfiguring the hardware and evaluating different processor configurations for each application, and it also exploits the available pipelanes to maximize the number of applications that are executed concurrently. For the adaptive processor using an energy-oriented configuration, it is possible, on average, to reduce energy consumption by 37.2% with an overhead of only 8.2% in performance, while maintaining low levels of failure rate, when compared to a fault-tolerant design. For the polymorphic processor, results show that the dynamic reconfiguration of the processor is able to efficiently match the hardware to the behavior of the application, according to the requirements of the designer, achieving 94.88% of the result of an oracle processor when the trade-off between the three axes is considered. On the other hand, the best static configuration only achieves 28.24% of the oracle’s result. Tolerância a falhas Processamento paralelo Adaptive processor Fault tolerance Energy consumption Performance VLIW
136	Reliability evaluation and error mitigation in pedestrian detection algorithms for embedded GPUs / Validação da confiabilidade e tolerância a falhas em algoritmos de detecção de pedestres para GPUs embarcadas Santos, Fernando Fernandes dos January 2017 (has links) A confiabilidade de algoritmos para detecção de pedestres é um problema fundamental para carros auto dirigíveis ou com auxílio de direção. Métodos que utilizam algoritmos de detecção de objetos como Histograma de Gradientes Orientados (HOG - Histogram of Oriented Gradients) ou Redes Neurais de Convolução (CNN – Convolutional Neural Network) são muito populares em aplicações automotivas. Unidades de Processamento Gráfico (GPU – Graphics Processing Unit) são exploradas para executar detecção de objetos de uma maneira eficiente. Infelizmente, as arquiteturas das atuais GPUs tem se mostrado particularmente vulneráveis a erros induzidos por radiação. Este trabalho apresenta uma validação e um estudo analítico sobre a confiabilidade de duas classes de algoritmos de detecção de objetos, HOG e CNN. Esta pesquisa almeja não somente quantificar, mas também qualificar os erros produzidos por radiação em aplicações de detecção de objetos em GPUs embarcadas. Os resultados experimentais com HOG foram obtidos usando duas arquiteturas de GPU embarcadas diferentes (Tegra e AMD APU), cada uma foi exposta por aproximadamente 100 horas em um feixe de nêutrons em Los Alamos National Lab (LANL). As métricas Precision e Recall foram usadas para validar a criticalidade do erro. Uma análise final mostrou que por um lado HOG é intrinsecamente resiliente a falhas (65% a 85% dos erros na saída tiveram um pequeno impacto na detecção), do outro lado alguns erros críticos aconteceram, tais que poderiam resultar em pedestres não detectados ou paradas desnecessárias do veículo. Este trabalho também avaliou a confiabilidade de duas Redes Neurais de Convolução para detecção de Objetos:Darknet e Faster RCNN. Três arquiteturas diferentes de GPUs foram expostas em um feixe de nêutrons controlado (Kepler, Maxwell, e Pascal), com as redes detectando objetos em dois data sets, Caltech e Visual Object Classes. Através da análise das saídas corrompidas das redes neurais, foi possível distinguir entre erros toleráveis e erros críticos, ou seja, erros que poderiam impactar na detecção de objetos. Adicionalmente, extensivas injeções de falhas no nível da aplicação (GDB) e em nível arquitetural (SASSIFI) foram feitas, para identificar partes críticas do código para o HOG e as CNNs. Os resultados mostraram que não são todos os estágios da detecção de objetos que são críticos para a confiabilidade da detecção final. Graças a injeção de falhas foi possível identificar partes do HOG e da Darknet, que se protegidas, irão com uma maior probabilidade aumentar a sua confiabilidade, sem adicionar um overhead desnecessário. A estratégia de tolerância a falhas proposta para o HOG foi capaz de detectar até 70% dos erros com 12% de overhead de tempo. / Pedestrian detection reliability is a fundamental problem for autonomous or aided driving. Methods that use object detection algorithms such as Histogram of Oriented Gradients (HOG) or Convolutional Neural Networks (CNN) are today very popular in automotive applications. Embedded Graphics Processing Units (GPUs) are exploited to make object detection in a very efficient manner. Unfortunately, GPUs architecture has been shown to be particularly vulnerable to radiation-induced failures. This work presents an experimental evaluation and analytical study of the reliability of two types of object detection algorithms: HOG and CNNs. This research aim is not just to quantify but also to qualify the radiation-induced errors on object detection applications executed in embedded GPUs. HOG experimental results were obtained using two different architectures of embedded GPUs (Tegra and AMD APU), each exposed for about 100 hours to a controlled neutron beam at Los Alamos National Lab (LANL). Precision and Recall metrics are considered to evaluate the error criticality. The reported analysis shows that, while being intrinsically resilient (65% to 85% of output errors only slightly impact detection), HOG experienced some particularly critical errors that could result in undetected pedestrians or unnecessary vehicle stops. This works also evaluates the reliability of two Convolutional Neural Networks for object detection: You Only Look Once (YOLO) and Faster RCNN. Three different GPU architectures were exposed to controlled neutron beams (Kepler, Maxwell, and Pascal) detecting objects in both Caltech and Visual Object Classes data sets. By analyzing the neural network corrupted output, it is possible to distinguish between tolerable errors and critical errors, i.e., errors that could impact detection. Additionally, extensive GDB-level and architectural-level fault-injection campaigns were performed to identify HOG and YOLO critical procedures. Results show that not all stages of object detection algorithms are critical to the final classification reliability. Thanks to the fault injection analysis it is possible to identify HOG and Darknet portions that, if hardened, are more likely to increase reliability without introducing unnecessary overhead. The proposed HOG hardening strategy is able to detect up to 70% of errors with a 12% execution time overhead. Sistemas embarcados Tolerancia : Falhas : Software Pedestre Fault tolerance Soft error Pedestrian detection Hardening
137	A type-safe apparatus executing higher order functions in conjunction with hardware error tolerance Kimmitt, Jonathan R. R. January 2015 (has links) The increasing commoditization of computers in modern society has exceeded the pace of associated developments in reliability. Although theoretical computer science has advanced greatly in the last thirty years, many of the best techniques have yet to find their way into embedded computers, and their failure can have a great potential for disrupting society. This dissertation presents some approaches to improve computer reliability using software and hardware techniques, and makes the following claims for novelty: innovative development of a toolchain and libraries to support extraction from dependent type checking in a theorem prover; conceptual design and deployment in reconfigurable hardware; an extension of static type-safety to hardware description language and FPGA level; elimination of legacy C code from the target and toolchain; a novel hardware error detection scheme is described and compared with conventional triple modular redundancy. The elimination of any user control of memory management promotes robustness against buffer overruns, and consequently prevents vulnerability to common Trojan techniques. The methodology identifies type punning as a key weakness of commonly encountered embedded languages such as C, in particular the extreme difficulty of determining if an array access is in bounds, or if dynamic memory has been properly allocated and released. A method of eliminating dependence on type-unsafe libraries is presented, in conjunction with code that has optionally been proved correct according to user-defined criteria. An appropriately defined subset of OCaml is chosen with support for the Coq theorem prover in mind, and then evaluated with a custom backend that supports behavioural Verilog, as well as a fixed execution unit and associated control store. Results are presented for this alternative platform for reliable embedded systems development that may be used in future industrial flows. To provide assurance of correct operation, the proven software needs to be executed in an environment where errors are checked and corrected in conjunction with appropriate exception processing in the event of an uncorrectable error. Therefore, the present author’s previously published error detection scheme based on dual-rail logic and self-checking checkers is further developed and compared with traditional N-modular redundancy. 004.2
138	Scenarios preprocessing for efficient routing reconfiguration in MPSoC fault tolerance Noc based / PrÃ-processamento de cenÃrios para reconfiguraÃÃo de roteamento eficiente em MPSOC baseado em NoC tolerante a falhas Jarbas Aryel Nunes da Silveira 30 September 2015 (has links) nÃo hÃ / The latest technologies of integrated circuit manufacturing allow billions of transistors to be arranged on a single chip, enabling us to implement a complex parallel system, which requires a communications architecture with high scalability and high degree of parallelism, such as a Network-on-Chip (NoC). These technologies are very close to physical limitations, which increases the quantity of faults in circuit manufacturing and at runtime. Therefore, it is essential to provide a method for fault recovery that would enable the NoC to operate in the presence of faults and still ensure deadlock-free routing. The preprocessing of the most probable fault scenarios allows us to anticipate the calculation of deadlock-free routing, reducing the time that is necessary to interrupt the system during a fault occurrence. This work proposes a technique that employs the preprocessing of fault scenarios based on forecasting fault tendencies, which is performed with a fault threshold circuit operating in agreement with high-level software. The technique encompasses methods for dissimilarity analysis of scenarios based on cross-correlation measurements of fault link matrices, which allow us to define a reduced and efficient set of fault coverage scenarios. Experimental results employing RTL simulation with synthetic traffic prove the quality of the analytic metrics that are used to select the preprocessed scenarios. Furthermore, the experiments show the efficacy and efficiency of the proposed dissimilarity methods, quantifying the latency penalization when using the coverage scenarios approach. / As Ãltimas tecnologias de fabricaÃÃo de circuitos integrados habilitam bilhÃes de transistores a serem postos em um Ãnico chip, permitindo implementar um sistema paralelo complexo, o qual requer uma arquitetura de comunicaÃÃo que tenha grande escalabilidade e alto grau de paralelismo, tal como uma rede intrachip, em inglÃs, Network-on-Chip (NoC). Estas tecnologias estÃo muito prÃximas de limitaÃÃes fÃsicas, aumentando a quantidade de falhas na fabricaÃÃo dos circuitos e em tempo de operaÃÃo. Portanto, Ã essencial fornecer um mÃtodo para recuperaÃÃo de falha que permita a NoC operar na presenÃa de falhas e ainda garantir roteamento livre de deadlock. O prÃ-processamento de cenÃrios de falha mais provÃveis permite antecipar o cÃlculo de rotas livres de deadlock, reduzindo o tempo necessÃrio para interromper o sistema durante a ocorrÃncia de uma falha. Esta tese propÃe uma tÃcnica que emprega o prÃ-processamento de cenÃrios de falha baseado na previsÃo de tendÃncia de falhas, a qual Ã realizada com um circuito de limiar de falha operando em conjunto com um software de alto nÃvel. A tÃcnica contempla anÃlises de mÃtodos de dissimilaridade de cenÃrios baseadas na correlaÃÃo cruzada de matrizes bidimensionais de conexÃes com falha, que permite definir um conjunto reduzido e eficiente de cenÃrios de cobertura de falhas. Resultados experimentais, empregando simulaÃÃo com precisÃo em nÃvel de ciclo e trÃfego sintÃtico, provam a qualidade das mÃtricas analÃticas usadas para selecionar os cenÃrios prÃ-processados. AlÃm do mais, os experimentos mostraram a eficÃcia e eficiÃncia dos mÃtodos de dissimilaridades propostos, quantificando a penalizaÃÃo de latÃncia no uso da abordagem de cenÃrios de cobertura. Topologia irregular TolerÃncia a falhas MÃtodos de roteamento NoC Irregular topology Fault-tolerance Routing methods ENGENHARIA ELETRICA
139	Sistema embarcado para a manutenção inteligente de atuadores elétricos / Embedded systems for intelligent maintenance of electrical actuators Bosa, Jefferson Luiz January 2009 (has links) O elevado custo de manutenção nos ambientes industriais motivou pesquisas de novas técnicas para melhorar as ações de reparos. Com a evolução tecnológica, principalmente da eletrônica, que proporcionou o uso de sistemas embarcados para melhorar as atividades de manutenção, estas agregaram inteligência e evoluíram para uma manutenção pró-ativa. Através de ferramentas de processamento de sinais, inteligência artificial e tolerância a falhas, surgiram novas abordagens para os sistemas de monitoramento a serviço da equipe de manutenção. Os ditos sistemas de manutenção inteligente, cuja tarefa é realizar testes em funcionamento (on-line) nos equipamentos industriais, promovem novos modelos de confiabilidade e disponibilidade. Tais sistemas são baseados nos conceitos de tolerância a falhas, e visam detectar, diagnosticar e predizer a ocorrência de falhas. Deste modo, fornece-se aos engenheiros de manutenção a informação antecipada do estado de comportamento do equipamento antes mesmo deste manifestar uma falha, reduzindo custos, aumentando a vida útil e tornando previsível o reparo. Para o desenvolvimento do sistema de manutenção inteligente objeto deste trabalho, foram estudadas técnicas de inteligência artificial (redes neurais artificiais), técnicas de projeto de sistemas embarcados e de prototipação em plataformas de hardware. No presente trabalho, a rede neural Mapas Auto-Organizáveis foi adotada como ferramenta base para detecção e diagnóstico de falhas. Esta foi prototipada numa plataforma de sistema embarcado baseada na tecnologia FPGA (Field Programmable Gate Array). Como estudo de caso, uma válvula elétrica utilizada em dutos para transporte de petróleo foi definida como aplicação alvo dos experimentos. Através de um modelo matemático, um conjunto de dados representativo do comportamento da válvula foi simulado e utilizado como entrada do sistema proposto. Estes dados visam o treinamento da rede neural e visam fornecer casos de teste para experimentação no sistema. Os experimentos executados em software validaram o uso da rede neural como técnica para detecção e diagnóstico de falhas em válvulas elétricas. Por fim, também realizou-se experimentos a fim de validar o projeto do sistema embarcado, comparando-se os resultado obtidos com este aos resultados obtidos a partir de testes em software. Os resultados revelam a escolha correta do uso da rede neural e o correto projeto do sistema embarcado para desempenhar as tarefas de detecção e diagnóstico de falhas em válvulas elétricas. / The high costs of maintenance in industrial environments have motivated research for new techniques to improve repair activities. The technological progress, especially in the electronics field, has provided for the use of embedded systems to improve repair, by adding intelligence to the system and turning the maintenance a proactive activity. Through tools like signal processing, artificial intelligence and fault-tolerance, new approaches to monitoring systems have emerged to serve the maintenance staff, leading to new models of reliability and availability. The main goal of these systems, also called intelligent maintenance systems, is to perform in-operation (on-line) test of industrial equipments. These systems are built based on fault-tolerance concepts, and used for the detection, the diagnosis and the prognosis of faults. They provide the maintenance engineers with information on the equipment behavior, prior to the occurrence of failures, reducing maintenance costs, increasing the system lifetime and making it possible to schedule repairing stops. To develop the intelligent maintenance system addressed in this dissertation, artificial intelligence (neural networks), embedded systems design and hardware prototyping techniques were studied. In this work, the neural network Self-Organizing Maps (SOM) was defined as the basic tool for the detection and the diagnosis of faults. The SOM was prototyped in an embedded system platform based on the FPGA technology (Field Programmable Gate Array). As a case study, the experiments were performed on an electric valve used in a pipe network for oil transportation. Through a mathematical model, a data set representative of the valve behavior was obtained and used as input to the proposed maintenance system. These data were used for neural network training and also provided test cases for system monitoring. The experiments were performed in software to validate the chosen neural network as the technique for the detection and diagnosis of faults in the electrical valve. Finally, experiments to validate the embedded system design were also performed, so as to compare the obtained results to those resulting from the software tests. The results show the correct choice of the neural network and the correct embedded systems design to perform the activities for the detection and diagnosis of faults in the electrical valve. Microeletrônica Sistemas embarcados Tolerancia : Falhas On-line test Fault-tolerance Self-organizing maps Embedded systems FPGA Maintenance
140	Implementação e avaliação da técnica ACCE para detecção e correção de erros de fluxo de controle no LLVM / Implementation and evaluation of the ACCE technique to detection and correction of control flow errors in the LLVM Parizi, Rafael Baldiati January 2013 (has links) Técnicas de prevenção de falhas como testes e verificação de software não são suficientes para prover dependabilidade a sistemas, visto que não são capazes de tratar falhas ocasionadas por eventos externos tais como falhas transientes. Nessas situações faz-se necessária a aplicação de técnicas capazes de tratar e tolerar falhas que ocorram durante a execução do software. Grande parte das técnicas de tolerância a falhas transientes está focada na detecção e correção de erros de fluxo de controle, que podem corresponder a até 70% de erros causados por esse tipo de falha. Essas técnicas tratam as falhas em nível de software, alterando o programa com a inserção de novas instruções que devem capturar e corrigir desvios inesperados ocorridos durante a execução do software, sendo ACCE uma das técnicas mais conhecidas. Neste trabalho foi feita uma implementação da técnica ACCE através da criação de um passo de transformação de programas para a infraestrutura de compilação LLVM. ACCE atua sobre a linguagem intermediária dos programas compilados com o LLVM, resultando em portabilidade de linguagem de programação e de arquitetura de máquina. Além da implementação da técnica como um passo de transformação, o LLVM foi utilizado para a realização dos experimentos para avaliar o impacto na eficácia de ACCE quando aplicada em programas previamente otimizados por outras transformações. Esse tipo de avaliação é fundamental uma vez que dificilmente a compilação de programas é feita sem a ativação de otimizações, e, até onde sabemos, nunca havia sido feito anteriormente. Os experimentos deste trabalho foram realizados através de baterias de injeção de falhas em programas da suíte de benchmarks Mibench, divididas em diferentes cenários, que avaliaram ACCE em termos de correção de falhas, quando aplicada em programas otimizados por transformações individuais e também por combinações de transformações. Os resultados dos experimentos realizados mostram que a técnica ACCE é eficaz na correção de falhas, porém, para alguns programas otimizados por determinadas transformações, houve redução na correção de falhas. Esse trabalho analisa os experimentos nos quais houve redução da eficácia de ACCE e aponta possíveis causas. / Computer-based systems are used in several eletronic devices that are, in many cases, responsable by the execution of critical tasks. There are situations where techniques of prevention against faults such as software validation and verification, can not be sufficient for ensuring acceptable rates of confiability, because they are not capable of treating faults that occur in execution time, such as transient faults. Most of the fault tolerance techniques for transient faults are focused in detection and correction of control flow errors, that can correspond to 70% errors caused by this kind of faults. These techniques treat the faults at software level, changing the program with the insertion of new instructions that must to capture and to correct illegal jumps occurred during the software execution, being ACCE the most known technique today. In this work an implementation of the ACCE technique was developed as a program transformation pass in the LLVM compiler infraestructurre. ACCE acts over the intermediate language of LLVM, which results in both programming language and machine architecture language portabilities. Besides the implemetation of the technique like a transformation pass, the LLVM was also used in the experiments for the avaliation of impact in the ACCE eficacy when it is applied into programs previously optimized by others compiler transformations. This evaluation is essential since hardly the compilation of programs is made without the activation of other optimizations. As far as we know this kind of evaluation has never beem made before. The experimental results show that the ACCE techinque is effective in the fault correction, but for some programs optimized by specific transformations, there was a reduction in the correction rate. This work analyses these experiments and gives an explanation for what causes a reduction in the effectiveness of ACCE. Tolerancia : Falhas Testes : Software Fault tolerance Control-flow errors ACCE LLVM

Search results