Global ETD Search

71	Scalable and robust compute capacity multiplexing in virtualized datacenters Kesavan, Mukil 27 August 2014 (has links) Multi-tenant cloud computing datacenters run diverse workloads, inside virtual machines (VMs), with time varying resource demands. Compute capacity multiplexing systems dynamically manage the placement of VMs on physical machines to ensure that their resource demands are always met while simultaneously optimizing on the total datacenter compute capacity being used. In essence, they give the cloud its fundamental property of being able to dynamically expand and contract resources required on-demand. At large scale datacenters though there are two practical realities that designers of compute capacity multiplexing systems need to deal with: (a) maintaining low operational overhead given variable cost of performing management operations necessary to allocate and multiplex resources, and (b) the prevalence of a large number and wide variety of faults in hardware, software and due to human error, that impair multiplexing efficiency. In this thesis we propound the notion that explicitly designing the methods and abstractions used in capacity multiplexing systems for this reality is critical to better achieve administrator and customer goals at large scales. To this end the thesis makes the following contributions: (i) CCM - a hierarchically organized compute capacity multiplexer that demonstrates that simple designs can be highly effective at multiplexing capacity with low overheads at large scales compared to complex alternatives, (ii) Xerxes - a distributed load generation framework for flexibly and reliably benchmarking compute capacity allocation and multiplexing systems, (iii) A speculative virtualized infrastructure management stack that dynamically replicates management operations on virtualized entities, and a compute capacity multiplexer for this environment, that together provide fault-scalable management performance for a broad class of commonly occurring faults in large scale datacenters. Our systems have been implemented in an industry-strength cloud infrastructure built on top of the VMware vSphere virtualization platform and the popular open source OpenStack cloud computing platform running ESXi and Xen hypervisors, respectively. Our experiments have been conducted in a 700 server datacenter using the Xerxes benchmark replaying trace data from production clusters, simulating parameterized scenarios like flash crowds, and also using a suite of representative cloud applications. Results from these scenarios demonstrate the effectiveness of our design techniques in real-life large scale environments. Distributed systems Virtualization Resource management Fault tolerance Function replication Benchmarking
72	Design and simulation of advanced fault tolerant flight control schemes Gururajan, Srikanth. January 2006 (has links) Thesis (Ph. D.)--West Virginia University, 2006. / Title from document title page. Document formatted into pages; contains xii, 132 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 123-128).
73	LCL DC/DC converter and DC hub under DC faults and development of DC grids with protection system using DC hub Zhang, Jianxi January 2016 (has links) In this thesis, an IGBT-based DC/DC converter employing an internal inductor-capacitor-inductor (LCL) passive circuit is investigated in DC grid under fault conditions. It is concluded that a range of converter parameters exist which will give DC fault current magnitudes close to rated currents. Steady state and transient fault responses are investigated in depth. The converter is modelled on PSCAD platform under fault operation and the simulation results verify the analytical studies. LCL DC hub is an extension of DC/DC converter to multiple ports with capability of limiting the propagation of DC faults in a DC grid. Analytical mathematical equations for steady state fault currents are derived. A state space model of the hub is introduced for transient fault study. The hub is able to interconnect multiple DC cables at different voltage levels and act as DC substation for DC grid. The designed hub also has the ability to maintain the current within the order of its rated value without additional protection even for the worst case fault. The analytical study results are confirmed by detailed simulation on PSCAD. Based on the good performance of the LCL DC hub under DC faults, a DC grid topology with protection system employing LCL DC hub is proposed and investigated in this thesis. The advantage and feasibility of this method in DC fault protection is investigated based on the developed grid model. The DC grid protection systems are proposed and analysed in depth under several DC fault scenarios. The PSCAD simulation results under a range of DC fault scenarios on various locations are shown. These results confirm significance of the proposed DC grid protection system and advantages of this proposed topology in fault isolation. 621.31
74	Maresia : an approach to deal with the single points of failure of the MapReduce model / Maresi: uma abordagem para lidar com os pontos de falha única do modelo MapReduce Marcos, Pedro de Botelho January 2013 (has links) Durante os últimos anos, a quantidade de dados gerada pelas aplicações cresceu consideravelmente. No entanto, para tornarem-se relevantes estes dados precisam ser processados. Para atender este objetivo, novos modelos de programação para processamento paralelo e distribuído foram propostos. Um exemplo é o modelo MapReduce, o qual foi proposto pela Google. Este modelo, no entanto, possui pontos de falha única (SPOF), os quais podem comprometer a sua execução. Assim, este trabalho apresenta uma nova arquitetura, inspirada pelo Chord, para lidar com os SPOFs do modelo. A avaliação da proposta foi realizada através de modelagem analítica e de testes experimentais. Os resultados mostram a viabilidade de usar a arquitetura proposta para executar o MapReduce. / During the last years, the amount of data generated by applications grew considerably. To become relevant, however, this data should be processed. With this goal, new programming models for parallel and distributed processing were proposed. An example is the MapReduce model, which was proposed by Google. This model, nevertheless, has Single Points of Failure (SPOF), which can compromise the execution of a job. Thus, this work presents a new architecture, inspired by Chord, to avoid the SPOFs on MapReduce. The evaluation was performed through an analytical model and an experimental setup. The results show the feasibility of using the proposed architecture to execute MapReduce jobs. Sistemas operacionais P2P Distributed systems Mapreduce Fault tolerance P2P
75	Exploração adaptativa de paralelismo sob restrições físicas e de tempo real em sistemas embarcados tolerantes a falhas / Adaptive parallelism exploitation under physical and real-time constraints for fault tolerant embedded systems Itturriet, Fabio Pires January 2012 (has links) A constante redução nas dimensões dos transistores foi o principal combustível capaz de manter o crescente desempenho exigido por aplicações. Ao mesmo tempo, as tensões de alimentação dos circuitos também são reduzidas a cada novo nó tecnológico, fazendo com que partículas como nêutrons e partículas alpha, portando quantidades de energia cada vez menores sejam capazes de gerar os chamados soft errors, que impactam diretamente na redução da confiabilidade dos sistemas embarcados atuais. Isto faz com que a implementação de técnicas de tolerância a falhas se tornem praticamente obrigatórias para tecnologias atuais e futuras. Estes mesmos sistemas embarcados, como smartphones, devem apresentar alto poder de processamento, visando atender um crescente conjunto de aplicações de natureza heterogênea, consumindo a mínima potência possível. Nestes sistemas, algumas dessas principais aplicações como codec GSM, cancelamento de eco acústico, processamento de áudio e vídeo apresentam em comum a necessidade de multiplicar matrizes de diferentes dimensões em determinados intervalos de tempo. Pensando nestas demandas, será proposta a arquitetura RA3, cujo objetivo é executar o algoritmo de multiplicação de matrizes em paralelo com a técnica de tolerância a falhas conhecida na literatura como ABFT, visando a aumentar a confiabilidade da mesma. Além disso, a RA3 possui uma estrutura adaptativa que permite que unidades internas como memórias, multiplicadores e somadores sejam ligadas ou desligadas através da aplicação da técnica de power gating em tempo de execução, conforme restrições impostas pela largura da banda de memória, power budgets e deadlines impostos por aplicações de tempo real, visando executar tarefas consumindo a mínima potência possível. Para avaliar as funcionalidades propostas, dois estudos de caso reais são apresentados e o comportamento da arquitetura é avaliado sobre diversos aspectos como desempenho, área, consumo de potência e cobertura de falhas. Finalmente é possível comprovar que a adaptabilidade proposta pela arquitetura RA3 permite que seja encontrada, em diversos cenários, a quantidade exata de recursos necessários para executar determinadas aplicações sem comprometer as restrições impostas principalmente no consumo de potência e por aplicações com deadlines críticos, mantendo ainda altas taxas de cobertura de falhas. / The continuous reduction of transistors’ dimensions was the main drive capable of maintaining the performance increase required by applications. At the same time, supply voltages of the circuits are also reduced with each new technology node, causing particles such as neutrons or alpha particles, even with reduced amounts of energy, to generate so-called soft errors that directly impact on the reliability of embedded systems. This scenario makes the implementation of techniques for fault tolerance mandatory for current and future technologies. Still, embedded systems, such as smartphones, must provide high processing power to execute a growing set of applications of heterogeneous nature, consuming the least possible power. In these systems, applications like GSM codec, acoustic echo cancellation, audio and video processing have in common the need for matrix multiplication operations of different dimensions at certain time intervals. To efficiently support the aforementioned scenario, this dissertation proposes the RA3 architecture whose goal is run the matrix multiplication algorithm in parallel with the fault tolerance technique know in the literature as ABFT, aiming to support software execution with high reliability. Furthermore, the RA3 architecture provides adaptive internal units such as memories, multipliers and adders with adaptive powering on or off by applying power gating at runtime. Runtime power gating enables to meet restrictions imposed by real-time applications or memory bandwidth with minimum power. To evaluate the proposed architecture, two case studies are presented and the behavior of the architecture is evaluated in terms of performance, area, power consumption and fault coverage. Finally, a comprehensive design space exploration shows that the adaptability provided by the RA3 architecture allows the system designer to find, in many scenarios, the exact amount of resources needed to run a set of applications without compromising the restrictions imposed mainly in power consumption and real-time deadlines, while still maintaining a high fault coverage rate. Microeletrônica Sistemas embarcados Adaptive architectures Fault tolerance Embedded systems
76	Desenvolvimento e teste de um monitor de barramento I2C para proteção contra falhas transientes / Development and test of an I2C bus monitor for protection against transient faults Carvalho, Vicente Bueno January 2016 (has links) A comunicação entre circuitos integrados tem evoluído em desempenho e confiabilidade ao longo dos anos. Inicialmente os projetos utilizavam barramentos paralelos, onde existe a necessidade de uma grande quantidade de vias, utilizando muitos pinos de entrada e saída dos circuitos integrados resultando também em uma grande suscetibilidade a interferências eletromagnéticas (EMI) e descargas eletrostáticas (ESD). Na sequência, ficou claro que o modelo de barramento serial possuía ampla vantagem em relação ao predecessor, uma vez que este utiliza um menor número de vias, facilitando o processo de leiaute de placas, facilitando também a integridade de sinais possibilitando velocidades muito maiores apesar do menor número de vias. Este trabalho faz uma comparação entre os principais protocolos seriais de baixa e média velocidade. Nessa pesquisa, foram salientadas as características positivas e negativas de cada protocolo, e como resultado o enquadramento de cada um dos protocolos em um segmento de atuação mais apropriado. O objetivo deste trabalho é utilizar o resultado da análise comparativa dos protocolos seriais para propor um aparato de hardware capaz de suprir uma deficiência encontrada no protocolo serial I2C, amplamente utilizado na indústria, mas que possui restrições quando a aplicação necessita alta confiabilidade. O aparato, aqui chamado de Monitor de Barramento I2C, é capaz de verificar a integridade de dados, sinalizar métricas sobre a qualidade das comunicações, detectar falhas transitórias e erros permanentes no barramento e agir sobre os dispositivos conectados ao barramento para a recuperação de tais erros, evitando falhas. Foi desenvolvido um mecanismo de injeção de falhas para simular as falhas em dispositivos conectados ao barramento e, portanto, verificar a resposta do monitor. Resultados no PSoC5, da empresa Cypress, mostram que a solução proposta tem um baixo custo em termos de área e nenhum impacto no desempenho das comunicações. / The communication between integrated circuits has evolved in performance and reliability over the years. Initially projects used parallel buses, where there is a need for a large amount of wires, consuming many input and output pins of the integrated circuits resulting in a great susceptibility to electromagnetic interference (EMI) and electrostatic discharge (ESD). As a result, it became clear that the serial bus model had large advantage over predecessor, since it uses a smaller number of lanes, making the PCB layout process easier, which also facilitates the signal integrity allowing higher speeds despite fewer pathways. This work makes a comparison between the main low and medium speed serial protocols. The research has emphasized the positive and negative characteristics of each protocol, and as a result the framework of each of the protocols in a more appropriate market segment. The objective of this work is to use the results of comparative analysis of serial protocols to propose a hardware apparatus capable of filling a gap found in the I2C protocol, widely used in industry, but with limitations when the application requires high reliability. The apparatus, here called I2C Bus Monitor, is able to perform data integrity verification activities, to signalize metrics about the quality of communications, to detect transient faults and permanent errors on the bus and to act on the devices connected to the bus for the recovery of such errors avoiding failures. It was developed a fault injection mechanism to simulate faults in the devices connected to the bus and thus verify the monitor response. Results in the APSoC5 from Cypress show that the proposed solution has an extremely low cost overhead in terms of area and no performance impact in the communication. Microeletrônica Tolerancia : Falhas I2C protocol Fault tolerance Aerospace APSoC PsoC
77	Coping with permanent faults in NoCs by using adaptive strategies based on router design-level and routing algorithm-level / Cobrindo falhas permanentes em Redes intrachip usando técnicas adaptativas nos roteadores em um nível de projeto e em um nível de algoritmo Concatto, Caroline Martins January 2009 (has links) Hoje em dia, as redes intra chip (NoC) são cada vez mais utilizadas como uma arquitetura de comunicação alternativa para sistemas complexos, pois estas permitem flexibilidade e desempenho da comunicação. Porém, o grande número de interconexões da rede, aliado à diminuição das dimensões dos transistores fabricados nas tecnologias nanométricas, fazem com que a NoC possa ter um grande número de falhas durante sua fabricação, ou por desgaste durante sua vida útil. Sabe-se que, em futuras tecnologias os circuitos integrados terão uma taxa de falhas permanentes de 20 a 30%. Entretanto, mesmo na presença de falhas, é desejável que a NoC permaneça funcionando corretamente. A partir do diagnóstico das falhas, a NoC deve ser capaz de buscar alternativas para manter a comunicação entre os núcleos, evitando os canais e os roteadores com falhas. O objetivo deste trabalho é propor mecanismos adaptativos de proteção contra falhas permanentes. Mesmo quando são adicionados componentes extras para a substituição em SoCs, a ocorrência de falhas permanentes na rede intrachip impede a substituição ou reparo de um componente no sistema intrachip. Portanto a tolerância a falhas na NoC será crucial para reduzir custo de manufatura, e aumentar o rendimento e o tempo de vida do circuito integrado. O mecanismo proposto é capaz de evitar falhas sabendo anteriormente, na fase de teste e diagnóstico, a localização especifica da falha. Portanto, as técnicas se adaptam em cada roteador para evitar as falhas permanentes, sempre buscando manter desempenho, aumentar o rendimento e a confiabilidade do sistema. / Nowadays, networks-on-chip (NoCs) have been used as an alternative communication architecture inside complex system on-chip. They offer better scalability and performance than the traditional bus. However, the growing number of interconnects that have to be inserted using smaller transistors means that NoCs have a growing number of faults, either from manufacturing or due to aging. In future systems-on-chip (SoCs), the fault rate will be around 20 to 30% of the contact and transistors of integrated circuits. Therefore, even in the presence of a fault, it is still desirable that NoCs properly work. The main idea of this work is to implement adaptive mechanisms to protect NoCs against permanent faults. The main advantage of such mechanism is to manage failures based on data from the testing and diagnosing phase. The mechanisms are adapted in each router in order to sustain performance, increasing the system yield and reliability even in the presence of failures. Even if one adds extra blocks for replacement, the occurrence of permanent faults in a NoC might preclude the replacement or repair of a faulty component within the SoC. In such case, fault-tolerant NoCs are able to reduce manufacturing costs, increase yield and the lifetime of the chip. Microeletrônica Tolerancia : Falhas NoC Fault-tolerance Adaptability Performance Microeletronics
78	Radiation robustness of XOR and majority voter circuits at finFET technology under variability Aguiar, Ygor Quadros de January 2017 (has links) Os avanços na microeletrônica contribuíram para a redução de tamanho do nó tecnológico, diminuindo a tensão de limiar e aumentando a freqüência de operação dos sistemas. Embora tenha resultado em ganhos positivos relacionados ao desempenho e ao consumo de energia dos circuitos VLSI, a miniaturização também tem um impacto negativo em termos de confiabilidade dos projetos. À medida que a tecnologia diminui, os circuitos estão se tornando mais suscetíveis a inúmeros efeitos devido à redução da robustez ao ruído externo, bem como ao aumento do grau de incerteza relacionado às muitas fontes de variabilidade. As técnicas de tolerancia a falhas geralmente são usadas para melhorar a robustez das aplicações de segurança crítica. No entanto, as implicações da redução da tecnologia interferem na eficácia de tais abordagem em fornecer a cobertura de falhas desejada. Por esse motivo, este trabalho avaliou a robustez aos efeitos de radiação de diferentes circuitos projetados na tecnologia FinFET sob efeitos de variabilidade. Para determinar as melhores opções de projeto para implementar técnicas de tolerancia a falhas, como os esquemas de Redundância de módulo triplo (TMR) e/ou duplicação com comparação (DWC), o conjunto de circuitos analisados é composto por dez diferentes topologias de porta lógica OR-exclusivo (XOR) e dois circuitos votadores maioritários (MJV). Para investigar o efeito da configuração do gate dos dispositivos FinFET, os circuitos XOR são analisados usando a configuração de double-gate (DG FinFET) e tri-gate (TG FinFET). A variabilidade ambiental, como variabilidade de temperatura e tensão, são avaliadas no conjunto de circuitos analisados. Além disso, o efeito da variabilidade de processo Work-Function Fluctuation (WFF) também é avaliado. A fim de fornecer um estudo mais preciso, o projeto do leiaute dos circuitos MJV usando 7nm FinFET PDK é avaliado pela ferramenta preditiva MUSCA SEP3 para estimar o Soft-Error Rate (SER) dos circuitos considerando as características do leiaute e as camadas de Back-End-Of-Line (BEOL) e Front-End-Of-Line (FEOL) de um nó tecnológico avançado. / Advances in microelectronics have contributed to the size reduction of the technological node, lowering the threshold voltage and increasing the operating frequency of the systems. Although it has positive outcomes related to the performance and power consumption of VLSI circuits, it does also have a strong negative impact in terms of the reliability of designs. As technology scales down, the circuits are becoming more susceptible to numerous effects due to the reduction of robustness to external noise as well as the increase of uncertainty degree related to the many sources of variability. Faulttolerant techniques are usually used to improve the robustness of safety critical applications. However, the implications of the scaling of technology have interfered against the effectiveness of fault-tolerant approaches to provide the fault coverage. For this reason, this work has evaluated the radiation robustness of different circuits designed in FinFET technology under variability effects. In order to determine the best design options to implement fault-tolerant techniques such as the Triple-Module Redundancy (TMR) and/or Duplication with Comparison (DWC) schemes, the set of analyzed circuits is composed of ten different exclusive-OR (XOR) logic gate topologies and two majority voter (MJV) circuits. To investigate the effect of gate configuration of FinFET devices, the XOR circuits is analyzed using double-gate configuration (DG FinFET) and tri-gate configuration (TG FinFET). Environmental Variability such as Temperature and Voltage Variability are evaluated in the set of analyzed circuits. Additionally, the process-related variability effect Work-Function Fluctuation (WFF) is also evaluated. In order to provide a more precise study, the layout design of the MJV circuits using a 7nm FinFET PDK is evaluated by the predictive MUSCA SEP3 tool to estimate the Soft-Error Rate (SER) of the circuits considering the layout contrainsts and Back-End-Of-Line (BEOL) and Front-End-Of-Line (FEOL) layers of an advanced technology node. Microeletrônica Circuitos digitais Microelectronics FinFET Variability Radiation Effects Fault Tolerance
79	Desenvolvimento e teste de um monitor de barramento I2C para proteção contra falhas transientes / Development and test of an I2C bus monitor for protection against transient faults Carvalho, Vicente Bueno January 2016 (has links) A comunicação entre circuitos integrados tem evoluído em desempenho e confiabilidade ao longo dos anos. Inicialmente os projetos utilizavam barramentos paralelos, onde existe a necessidade de uma grande quantidade de vias, utilizando muitos pinos de entrada e saída dos circuitos integrados resultando também em uma grande suscetibilidade a interferências eletromagnéticas (EMI) e descargas eletrostáticas (ESD). Na sequência, ficou claro que o modelo de barramento serial possuía ampla vantagem em relação ao predecessor, uma vez que este utiliza um menor número de vias, facilitando o processo de leiaute de placas, facilitando também a integridade de sinais possibilitando velocidades muito maiores apesar do menor número de vias. Este trabalho faz uma comparação entre os principais protocolos seriais de baixa e média velocidade. Nessa pesquisa, foram salientadas as características positivas e negativas de cada protocolo, e como resultado o enquadramento de cada um dos protocolos em um segmento de atuação mais apropriado. O objetivo deste trabalho é utilizar o resultado da análise comparativa dos protocolos seriais para propor um aparato de hardware capaz de suprir uma deficiência encontrada no protocolo serial I2C, amplamente utilizado na indústria, mas que possui restrições quando a aplicação necessita alta confiabilidade. O aparato, aqui chamado de Monitor de Barramento I2C, é capaz de verificar a integridade de dados, sinalizar métricas sobre a qualidade das comunicações, detectar falhas transitórias e erros permanentes no barramento e agir sobre os dispositivos conectados ao barramento para a recuperação de tais erros, evitando falhas. Foi desenvolvido um mecanismo de injeção de falhas para simular as falhas em dispositivos conectados ao barramento e, portanto, verificar a resposta do monitor. Resultados no PSoC5, da empresa Cypress, mostram que a solução proposta tem um baixo custo em termos de área e nenhum impacto no desempenho das comunicações. / The communication between integrated circuits has evolved in performance and reliability over the years. Initially projects used parallel buses, where there is a need for a large amount of wires, consuming many input and output pins of the integrated circuits resulting in a great susceptibility to electromagnetic interference (EMI) and electrostatic discharge (ESD). As a result, it became clear that the serial bus model had large advantage over predecessor, since it uses a smaller number of lanes, making the PCB layout process easier, which also facilitates the signal integrity allowing higher speeds despite fewer pathways. This work makes a comparison between the main low and medium speed serial protocols. The research has emphasized the positive and negative characteristics of each protocol, and as a result the framework of each of the protocols in a more appropriate market segment. The objective of this work is to use the results of comparative analysis of serial protocols to propose a hardware apparatus capable of filling a gap found in the I2C protocol, widely used in industry, but with limitations when the application requires high reliability. The apparatus, here called I2C Bus Monitor, is able to perform data integrity verification activities, to signalize metrics about the quality of communications, to detect transient faults and permanent errors on the bus and to act on the devices connected to the bus for the recovery of such errors avoiding failures. It was developed a fault injection mechanism to simulate faults in the devices connected to the bus and thus verify the monitor response. Results in the APSoC5 from Cypress show that the proposed solution has an extremely low cost overhead in terms of area and no performance impact in the communication. Microeletrônica Tolerancia : Falhas I2C protocol Fault tolerance Aerospace APSoC PsoC
80	Maresia : an approach to deal with the single points of failure of the MapReduce model / Maresi: uma abordagem para lidar com os pontos de falha única do modelo MapReduce Marcos, Pedro de Botelho January 2013 (has links) Durante os últimos anos, a quantidade de dados gerada pelas aplicações cresceu consideravelmente. No entanto, para tornarem-se relevantes estes dados precisam ser processados. Para atender este objetivo, novos modelos de programação para processamento paralelo e distribuído foram propostos. Um exemplo é o modelo MapReduce, o qual foi proposto pela Google. Este modelo, no entanto, possui pontos de falha única (SPOF), os quais podem comprometer a sua execução. Assim, este trabalho apresenta uma nova arquitetura, inspirada pelo Chord, para lidar com os SPOFs do modelo. A avaliação da proposta foi realizada através de modelagem analítica e de testes experimentais. Os resultados mostram a viabilidade de usar a arquitetura proposta para executar o MapReduce. / During the last years, the amount of data generated by applications grew considerably. To become relevant, however, this data should be processed. With this goal, new programming models for parallel and distributed processing were proposed. An example is the MapReduce model, which was proposed by Google. This model, nevertheless, has Single Points of Failure (SPOF), which can compromise the execution of a job. Thus, this work presents a new architecture, inspired by Chord, to avoid the SPOFs on MapReduce. The evaluation was performed through an analytical model and an experimental setup. The results show the feasibility of using the proposed architecture to execute MapReduce jobs. Sistemas operacionais P2P Distributed systems Mapreduce Fault tolerance P2P

Search results