61

Generalized Consensus for Practical Fault-Tolerance

Garg, Mohit 07 September 2018 (has links)
Despite extensive research on Byzantine Fault Tolerant (BFT) systems, the overheads associated with such solutions preclude widespread adoption. Past efforts such as the Cross Fault Tolerance (XFT) model address this problem by making the weaker assumption that a majority of processes are correct and communicate synchronously. Although XPaxos of Liu et al. (using the XFT model) achieves performance similar to Paxos, it does not scale with the number of faults. Also, its reliance on a single leader introduces considerable downtime in case of failures. This thesis presents Elpis, the first multi-leader XFT consensus protocol. By adopting the Generalized Consensus specification from the Crash Fault Tolerance model, we devise a multi-leader protocol that exploits the commutativity property inherent in the commands ordered by the system. Elpis maps accessed objects to non-faulty processes during periods of synchrony; these processes then order all commands that access those objects. Experimental evaluation confirms the effectiveness of this approach: Elpis achieves up to 2x speedup over XPaxos and up to 3.5x speedup over state-of-the-art Byzantine Fault-Tolerant consensus protocols. / Master of Science / Online services like Facebook, Twitter, Netflix, and Spotify, as well as cloud services like Google and Amazon, serve millions of users, including both individuals and organizations. They use many distributed technologies to deliver a rich experience, and the distributed nature of these technologies has removed geographical barriers to accessing data, services, software, and hardware. An essential aspect of these technologies is the concept of shared state. Distributed databases with multiple replicated data nodes are an example of this shared state. Maintaining replicated data nodes provides several advantages: (1) availability, so that if one node goes down the data can still be accessed from other nodes; (2) quick response times, since placing data nodes closer to users lets data be obtained quickly; and (3) scalability, since multiple users can access different nodes so that no single node becomes a bottleneck. Maintaining this shared state requires some mechanism to ensure consistency, that is, the copies of the shared state must be identical on all the data nodes. This mechanism is called consensus, and several such mechanisms in practice today use the Crash Fault Tolerance (CFT) model, meaning they provide consistency in the presence of node crashes. While the state of the art in security has moved from assuming a trusted environment inside a firewall to a perimeter-less, semi-trusted environment with every service living on the internet, typically only the application layer is secured while the core is built with only crashes in mind. Although there exists comprehensive research on secure consensus mechanisms that use the Byzantine Fault Tolerance (BFT) model, the extra costs required to implement them and their comparatively lower performance in a geographically distributed setting have impeded widespread adoption. A recently proposed model, Cross Fault Tolerance (XFT), seeks a middle ground between these models: achieving security without paying extra costs.
This thesis presents Elpis, a consensus mechanism that uses precisely this model to secure the shared state at its core, without modifications to existing setups, while delivering high performance and low response times. We perform a comprehensive evaluation on AWS and demonstrate that Elpis achieves a 3.5x speedup over the state-of-the-art while improving response times by as much as 50%.
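To make the commutativity idea concrete, the sketch below shows how commands touching disjoint objects can be ordered independently by different owner processes. It is a minimal illustration under assumed names (`Owner`, `route`) and hash partitioning, not the Elpis implementation, which also handles ownership changes, synchrony assumptions, and fault recovery.

```python
# Minimal sketch of multi-leader generalized consensus ownership: commands on
# disjoint objects commute, so they can be ordered independently by different
# "owner" processes. Names and structure are illustrative, not Elpis itself.

class Owner:
    """A non-faulty process that serializes commands for the objects it owns."""
    def __init__(self, pid):
        self.pid = pid
        self.log = []                     # locally ordered commands

    def order(self, cmd):
        self.log.append(cmd)              # only non-commuting commands meet here
        return len(self.log) - 1          # slot in this owner's local order

def route(cmd, owners):
    """Map a command to an owner by the object it accesses (hash partitioning)."""
    return owners[hash(cmd["object"]) % len(owners)]

owners = [Owner(i) for i in range(3)]
for cmd in ({"object": "x", "op": "inc"},
            {"object": "y", "op": "dec"},
            {"object": "x", "op": "read"}):
    owner = route(cmd, owners)
    slot = owner.order(cmd)
    print(f"cmd on {cmd['object']!r} ordered by owner {owner.pid} at slot {slot}")
```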
62

Exploring scaling limits and computational paradigms for next generation embedded systems

Zykov, Andrey V. 01 June 2010 (has links)
It is widely recognized that device and interconnect fabrics at the nanoscale will be characterized by a higher density of permanent defects and increased susceptibility to transient faults. This appears to be intrinsic to nanoscale regimes and fundamentally limits the eventual benefits of the increased device density: the overheads associated with achieving fault tolerance may counter the benefits of increased device density -- the density-reliability tradeoff. At the same time, as devices scale down one can expect a higher proportion of area to be associated with interconnection, i.e., area is wire dominated. In this work we theoretically explore density-reliability tradeoffs in wire-dominated integrated systems. We derive an area scaling model based on simple assumptions capturing the salient features of hierarchical design for high-performance systems, along with first-order assumptions on reliability, wire area, and wire length across hierarchical levels. We then evaluate the overheads associated with using basic fault-tolerance techniques at different levels of the design hierarchy. This model, albeit simplified, allows us to tackle several interesting theoretical questions: (1) When does it make sense to use smaller, less reliable devices? (2) At what scale of the design hierarchy should fault tolerance be applied in high-performance integrated systems? In the second part of this thesis we explore perturbation-based computational models as a promising choice for implementing next-generation ubiquitous information technology on unreliable nanotechnologies. We show the inherent robustness of such computational models to high defect densities and performance uncertainty, which, combined with low manufacturing-precision requirements, makes them particularly suitable for emerging nanoelectronics. We propose a hybrid eNano-CMOS perturbation-based computing platform relying on a new style of configurability that exploits the computational model's unique form of unstructured redundancy. We consider the practicality and scalability of perturbation-based computational models by developing and assessing initial foundations for engineering such systems. Specifically, new design and decomposition principles exploiting task-specific contextual and temporal scales are proposed and shown to substantially reduce complexity for several benchmark tasks. Our results provide strong evidence for the relevance and potential of this class of computational models when targeted at emerging unreliable nanoelectronics.
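The density-reliability tradeoff lends itself to a back-of-the-envelope calculation. The toy model below uses generic first-order assumptions (quadratic density gain from linear scaling, R-fold modular redundancy with a voter-area fraction); the parameters and formulas are illustrative and are not taken from the thesis's hierarchical model.

```python
# Toy first-order model of the density-reliability tradeoff: scaling feature
# size by s packs 1/s^2 more devices into the same area, but less reliable
# devices may need R-fold modular redundancy plus voters, eating into that
# gain. All parameters are illustrative.

def effective_density_gain(s, r, voter_overhead=0.2):
    """Useful logic per unit area, relative to the unscaled, unreplicated design.

    s: linear scaling factor (e.g. 0.5 = half-pitch devices)
    r: replication factor required to mask faults at this scale
    voter_overhead: extra area fraction for voting/checking logic
    """
    raw_gain = 1.0 / (s * s)                  # more devices per unit area
    redundancy_cost = r * (1.0 + voter_overhead)
    return raw_gain / redundancy_cost

for s, r in [(1.0, 1), (0.7, 1), (0.5, 3), (0.35, 5)]:
    print(f"scale {s:.2f}, replication {r}: "
          f"effective gain {effective_density_gain(s, r):.2f}x")
```

Even this crude model exhibits the crossover the thesis studies analytically: below some scale, the redundancy needed to mask faults consumes more area than the scaling returns.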
63

Determine network survivability using heuristic models

Chua, Eng Hong 03 1900 (has links)
Approved for public release; distribution is unlimited. / Contemporary large-scale networked systems have improved the efficiency and effectiveness of our way of life. However, these benefits are accompanied by elevated risks of intrusion and compromise. Incorporating survivability capabilities into systems is one way to mitigate these risks. The Server Agent-based Active network Management (SAAM) project was initiated as part of the next-generation Internet project to address increasing multimedia Internet service demands. Its objective is to provide a consistent and dedicated quality of service to users. SAAM monitors network traffic conditions in a region and responds to routing requests from the routers in that region with optimal routes. Mobility has been incorporated into the SAAM server to prevent a single point of failure from bringing down the entire SAAM server and its service. With mobility, it is very important to select a good SAAM server location from the client's point of view: the chosen server must be a node whose connection to the client is most survivable. To that end, a general metric is defined to measure the connection survivability of each of the potential server hosts. However, due to the complexity of the network, computing the metric is also very complex. This thesis develops heuristic solutions of polynomial complexity to find the hosting server node, minimizing the time and computing power required. / Defence Science & Technology Agency (Singapore)
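One plausible polynomial-time stand-in for such a survivability metric is the number of edge-disjoint paths between a candidate server and the client, computable with unit-capacity max-flow. The sketch below illustrates that generic idea; it is not the specific heuristic developed in the thesis.

```python
# Generic illustration of a polynomial-time survivability score: rank candidate
# server nodes by the number of edge-disjoint paths to the client, computed as
# unit-capacity max-flow via repeated BFS (Ford-Fulkerson).
from collections import deque

def edge_disjoint_paths(adj, src, dst):
    """Count edge-disjoint src->dst paths on a graph given as {node: set(neighbors)}."""
    residual = {u: set(vs) for u, vs in adj.items()}
    paths = 0
    while True:
        parent = {src: None}
        q = deque([src])
        while q and dst not in parent:          # BFS for an augmenting path
            u = q.popleft()
            for v in residual.get(u, ()):
                if v not in parent:
                    parent[v] = u
                    q.append(v)
        if dst not in parent:
            return paths
        v = dst                                  # consume the path's edges
        while parent[v] is not None:
            u = parent[v]
            residual[u].discard(v)
            residual.setdefault(v, set()).add(u)  # reverse edge: flow cancellation
            v = u
        paths += 1

# Undirected example network: each edge listed in both directions.
net = {"client": {"a", "b"}, "a": {"client", "s1", "b"},
       "b": {"client", "a", "s2"}, "s1": {"a", "s2"}, "s2": {"b", "s1"}}
for server in ("s1", "s2"):
    print(server, "survivability score:", edge_disjoint_paths(net, server, "client"))
```

A host with more disjoint paths to the client survives more link failures, which is the intuition a practical heuristic would approximate more cheaply.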
64

Radiation robustness of XOR and majority voter circuits at finFET technology under variability

Aguiar, Ygor Quadros de January 2017 (has links)
Advances in microelectronics have contributed to the shrinking of the technology node, lowering threshold voltages and increasing the operating frequency of systems. Although this miniaturization has positive outcomes for the performance and power consumption of VLSI circuits, it also has a strong negative impact on the reliability of designs. As technology scales down, circuits become more susceptible to numerous effects due to reduced robustness to external noise as well as an increased degree of uncertainty related to the many sources of variability. Fault-tolerant techniques are usually used to improve the robustness of safety-critical applications. However, the implications of technology scaling interfere with the effectiveness of fault-tolerant approaches in providing the desired fault coverage. For this reason, this work has evaluated the radiation robustness of different circuits designed in FinFET technology under variability effects. In order to determine the best design options for implementing fault-tolerant techniques such as Triple-Module Redundancy (TMR) and/or Duplication with Comparison (DWC) schemes, the set of analyzed circuits comprises ten different exclusive-OR (XOR) logic gate topologies and two majority voter (MJV) circuits. To investigate the effect of the gate configuration of FinFET devices, the XOR circuits are analyzed in both the double-gate (DG FinFET) and tri-gate (TG FinFET) configurations. Environmental variability, such as temperature and voltage variability, is evaluated across the set of analyzed circuits. Additionally, the process-related variability effect of Work-Function Fluctuation (WFF) is also evaluated. To provide a more precise study, the layout design of the MJV circuits using a 7nm FinFET PDK is evaluated with the predictive MUSCA SEP3 tool to estimate the Soft-Error Rate (SER) of the circuits, considering the layout constraints and the Back-End-Of-Line (BEOL) and Front-End-Of-Line (FEOL) layers of an advanced technology node.
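The masking principle behind the TMR scheme evaluated here can be shown behaviorally: triplicate a gate and vote on the outputs, so a single transient fault is out-voted. The sketch below is a functional illustration only; the thesis's analysis is at the transistor and layout level in FinFET technology.

```python
# Behavioral sketch of Triple-Module Redundancy with a majority voter: a
# transient fault flipping one replica's output is masked by the other two.
# This shows only the logical principle, not the circuit-level analysis.

def xor_gate(a, b):
    return a ^ b

def majority(v0, v1, v2):
    """Classic MJV function: 1 when at least two inputs are 1."""
    return (v0 & v1) | (v1 & v2) | (v0 & v2)

def tmr_xor(a, b, fault_in=None):
    """Three XOR replicas; optionally flip one replica's output (a soft error)."""
    outs = [xor_gate(a, b) for _ in range(3)]
    if fault_in is not None:
        outs[fault_in] ^= 1            # single-event upset on one replica
    return majority(*outs)

for a in (0, 1):
    for b in (0, 1):
        assert tmr_xor(a, b) == a ^ b                     # fault-free case
        for hit in range(3):
            assert tmr_xor(a, b, fault_in=hit) == a ^ b   # single fault masked
print("single faults masked on all input combinations")
```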
65

Development and test of an I2C bus monitor for protection against transient faults

Carvalho, Vicente Bueno January 2016 (has links)
Communication between integrated circuits has evolved in performance and reliability over the years. Early designs used parallel buses, which require a large number of lines and consume many input and output pins of the integrated circuits, resulting in great susceptibility to electromagnetic interference (EMI) and electrostatic discharge (ESD). In time it became clear that the serial bus model had a large advantage over its predecessor: it uses a smaller number of lanes, making the PCB layout process easier and improving signal integrity, allowing much higher speeds despite the fewer signal paths. This work compares the main low- and medium-speed serial protocols, emphasizing the positive and negative characteristics of each and, as a result, placing each protocol in its most appropriate application segment. The objective of this work is to use the results of this comparative analysis to propose a hardware apparatus capable of filling a gap found in the I2C protocol, which is widely used in industry but has limitations when an application requires high reliability. The apparatus, here called the I2C Bus Monitor, is able to verify data integrity, report metrics on communication quality, detect transient faults and permanent errors on the bus, and act on the devices connected to the bus to recover from such errors, avoiding failures. A fault injection mechanism was developed to simulate faults in devices connected to the bus and thus verify the monitor's response. Results on Cypress's PSoC 5 show that the proposed solution has a low cost in terms of area and no impact on communication performance.
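At a high level, the monitor's duty cycle is: observe each transfer, verify integrity, keep quality metrics, and trigger recovery when errors accumulate. The sketch below conveys that loop with a hypothetical frame format and a generic CRC-8; the actual monitor operates on I2C signals in hardware.

```python
# High-level sketch of a bus-monitor check loop: observe each transfer, verify
# a checksum, keep quality metrics, and trigger recovery after repeated errors.
# Frame layout and CRC choice are hypothetical, not the thesis's design.

def crc8(data, poly=0x07):
    """Simple CRC-8 (x^8 + x^2 + x + 1), a common serial-bus checksum."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

class BusMonitor:
    def __init__(self, error_threshold=3):
        self.errors = 0               # integrity violations seen so far
        self.frames = 0               # total frames observed (quality metric)
        self.error_threshold = error_threshold

    def observe(self, payload, checksum):
        self.frames += 1
        if crc8(payload) != checksum:
            self.errors += 1          # transient fault detected on the bus
            if self.errors >= self.error_threshold:
                return "reset-bus"    # act on devices to recover from errors
            return "retransmit"
        return "ok"

mon = BusMonitor()
good = bytes([0x10, 0x20, 0x30])
print(mon.observe(good, crc8(good)))        # ok
corrupted = bytes([0x10, 0x21, 0x30])       # injected single-bit flip
print(mon.observe(corrupted, crc8(good)))   # retransmit
```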
66

Improving the performance of railway track-switching through the introduction of fault tolerance

Bemment, Samuel D. January 2018 (has links)
In the future, the performance of the railway system must be improved to accommodate increasing passenger volumes and service-quality demands. Track switches are a vital part of the rail infrastructure, enabling traffic to take different routes, yet all modern switch designs have evolved from a design first patented in 1832. Switches present single points of failure, require frequent and costly maintenance interventions, and restrict network capacity. Fault tolerance is the practice of preventing subsystem faults from propagating into whole-system failures; existing switches are not considered fault tolerant. This thesis describes the development and potential performance of fault-tolerant railway track-switching solutions. The work first presents a requirements definition and evaluation framework which can be used to select candidate designs from a range of novel switching solutions. A candidate design with the potential to exceed the performance of existing designs is selected. This design is then modelled to ascertain its practical feasibility alongside potential reliability, availability, maintainability and capacity performance. The design and construction of a laboratory-scale demonstrator of the design is described. The modelling results show that the performance of the fault-tolerant design may exceed that of traditional switches: reliability and availability increase significantly, whilst capacity gains are present but more marginal without an associated relaxation of the rules governing junction control. However, the thesis also identifies significant areas of future work before such an approach could be adopted in practice.
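The availability argument rests on standard redundancy arithmetic: a single actuator is a series element (a single point of failure), while n independent actuation elements in parallel fail only if all of them fail. A small calculation under assumed, illustrative availability figures:

```python
# Standard reliability arithmetic behind the fault-tolerance argument: a
# conventional switch fails when its single actuator fails; a design with n
# independent actuation elements fails only if all do (assuming independent
# failures). The availability figures below are illustrative only.

def series_availability(avails):
    """System that needs every element working (single points of failure)."""
    a = 1.0
    for x in avails:
        a *= x
    return a

def parallel_availability(avails):
    """System that survives as long as at least one element works."""
    unavail = 1.0
    for x in avails:
        unavail *= (1.0 - x)
    return 1.0 - unavail

actuator = 0.995                      # hypothetical per-actuator availability
print(f"single actuator:        {actuator:.6f}")
print(f"3 redundant actuators:  {parallel_availability([actuator] * 3):.6f}")
print(f"3 in series (no FT):    {series_availability([actuator] * 3):.6f}")
```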
67

Assembly tolerance analysis in geometric dimensioning and tolerancing

Tangkoonsombati, Choowong 25 August 1994 (has links)
Tolerance analysis is a major link between design and manufacturing. An assembly or a part should be designed based on its functions, manufacturing processes, desired product quality, and manufacturing cost. Assembly tolerance analysis performed at the design stage can reduce potential manufacturing and assembly problems. Several commonly used assembly tolerance analysis models and their limitations are reviewed in this research, and a new assembly tolerance analysis model is developed to address the limitations of the existing models. The new model elucidates the impact of the flatness symbol (one of the Geometric Dimensioning and Tolerancing (GD&T) specification symbols) and reduces the design variables to simple mathematical equations. The new model is based on a beta distribution of part dimensions. In addition, a group of manufacturing variables, including quality factor, process tolerance, and mean shift, is integrated into the new assembly tolerance analysis model. A computer-integrated system has been developed to combine four support systems for performing tolerance analysis in a single computer application: 1) the CAD drawing system, 2) the Geometric Dimensioning and Tolerancing (GD&T) specification system, 3) the assembly tolerance analysis model, and 4) the tolerance database, operating under the Windows environment. Dynamic Data Exchange (DDE) is applied to exchange data between two different Windows applications, improving information transfer between the support systems. In this way, the user is able to use this integrated system to select a GD&T specification, determine a critical assembly dimension and tolerance, and access the tolerance database simultaneously during the design stage. Examples are presented to illustrate the application of the integrated tolerance analysis system. / Graduation date: 1995
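The statistical core of such a model — stacking beta-distributed part dimensions with a process mean shift — can be sketched as a Monte Carlo run. The distribution parameters and dimension chain below are illustrative, not taken from the thesis.

```python
# Monte Carlo sketch of statistical tolerance stack-up with beta-distributed
# part dimensions, the distribution family the model above is based on.
# Alpha/beta parameters, mean shift, and the dimension chain are illustrative.
import random

def sample_dimension(nominal, tol, alpha=4.0, beta=4.0, mean_shift=0.0):
    """Draw one part dimension: a Beta(alpha, beta) variate scaled onto
    [nominal - tol, nominal + tol], then offset by a process mean shift."""
    u = random.betavariate(alpha, beta)          # in (0, 1)
    return nominal - tol + 2.0 * tol * u + mean_shift

def stack(trials=100_000):
    """Assembly gap = housing length minus two stacked parts."""
    gaps = []
    for _ in range(trials):
        housing = sample_dimension(50.0, 0.05)
        part_a = sample_dimension(24.9, 0.03, mean_shift=0.005)
        part_b = sample_dimension(24.9, 0.03)
        gaps.append(housing - part_a - part_b)
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / (len(gaps) - 1)
    p_interference = sum(g <= 0 for g in gaps) / len(gaps)
    return mean, var ** 0.5, p_interference

random.seed(1)
mean, sd, p_bad = stack()
print(f"gap mean {mean:.4f}, sd {sd:.4f}, P(interference) {p_bad:.4%}")
```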
68

The limits of network transparency in a distributed programming language

Collet, Raphaël 19 December 2007 (has links)
This dissertation presents a study of the extent and limits of network transparency in distributed programming languages. This property states that the result of a distributed program is the same as if it were executed on a single computer, provided no failure occurs. The programming language may also be network aware, allowing the programmer to control how a program is distributed and how it behaves on the network. Both properties aim at simplifying distributed programming by making the non-functional aspects of a program more modular. We show that network transparency is not only possible but also practical: it can be efficient, and smoothly extended in the case of partial failure. We give a proof of concept with the programming language Oz and the system Mozart, whose distribution support we have reimplemented on top of the Distribution Subsystem (DSS). We have extended the language to control which distribution algorithms are used in a program, and to reflect partial failures in the language. Both extensions make it possible to handle the non-functional aspects of a program without breaking the property of network transparency.
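As a rough analogy in another language, network transparency means client code calls through a reference without knowing whether the target object is local or remote. The toy proxy below only conveys that flavor; Oz/Mozart provides the property at the language level, with the DSS's pluggable distribution protocols underneath.

```python
# Toy illustration of the transparency idea: the same program text works
# against a local reference or a "remote" proxy with an identical interface.
# Oz/Mozart realizes this at the language level; this sketch is only an analogy.

class LocalRef:
    def __init__(self, obj):
        self._obj = obj
    def call(self, method, *args):
        return getattr(self._obj, method)(*args)

class RemoteRef:
    """Stand-in for a network proxy: same interface, different mechanism."""
    def __init__(self, obj):
        self._obj = obj        # imagine marshalling across the network here
    def call(self, method, *args):
        # A network-aware extension would let the program choose the protocol
        # used here, and reflect partial failure as a language-level event.
        return getattr(self._obj, method)(*args)

class Counter:
    def __init__(self):
        self.n = 0
    def inc(self):
        self.n += 1
        return self.n

# Transparent: identical client code regardless of the kind of reference.
for ref in (LocalRef(Counter()), RemoteRef(Counter())):
    assert ref.call("inc") == 1
print("same result whether the reference is local or 'remote'")
```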
69

Logging and Recovery in a Highly Concurrent Database

Keen, John S. 01 June 1994 (has links)
This report addresses the problem of fault tolerance to system failures for database systems that are to run on highly concurrent computers. It assumes that, in general, an application may have a wide distribution in the lifetimes of its transactions. Logging remains the method of choice for ensuring fault tolerance. A generational garbage collection technique manages the limited disk space reserved for log information; it does not require periodic checkpoints and is well suited to applications with a broad range of transaction lifetimes. An arbitrarily large collection of parallel log streams provides the necessary disk bandwidth.
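The generational idea can be sketched in a few lines: records of committed transactions are reclaimed when the young region is collected, and records belonging to long-lived transactions are promoted rather than repeatedly copied, so no periodic checkpoint is needed to bound the log. The structure below is illustrative, not the report's design.

```python
# Sketch of generational management of log space: committed transactions'
# records are reclaimed on collection of the young generation; surviving
# records (long-lived transactions) are promoted to an older generation.
# Names and structure are illustrative only.

class GenerationalLog:
    def __init__(self):
        self.young = []            # recent records, collected frequently
        self.old = []              # records of long-lived transactions
        self.committed = set()     # transaction ids known to be durable

    def append(self, txid, record):
        self.young.append((txid, record))

    def commit(self, txid):
        self.committed.add(txid)

    def collect_young(self):
        """Reclaim space: drop committed records, promote survivors."""
        survivors = [(t, r) for (t, r) in self.young if t not in self.committed]
        self.old.extend(survivors)     # long-lived txns move to the old generation
        self.young = []

log = GenerationalLog()
log.append(1, "update A")
log.append(2, "update B")
log.commit(1)
log.collect_young()
print("old generation now holds:", log.old)   # only txn 2's record survives
```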
70

Reliable Interconnection Networks for Parallel Computers

Dennison, Larry R. 01 October 1991 (has links)
This technical report describes a new protocol, the Unique Token Protocol, for reliable message communication. This protocol eliminates the need for end-to-end acknowledgments and minimizes the communication effort when no dynamic errors occur. Various properties of end-to-end protocols are presented, and the Unique Token Protocol is shown to solve the associated problems. It eliminates source buffering by maintaining at least two copies of a message in the network, and a token is used to decide whether a message was delivered to the destination exactly once. The report also presents a possible implementation of the protocol in a wormhole-routed 3-D mesh network.
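The exactly-once decision can be caricatured as follows: while a message is in flight the network holds at least two copies (so the source need not buffer it), and a unique token lets the destination detect and discard a duplicate copy. The sketch below is a much-simplified state machine, not the full protocol from the report.

```python
# Much-simplified sketch of the unique-token idea: the network forwards
# redundant copies of a message (no source buffering), and a unique token
# lets the destination enforce exactly-once delivery by discarding duplicates.
# This is an illustration, not the report's protocol.

class Destination:
    def __init__(self):
        self.delivered_tokens = set()   # tokens of messages already delivered
        self.inbox = []

    def receive(self, msg, token):
        if token in self.delivered_tokens:
            return "duplicate-discarded"   # token proves prior delivery
        self.delivered_tokens.add(token)
        self.inbox.append(msg)
        return "delivered"

dst = Destination()
# The network keeps at least two copies alive until delivery is decided,
# so the destination may see the same message more than once.
print(dst.receive("hello", token=42))   # delivered
print(dst.receive("hello", token=42))   # duplicate-discarded: exactly-once kept
```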
