Global ETD Search

1	Fault-Tolerant Average Execution Time Optimization for General-Purpose Multi-Processor System-On-Chips Väyrynen, Mikael January 2009 Because of developments in semiconductor technology, fault tolerance is important not only for safety-critical systems but also for general-purpose (non-safety-critical) systems. However, for general-purpose systems the goal is not to guarantee that deadlines are always met but to minimize the average execution time (AET) while ensuring fault tolerance. For a given job and a given probability that no soft (transient) error occurs, we define mathematical formulas for the AET, with bus communication overhead included, when using voting (active replication), rollback-recovery with checkpointing (RRC), and a combination of the two (CRV). For a given multi-processor system-on-chip (MPSoC), we also define integer linear programming (ILP) models that minimize the AET, including bus communication overhead, when: (1) selecting the number of checkpoints when using RRC or a combination that includes RRC, (2) finding the number of processors and the job-to-processor assignment when using voting or a combination that includes voting, and (3) selecting the fault-tolerance scheme (voting, RRC or CRV) for each job and defining how it is used. Experiments demonstrate significant savings in AET. Fault tolerance Execution time optimization Rollback recovery with checkpointing Active replication MPSoC Computer science Datavetenskap
2	Fault-Tolerant Average Execution Time Optimization for General-Purpose Multi-Processor System-On-Chips Väyrynen, Mikael January 2009 Because of developments in semiconductor technology, fault tolerance is important not only for safety-critical systems but also for general-purpose (non-safety-critical) systems. However, for general-purpose systems the goal is not to guarantee that deadlines are always met but to minimize the average execution time (AET) while ensuring fault tolerance. For a given job and a given probability that no soft (transient) error occurs, we define mathematical formulas for the AET, with bus communication overhead included, when using voting (active replication), rollback-recovery with checkpointing (RRC), and a combination of the two (CRV). For a given multi-processor system-on-chip (MPSoC), we also define integer linear programming (ILP) models that minimize the AET, including bus communication overhead, when: (1) selecting the number of checkpoints when using RRC or a combination that includes RRC, (2) finding the number of processors and the job-to-processor assignment when using voting or a combination that includes voting, and (3) selecting the fault-tolerance scheme (voting, RRC or CRV) for each job and defining how it is used. Experiments demonstrate significant savings in AET. Fault tolerance Execution time optimization Rollback recovery with checkpointing Active replication MPSoC Computer Sciences Datavetenskap (datalogi)
3	On the Fault-tolerance and High Performance of Replicated Transactional Systems Hirve, Sachin 28 September 2015 With the technological developments of the last few decades, there has been a notable shift in the way business and consumer transactions are conducted. These transactions are usually triggered over the internet, and transactional systems working in the background ensure that they are processed. The majority of these transactions now fall into the Online Transaction Processing (OLTP) category, where low latency is a preferred characteristic. In addition to low latency, OLTP transaction systems also require high service continuity and dependability. Replication is a common technique that makes services dependable and therefore helps provide reliability, availability and fault tolerance. Deferred Update Replication (DUR) and Deferred Execution Replication (DER) are the two well-known transaction execution models for replicated transactional systems. Under DUR, a transaction is executed locally at one node before a global certification is invoked to resolve conflicts with other transactions running on remote nodes. DER, by contrast, postpones transaction execution until agreement on a common order of transaction requests is reached. Both DUR and DER require a distributed ordering layer, which ensures a total order of transactions even in the presence of faults. In today's distributed transactional systems, performance is of paramount importance. Any loss in performance, e.g., increased latency due to slow processing of client requests, may entail loss of revenue for businesses. On one hand, the DUR model is a good candidate for transaction processing in such systems when conflicts among transactions are rare, but it can be detrimental for high-conflict workload profiles.
On the other hand, the DER model is an attractive choice because it behaves independently of the characteristics of the workload, but trivial realizations of the model ultimately leave little margin for performance improvement. Indeed, transactions are executed sequentially, and the total order layer can be a serious bottleneck for latency and scalability. This dissertation proposes novel solutions and system optimizations to enhance the overall performance of replicated transactional systems. The first presented result is HiperTM, a DER-based transaction replication solution that alleviates the costs of the total order layer via speculative execution techniques. HiperTM exploits the time between the broadcast of a client request and the finalization of the order for that request to execute the request speculatively, so as to overlap replica coordination with transaction execution. HiperTM comprises two main components: OS-Paxos, a novel total order layer that delivers requests early and optimistically according to a tentative order, which is then either confirmed or rejected by a final total order; and SCC, a lightweight speculative concurrency control protocol that exploits the optimistic delivery of OS-Paxos to execute transactions speculatively. SCC still processes write transactions serially in order to minimize code instrumentation overheads, but it parallelizes the execution of read-only transactions thanks to its built-in object multiversion scheme. The second contribution of this dissertation is X-DUR, a novel transaction replication system that addresses the high cost of local and remote aborts, which degrades performance in DUR-based approaches under high contention on shared objects.
Exploiting knowledge of clients' transaction locality, X-DUR incorporates the benefits of the state machine approach to scale up the distributed performance of DUR systems. As its third contribution, this dissertation proposes Archie, a DER-based replicated transactional system that improves on HiperTM in two respects. First, Archie includes a highly optimized total order layer that combines optimistic delivery and batching, allowing a large amount of work to be anticipated before the total order is finalized. Second, its concurrency control processes transactions speculatively and with a higher degree of parallelism, although the order of the speculative commits still follows the order defined by the optimistic delivery. Both HiperTM and Archie perform well up to a certain number of nodes in the system, beyond which their performance is limited by the single-leader-based total order layer. This motivates the design of Caesar, the fourth contribution of this dissertation, which is a transactional system based on a novel multi-leader partial order protocol. Caesar enforces a partial order on the execution of transactions according to their conflicts, letting non-conflicting transactions proceed in parallel without enforcing any synchronization during execution (e.g., no locks). As the last contribution, this dissertation presents Dexter, a replication framework that exploits the commonly observed phenomenon that not all read-only workloads require up-to-date data. It harnesses the application-specific freshness and content-based constraints of read-only transactions to achieve high scalability. Dexter services read-only requests according to the freshness guarantees specified by the application and routes the read-only workload through the system accordingly to achieve high performance and low latency.
As a result, the Dexter framework also alleviates the interference between read-only and read-write requests, thereby helping to improve the performance of read-write request execution as well. / Ph. D. Distributed Transaction Memory Fault-tolerance Active Replication Distributed Systems On-line Transaction Processing
4	Uma Solução de Reconfiguração Leve para Paxos / A Lightware reconfiguration Solution for Paxos Paula, Anderson Parra de 29 June 2015 Paxos is an active replication algorithm that keeps the same shared state consistent among servers that handle requests from an application. Applications in which the main processing happens through a replication algorithm such as Paxos are uncommon, mostly because of the high number of messages that must be exchanged to keep the state consistent. This restricts system scalability to a handful of replicas. To increase the applicability of active replication, we would like to be able not only to make processing capacity proportional to the number of servers employed, but also to change the number of servers dynamically according to demand. In this dissertation we explore reconfiguration in systems that use active replication. In particular, we aim to turn the Treplica replication library into a reconfigurable system. We propose two new mechanisms: (1) an efficient protocol for state transfer; and (2) the incorporation of new replicas into the system with no significant increase in the cost of keeping the whole system consistent. Our approach uses both mechanisms to create reader replicas, capable of answering all application requests without taking an active part in the costly operations of the Paxos algorithm. Active replication Paxos Reconfiguration State transfer Algorithm Computer network protocol
