1

Serializable Isolation for Snapshot Databases

Cahill, Michael James January 2009 (has links)
PhD / Many popular database management systems implement a multiversion concurrency control algorithm called snapshot isolation rather than providing full serializability based on locking. There are well-known anomalies permitted by snapshot isolation that can lead to violations of data consistency by interleaving transactions that would maintain consistency if run serially. Until now, the only way to prevent these anomalies was to modify the applications by introducing explicit locking or artificial update conflicts, following careful analysis of conflicts between all pairs of transactions. This thesis describes a modification to the concurrency control algorithm of a database management system that automatically detects and prevents snapshot isolation anomalies at runtime for arbitrary applications, thus providing serializable isolation. The new algorithm preserves the properties that make snapshot isolation attractive, including that readers do not block writers and vice versa. An implementation of the algorithm in a relational database management system is described, along with a benchmark and performance study, showing that the throughput approaches that of snapshot isolation in most cases.
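
The write-skew anomaly admitted by snapshot isolation is easy to make concrete. Below is a minimal, illustrative Python sketch (all names invented for the example, not taken from the thesis) of a multiversion store with first-committer-wins conflict detection: each transaction preserves the constraint x + y >= 0 when run alone, yet because their write sets are disjoint, both commit and the constraint is violated.

    class MVStore:
        """Multiversion store: each key maps to a list of (commit_ts, value)."""
        def __init__(self, initial):
            self.versions = {k: [(0, v)] for k, v in initial.items()}
            self.ts = 0

        def begin(self):
            return Txn(self, self.ts)  # snapshot = latest commit timestamp

    class Txn:
        def __init__(self, store, snapshot_ts):
            self.store, self.snapshot_ts, self.writes = store, snapshot_ts, {}

        def read(self, key):
            if key in self.writes:
                return self.writes[key]
            # Newest version visible in this transaction's snapshot.
            return max(v for v in self.store.versions[key]
                       if v[0] <= self.snapshot_ts)[1]

        def write(self, key, value):
            self.writes[key] = value

        def commit(self):
            # First-committer-wins: abort only on write-write conflicts.
            for key in self.writes:
                if max(v[0] for v in self.store.versions[key]) > self.snapshot_ts:
                    raise RuntimeError("abort: write-write conflict on " + key)
            self.store.ts += 1
            for key, value in self.writes.items():
                self.store.versions[key].append((self.store.ts, value))

    # Application invariant: x + y >= 0. Each transaction checks it on its
    # own snapshot, sees x + y == 100, and withdraws 100.
    db = MVStore({"x": 50, "y": 50})
    t1, t2 = db.begin(), db.begin()
    if t1.read("x") + t1.read("y") >= 100:
        t1.write("x", t1.read("x") - 100)
    if t2.read("x") + t2.read("y") >= 100:
        t2.write("y", t2.read("y") - 100)
    t1.commit(); t2.commit()   # disjoint write sets, so SI lets both commit
    print(db.begin().read("x") + db.begin().read("y"))   # -100
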
2

Multi-Master Replication for Snapshot Isolation Databases

Chairunnanda, Prima January 2013 (has links)
Lazy replication with snapshot isolation (SI) has emerged as a popular choice for distributed databases. However, lazy replication requires the execution of update transactions at one (master) site so that it is relatively easy for a total SI order to be determined for consistent installation of updates in the lazily replicated system. We propose a set of techniques that support update transaction execution over multiple partitioned sites, thereby allowing the master to scale. Our techniques determine a total SI order for update transactions over multiple master sites without requiring global coordination in the distributed system, and ensure that updates are installed in this order at all sites to provide consistent and scalable replication with SI. We have built our techniques into PostgreSQL and demonstrate their effectiveness through experimental evaluation.
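
The abstract does not spell out how a total SI order is obtained without global coordination. One generic way to do it, shown purely as an illustration (not necessarily the thesis's algorithm), is Lamport-style logical timestamps with the site identifier as a tie-breaker:

    class Site:
        def __init__(self, site_id):
            self.site_id = site_id
            self.clock = 0

        def commit_stamp(self):
            # (logical time, site id) pairs are unique across sites and
            # totally ordered, with no coordinator involved.
            self.clock += 1
            return (self.clock, self.site_id)

        def receive(self, remote_stamp):
            # Merging the remote clock makes later local commits order
            # after every update already seen from other sites.
            self.clock = max(self.clock, remote_stamp[0])

    a, b = Site("A"), Site("B")
    ta = a.commit_stamp()      # (1, "A")
    b.receive(ta)
    tb = b.commit_stamp()      # (2, "B"), ordered after ta at every site
    assert sorted([tb, ta]) == [ta, tb]
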
3

Ensuring Serializable Executions with Snapshot Isolation DBMS

Alomari, Mohammad January 2009 (has links)
Doctor of Philosophy (PhD) / Snapshot Isolation (SI) is a multiversion concurrency control that has been implemented by open source and commercial database systems such as PostgreSQL and Oracle. The main feature of SI is that a read operation does not block a write operation and vice versa, which allows a higher degree of concurrency than traditional two-phase locking. SI prevents many anomalies that appear at other isolation levels, but it can still produce non-serializable executions in which database integrity constraints are violated. Several techniques have been proposed to ensure serializable execution with engines running SI; these techniques are based on modifying the application by introducing conflicting SQL statements. However, with each of these techniques the DBA has to make a difficult choice among the possible transactions to modify. This thesis helps DBAs choose between these techniques and choices by showing how the choices affect system performance. It also proposes a novel technique called External Lock Manager (ELM), which introduces conflicts in a separate lock-manager object so that every execution will be serializable. We build a prototype system for ELM and run experiments to demonstrate the robustness of the new technique compared to the previous techniques. Experiments show that modifying the application code for some transactions has a high performance impact for some choices, which makes it very hard for DBAs to choose wisely. However, ELM has peak performance similar to SI, no matter which transactions are chosen for modification. Thus we say that ELM is a robust technique for ensuring serializable execution.
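
The ELM idea — forcing chosen transactions to conflict in a lock manager that lives outside the DBMS, so that executions SI would otherwise interleave dangerously become serial — can be sketched in a few lines of Python. The class and function names are hypothetical; the thesis's prototype will differ in detail.

    import threading

    class ExternalLockManager:
        """Grants exclusive locks on named conflict groups, outside the DBMS."""
        def __init__(self):
            self._guard = threading.Lock()
            self._locks = {}

        def acquire(self, group):
            with self._guard:
                lock = self._locks.setdefault(group, threading.Lock())
            lock.acquire()

        def release(self, group):
            self._locks[group].release()

    elm = ExternalLockManager()

    def run_update_transaction(conflict_group, body):
        # Two SI transactions in the same conflict group run one after the
        # other, turning a potential write skew into a serial execution;
        # unrelated transactions remain fully concurrent.
        elm.acquire(conflict_group)
        try:
            body()   # run the SQL transaction under SI, unmodified
        finally:
            elm.release(conflict_group)
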
4

Gargamel: boosting DBMS performance by parallelising write transactions

Cincilla, Pierpaolo 15 September 2014 (has links)
Databases often scale poorly in distributed configurations, due to the cost of concurrency control and to resource contention. The alternative of centralizing writes works well only for read-intensive workloads, whereas weakening transactional properties is problematic for application developers. Our solution, Gargamel, spreads non-conflicting update transactions to different replicas while still providing strong transactional guarantees. In effect, Gargamel partitions the database dynamically according to the update workload. Each database replica runs sequentially, at full bandwidth; mutual synchronisation between replicas remains minimal. Our prototype shows that Gargamel improves both response time and load by an order of magnitude when contention is high (a highly loaded system with bounded resources), and that otherwise the slow-down is negligible.
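
The core scheduling idea — run update transactions with disjoint read/write sets on different replicas in parallel, and queue conflicting ones behind each other at a single replica — can be sketched as follows. This is an illustrative toy with invented names, not Gargamel's actual scheduler; in particular, a real scheduler would also retire a transaction's footprint once it commits.

    class Scheduler:
        def __init__(self, n_replicas):
            self.queues = [[] for _ in range(n_replicas)]        # one per replica
            self.footprints = [set() for _ in range(n_replicas)]

        def submit(self, txn_id, rw_set):
            # Route to a replica already holding a conflicting transaction,
            # so conflicts execute sequentially at one site.
            for i, fp in enumerate(self.footprints):
                if fp & rw_set:
                    self.queues[i].append(txn_id)
                    fp.update(rw_set)
                    return i
            # No conflict: pick the least loaded replica.
            i = min(range(len(self.queues)), key=lambda j: len(self.queues[j]))
            self.queues[i].append(txn_id)
            self.footprints[i].update(rw_set)
            return i

    s = Scheduler(2)
    print(s.submit("t1", {"x"}))   # replica 0
    print(s.submit("t2", {"y"}))   # replica 1: no conflict, runs in parallel
    print(s.submit("t3", {"x"}))   # replica 0: conflicts with t1, queued
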
5

Enhancing Data Processing on Clouds with Hadoop/HBase

Zhang, Chen January 2011 (has links)
In the current information age, large amounts of data are being generated and accumulated rapidly in various industrial and scientific domains. This imposes important demands on data processing capabilities that can extract sensible and valuable information from the large amount of data in a timely manner. Hadoop, the open source implementation of Google's data processing framework (MapReduce, Google File System and BigTable), is becoming increasingly popular and is being used to solve data processing problems in various application scenarios. However, having been originally designed for handling very large data sets that can be divided easily into parts to be processed independently with limited inter-task communication, Hadoop lacks applicability to wider use cases. As a result, many projects are under way to enhance Hadoop for different application needs, such as data warehousing, machine learning and data mining. This thesis is one such research effort in this direction. The goal of the thesis research is to design novel tools and techniques to extend and enhance the large-scale data processing capability of Hadoop/HBase on clouds, and to evaluate their effectiveness in performance tests on prototype implementations. Two main research contributions are described. The first contribution is a light-weight computational workflow system called "CloudWF" for Hadoop. The second contribution is a client library called "HBaseSI" supporting transactional snapshot isolation (SI) in HBase, Hadoop's database component. CloudWF addresses the problem of automating the execution of scientific workflows composed of both MapReduce and legacy applications on clouds with Hadoop/HBase. CloudWF is the first computational workflow system built directly on Hadoop/HBase. It uses novel methods for workflow directed-acyclic-graph decomposition, storing and querying dependencies in HBase sparse tables, transparent file staging, and decentralized workflow execution management, relying on the MapReduce framework for task scheduling and fault tolerance. HBaseSI addresses the problem of maintaining strong transactional data consistency in HBase tables. It is the first SI mechanism developed for HBase. HBaseSI uses novel methods for handling distributed transactional management autonomously in individual clients. These methods greatly simplify the design of HBaseSI and can be generalized to other column-oriented stores with architectures similar to HBase's. As a result of this simplicity of design, HBaseSI adds low overhead to HBase performance and directly inherits many desirable properties of HBase. HBaseSI is non-intrusive to existing HBase installations and user data, and is designed to work with a large cloud in terms of both data size and the number of nodes in the cloud.
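
The abstract states that HBaseSI provides SI entirely from client code but does not give the algorithm. As a hedged illustration, the sketch below shows the first-committer-wins commit check that a client library can build on an atomic check-and-set primitive of the kind HBase-family stores expose; multiversion snapshot reads and rollback of partial writes are elided, and none of these names are HBaseSI's real API.

    class Cell:
        def __init__(self, value, version=0):
            self.value, self.version = value, version

    def check_and_set(cell, expected_version, new_value):
        # Stand-in for the store's atomic compare-and-swap on one cell.
        if cell.version != expected_version:
            return False
        cell.value, cell.version = new_value, cell.version + 1
        return True

    def si_commit(table, writes, observed_versions):
        # First committer wins: succeed only if no concurrent transaction
        # installed a newer version of any cell we are writing.
        for key, new_value in writes.items():
            if not check_and_set(table[key], observed_versions[key], new_value):
                return False   # write-write conflict: abort and retry
        return True

    table = {"row1": Cell("a")}
    print(si_commit(table, {"row1": "b"}, {"row1": 0}))   # True
    print(si_commit(table, {"row1": "c"}, {"row1": 0}))   # False: stale read
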
6

Ensuring consistency in partially replicated data stores

Saeida Ardekani, Masoud 16 September 2014 (has links)
In the first part, we study consistency in transactional systems, focusing on reconciling scalability with strong transactional guarantees. We identify four scalability properties, and show that none of the existing strong consistency criteria ensures all four. We define a new scalable consistency criterion called Non-Monotonic Snapshot Isolation (NMSI), which is the first to be compatible with all four properties. We also present a practical implementation of NMSI, called Jessy, which we compare experimentally against a number of well-known criteria. We also introduce a framework for performing fair comparisons among different transactional protocols. Our insight is that a large family of distributed transactional protocols has a common structure, called Deferred Update Replication (DUR). Protocols of the DUR family differ only in the behavior of a few generic functions. We present a generic DUR framework, called G-DUR, and implement and compare several transactional protocols using it. In the second part, we focus on ensuring consistency in non-transactional data stores. We introduce Tuba, a replicated key-value store that dynamically selects replicas in order to maximize the utility delivered to read operations according to a desired consistency level defined by the application. In addition, unlike current systems, it automatically reconfigures its set of replicas while respecting application-defined constraints, so that it adapts to changes in clients' locations or request rates. Compared with a statically configured system, our evaluation shows that Tuba increases the reads that return strongly consistent data by 63%.
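
The DUR structure the abstract refers to — execute a transaction locally against a snapshot, then certify and propagate its write set — is small enough to sketch, with the protocol-specific parts passed in as functions. All names below are assumptions for illustration; G-DUR's real interface is not described in the abstract.

    class Replica:
        def __init__(self, data):
            self.data = dict(data)

        def snapshot(self):
            return dict(self.data)   # a private copy to read from

    def dur_transaction(replica, body, certify, propagate):
        snapshot = replica.snapshot()          # 1. read a local snapshot
        read_set, write_set = body(snapshot)   # 2. execute, no remote calls
        if not certify(read_set, write_set):   # 3. protocol-specific check
            return "abort"
        propagate(write_set)                   # 4. install updates everywhere
        return "commit"

    # One possible instantiation: SI-style certification that aborts on
    # write-write conflicts with previously committed transactions.
    committed_writes = set()

    def certify_si(read_set, write_set):
        return committed_writes.isdisjoint(write_set)

    def propagate_all(write_set):
        committed_writes.update(write_set)     # schematic: values elided

    r = Replica({"x": 1})
    print(dur_transaction(r, lambda s: ({"x"}, {"x"}), certify_si, propagate_all))
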
