11

Performance comparison of different NoSQL structure orientations

Smailji, Liridon January 2020 (has links)
This study presents a performance comparison between the different structural orientations of NoSQL databases: document, key-value, column, and graph. A second comparison is also conducted between three NoSQL databases of the same structure, all document-based: MongoDB, OrientDB, and Couchbase. Performance tests are conducted using the benchmarking tool YCSB (Yahoo! Cloud Serving Benchmark), measuring time to execute and throughput (operations/second). Alongside the benchmarks, literature reviews are conducted in order to understand the different NoSQL structures and to interpret our benchmarking results. Every NoSQL structure and database in our benchmark is tested in the same way: a loading phase of 1k, 10k, and 100k entries, and a running phase with a workload of approximately 50% reads and 50% updates over 1k, 10k, and 100k operations. The finding of this study is that there are performance differences both between different structures and between NoSQL databases of the same structure. The document-based OrientDB was the highest-performing database at high volumes of data, and the key-value store Redis performed best at low volumes of data. The reasons for the performance differences are linked to specific characteristics of the structural orientation, the particular CAP-theorem trade-offs made, the storage type, and the implementation language.
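For readers unfamiliar with YCSB, the load/run methodology described above can be driven with a few lines of scripting. The sketch below is illustrative only and assumes a local YCSB checkout with the named database bindings installed; it uses workloada, YCSB's standard ~50% read / 50% update mix.

```python
# Illustrative driver for the load/run methodology described above (not the
# author's actual scripts). Assumes a local YCSB checkout and that the named
# bindings (e.g. "mongodb", "redis") are installed and configured.
import subprocess

SIZES = [1_000, 10_000, 100_000]          # 1k, 10k, 100k entries/operations
BINDINGS = ["mongodb", "orientdb", "couchbase", "redis"]  # assumed binding names

def ycsb(phase, binding, n):
    """Run one YCSB phase; workloada is ~50% reads / 50% updates."""
    cmd = [
        "bin/ycsb", phase, binding,
        "-P", "workloads/workloada",
        "-p", f"recordcount={n}",
        "-p", f"operationcount={n}",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # YCSB prints [OVERALL] RunTime(ms) and Throughput(ops/sec) lines.
    return [l for l in out.stdout.splitlines() if l.startswith("[OVERALL]")]

for binding in BINDINGS:
    for n in SIZES:
        print(binding, n, ycsb("load", binding, n))   # loading phase
        print(binding, n, ycsb("run", binding, n))    # running phase
```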
12

Designing Fast, Resilient and Heterogeneity-Aware Key-Value Storage on Modern HPC Clusters

Shankar, Dipti 30 September 2019 (has links)
No description available.
13

Workload-aware Efficient Storage Systems

Cheng, Yue 07 August 2017 (has links)
The growing disparity in the data storage and retrieval needs of modern applications is driving the proliferation of a wide variety of storage systems (e.g., key-value stores, cloud storage services, distributed filesystems, and flash caches). While extant storage systems are designed and tuned for a specific set of applications targeting a range of workload characteristics, they lack the flexibility to adapt to ever-changing workload behaviors. Moreover, the complexities of implementing modern storage systems and adapting to ever-changing storage requirements present unique opportunities and engineering challenges. In this dissertation, we design and develop a series of novel data management and storage systems solutions by applying a simple yet effective rule: workload awareness. We find that simple workload-aware data management strategies are effective in improving the efficiency of modern storage systems, sometimes by an order of magnitude. The first two works tackle data management and storage space allocation issues at the distributed and cloud storage level, while the third work focuses on low-level data management problems in the local storage system, on which many high-level storage/data-intensive applications rely. In the first part of this dissertation (Chapter 3), we propose and develop MBal, a high-performance in-memory object caching framework with adaptive multi-phase load balancing, which supports not only horizontal (scale-out) but also vertical (scale-up) scalability. MBal is able to make efficient use of available resources in the cloud through its fine-grained, partitioned, lockless design. In the second part of this dissertation (Chapter 4 and Chapter 5), we design and build CAST (Chapter 4), a Cloud Analytics Storage Tiering solution that cloud tenants can use to reduce the monetary cost and improve the performance of analytics workloads. The approach takes a first step towards providing storage tiering support for data analytics in the cloud. Furthermore, we propose a hybrid cloud object storage system (Chapter 5) that can effectively engage both cloud service providers and cloud tenants via a novel dynamic pricing mechanism. In the third part of this dissertation (Chapter 6), targeting local storage, we explore offline algorithms for flash caching in terms of both hit ratio and flash lifespan. We design and implement a multi-stage heuristic by synthesizing several techniques that manage data at the granularity of a flash erasure unit (which we call a container) to approximate the offline optimal algorithm. In the fourth part of this dissertation (Chapter 7), we focus on enabling fast prototyping of efficient distributed key-value stores targeting a proxy-based layered architecture. In this work, we design and build {con}, a framework that significantly reduces the engineering effort required to build a full-fledged distributed key-value store. Our dissertation shows that simple workload-aware data management strategies can bring substantial benefits in terms of both efficiency (i.e., performance, monetary cost, etc.) and flexibility (i.e., ease-of-use, ease-of-deployment, programmability, etc.). The principles of leveraging workload dynamicity and storage heterogeneity can be used to guide next-generation storage system software design, especially when faced with new storage hardware technologies. / Ph. D.
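As a rough illustration of what a workload-aware placement rule can look like (not CAST's actual cost model), the following sketch picks a storage tier per dataset; the tiers, prices, and scoring rule are illustrative assumptions.

```python
# A minimal sketch of a workload-aware tier-placement heuristic in the spirit
# of the tiering work described above; tiers, prices, and the scoring rule are
# illustrative assumptions, not the dissertation's actual model.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    bandwidth_mb_s: float    # sequential read bandwidth
    usd_per_gb_month: float  # storage price

TIERS = [
    Tier("local-ssd", 500.0, 0.17),
    Tier("persistent-hdd", 120.0, 0.04),
    Tier("object-store", 60.0, 0.02),
]

def place(dataset_gb: float, scans_per_month: int) -> Tier:
    """Pick the tier with the lowest estimated monthly cost, where cost =
    storage rent + a crude time-is-money charge for scanning the data."""
    usd_per_cpu_hour = 0.05  # assumed value of job runtime
    def monthly_cost(t: Tier) -> float:
        scan_hours = scans_per_month * dataset_gb * 1024 / t.bandwidth_mb_s / 3600
        return dataset_gb * t.usd_per_gb_month + scan_hours * usd_per_cpu_hour
    return min(TIERS, key=monthly_cost)

print(place(dataset_gb=500, scans_per_month=2).name)    # rarely scanned -> cheapest tier
print(place(dataset_gb=500, scans_per_month=200).name)  # frequently scanned -> a faster tier
```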
14

Design and Implementation of an Architecture-aware In-memory Key-Value Store

Giordano, Omar January 2021 (has links)
Key-Value Stores (KVSs) are a type of non-relational database in which data is represented as key-value pairs; they are often used for cache and session data storage. Among them, Memcached is one of the most popular, as it is widely used in Internet services such as social networks and streaming platforms. Given the continuous and increasingly rapid growth of networked devices that use these services, the commodity hardware on which the databases run must process packets faster to meet the needs of the market. However, in recent years the performance improvements offered by new hardware have become smaller and smaller. Since purchasing new products is no longer synonymous with significant performance improvements, companies need to exploit the full potential of the hardware they already own, postponing the purchase of newer hardware. One of the latest ideas for increasing the performance of commodity hardware is slice-aware memory management. This technique exploits the Last Level Cache (LLC) by making sure that individual cores take data from memory locations that are mapped to their respective cache portions (i.e., LLC slices). This thesis focuses on the realisation of a KVS prototype, based on the Intel Haswell micro-architecture and built on top of the Data Plane Development Kit (DPDK), to which the principles of slice-aware memory management are applied. To test its performance, and given that no DPDK-based traffic generator supporting the Memcached protocol existed, an additional traffic-generator prototype with these features was also developed. Performance was measured using two distinct machines: one for the traffic generator and one for the KVS. First the "regular" KVS prototype was tested, and then, to see the actual benefits, the slice-aware one. Both KVS prototypes were subjected to two types of traffic: (i) uniform traffic, where the keys are always different from each other, and (ii) skewed traffic, where keys are repeated and some keys are more likely to be repeated than others. The experiments show that, in a real-world scenario (i.e., one characterised by skewed key distributions), employing a slice-aware memory management technique in a KVS can slightly improve end-to-end latency (by ~2%). Additionally, the technique strongly affects the look-up time required by the CPU to find a key and its corresponding value in the database, decreasing the mean by ~22.5% and improving the 99th percentile by ~62.7%.
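The slice-aware idea can be sketched abstractly as follows. The real prototype is written in C on DPDK and relies on Haswell's physical-address-to-slice hash; the Python sketch below replaces that hash with an assumed slice_of() function and only illustrates the placement logic.

```python
# Conceptual sketch only: the real prototype (C/DPDK on Haswell) maps physical
# addresses to LLC slices with an architecture-specific hash; here the slice of
# a value's buffer is modelled by an assumed slice_of() function.
NUM_CORES = 8  # assume one LLC slice per core

def slice_of(buf_id: int) -> int:
    """Stand-in for the hardware hash from a buffer's physical address to an
    LLC slice; in a real system this must be measured or computed per CPU."""
    return buf_id % NUM_CORES

# Pre-allocate per-core pools so that core i only stores values in buffers
# that map to its own slice -- the essence of slice-aware memory management.
pools = {core: [] for core in range(NUM_CORES)}
for buf_id in range(10_000):
    pools[slice_of(buf_id)].append(buf_id)

def core_for_key(key: str) -> int:
    """Steer each request (e.g. via RSS/flow rules) to the core that owns the
    slice where the key's value will live, so look-ups stay slice-local."""
    return hash(key) % NUM_CORES

key = "user:42"
core = core_for_key(key)
buf = pools[core].pop()    # value stored in a buffer local to that core's slice
print(f"key {key!r} served by core {core}, buffer {buf}")
```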
15

Self-Management for Large-Scale Distributed Systems

Al-Shishtawy, Ahmad January 2012 (has links)
Autonomic computing aims at making computing systems self-managing by using autonomic managers in order to reduce obstacles caused by management complexity. This thesis presents results of research on self-management for large-scale distributed systems. This research was motivated by the increasing complexity of computing systems and their management. In the first part, we present our platform, called Niche, for programming self-managing component-based distributed applications. In our work on Niche, we have faced and addressed the following four challenges in achieving self-management in a dynamic environment characterized by volatile resources and high churn: resource discovery, robust and efficient sensing and actuation, management bottleneck, and scale. We present results of our research on addressing the above challenges. Niche implements the autonomic computing architecture, proposed by IBM, in a fully decentralized way. Niche supports a network-transparent view of the system architecture, simplifying the design of distributed self-management, and provides a concise and expressive API for self-management. The implementation of the platform relies on the scalability and robustness of structured overlay networks. We proceed by presenting a methodology for designing the management part of a distributed self-managing application. We define design steps that include partitioning of management functions and orchestration of multiple autonomic managers. In the second part, we discuss robustness of management and data consistency, which are necessary in a distributed system. Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time-consuming and error-prone. We propose the abstraction of Robust Management Elements, which are able to heal themselves under continuous churn. Our approach is based on replicating a management element using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. For data consistency, we propose a majority-based distributed key-value store supporting multiple consistency levels that is built on a peer-to-peer network. The store enables a tradeoff between high availability and data consistency. Using a majority allows us to avoid the potential drawbacks of master-based consistency control, namely a single point of failure and a potential performance bottleneck. In the third part, we investigate self-management for Cloud-based storage systems with a focus on elasticity control using elements of control theory and machine learning. We have conducted research on a number of different designs of an elasticity controller, including a State-Space feedback controller and a controller that combines feedback and feedforward control. We describe our experience in designing an elasticity controller for a Cloud-based key-value store using a state-space model that enables trading off performance for cost, and we describe the steps in designing an elasticity controller. We continue by presenting the design and evaluation of ElastMan, an elasticity controller for Cloud-based elastic key-value stores that combines feedforward and feedback control. / QC 20120831
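The majority-based consistency idea can be illustrated with a small sketch. The replica count, level names, and timestamping scheme below are assumptions for illustration, not the thesis's actual store.

```python
# Toy sketch of majority-based reads/writes with selectable consistency level;
# replica count, level names, and timestamps are illustrative assumptions.
import random, time

REPLICAS = [dict() for _ in range(5)]           # five in-memory replicas
MAJORITY = len(REPLICAS) // 2 + 1               # 3 of 5

def write(key, value, level="majority"):
    n = MAJORITY if level == "majority" else 1  # "any" favours availability
    stamped = (time.time_ns(), value)           # newest write wins on read
    for replica in random.sample(REPLICAS, n):
        replica[key] = stamped                  # ack once n replicas stored it

def read(key, level="majority"):
    n = MAJORITY if level == "majority" else 1
    answers = [r[key] for r in random.sample(REPLICAS, n) if key in r]
    return max(answers)[1] if answers else None # highest timestamp wins

write("conf", "v1", level="any")                # fast, weakly consistent write
write("conf", "v2", level="majority")           # overlaps every majority read
print(read("conf", level="majority"))           # -> "v2"
```

Because any two majorities of five replicas intersect, a majority read always sees the latest majority write, without relying on a single master.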
16

Secure and high-performance big-data systems in the cloud

Tang, Yuzhe 21 September 2015 (has links)
Cloud computing and big data technology continue to revolutionize how computing and data analysis are delivered today and in the future. To store and process fast-changing big data, various scalable systems (e.g., key-value stores and MapReduce) have recently emerged in industry. However, there is a huge gap between what these open-source software systems can offer and what real-world applications demand. First, scalable key-value stores are designed for simple data access methods, which limits their use in advanced database applications. Second, existing systems in the cloud need automatic performance optimization for better resource management with minimized operational overhead. Third, the demand continues to grow for privacy-preserving search and information sharing between autonomous data providers, as exemplified by healthcare information networks. My Ph.D. research aims at bridging these gaps. First, I proposed HINDEX, which provides secondary index support on top of write-optimized key-value stores (e.g., HBase and Cassandra). To update the index structure efficiently in the face of an intensive write stream, HINDEX synchronously executes append-only operations and defers the expensive index-repair operations. The core contribution of HINDEX is a scheduling framework for deferred and lightweight execution of index repairs. HINDEX has been implemented and is currently being transferred to an IBM big data product. Second, I proposed Auto-pipelining for automatic performance optimization of streaming applications on multi-core machines. The goal is to prevent the bottleneck scenario in which the streaming system is blocked by a single core while all other cores are idling, which wastes resources. To partition the streaming workload evenly across all cores and to search for the best partitioning among many possibilities, I proposed a heuristic-based search strategy that achieves locally optimal partitioning with lightweight search overhead. The key idea is to use a white-box approach to search for the theoretically best partitioning and then use a black-box approach to verify the effectiveness of that partitioning. The proposed technique, called Auto-pipelining, is implemented on IBM Stream S. Third, I proposed ε-PPI, a suite of privacy-preserving index algorithms that allow data sharing among unknown parties while maintaining a desired level of data privacy. To differentiate the privacy concerns of different persons, I proposed a personalized privacy definition and substantiated this new privacy requirement by the injection of false positives into the published ε-PPI data. To construct the ε-PPI securely and efficiently, I proposed to optimize the performance of multi-party computations, which are otherwise expensive; the key idea is to use an inexpensive addition-homomorphic secret-sharing mechanism and to carry out the distributed computation in a scalable P2P overlay.
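The append-only-update / deferred-repair pattern that HINDEX applies to write-optimized stores can be sketched with plain dictionaries; the real system targets HBase/Cassandra, and the structures below are simplified stand-ins.

```python
# Sketch of the append-only-update / deferred-repair idea described for HINDEX;
# the data structures are simplified stand-ins, not HBase/Cassandra APIs.
from collections import defaultdict

base = {}                       # primary key -> row value
index = defaultdict(set)        # secondary value -> set of primary keys
repair_queue = []               # deferred "remove stale entry" work

def put(key, value):
    """On the write path, only append the new index entry (cheap); finding and
    deleting the old entry (a read) is deferred to the repair queue."""
    old = base.get(key)
    base[key] = value
    index[value].add(key)                       # append-only insert
    if old is not None and old != value:
        repair_queue.append((old, key))         # stale entry fixed later

def repair(batch=100):
    """Run asynchronously / when idle: remove stale index entries in batches."""
    for _ in range(min(batch, len(repair_queue))):
        old_value, key = repair_queue.pop(0)
        index[old_value].discard(key)

def lookup(value):
    """Reads double-check against the base table to mask un-repaired entries."""
    return {k for k in index[value] if base.get(k) == value}

put("row1", "blue"); put("row1", "green")
print(lookup("blue"), lookup("green"))   # set() {'row1'}, even before repair()
repair()
```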
17

Storage and aggregation for fast analytics systems

Amur, Hrishikesh 13 January 2014 (has links)
Computing in the last decade has been characterized by the rise of data-intensive scalable computing (DISC) systems. In particular, recent years have witnessed a rapid growth in the popularity of fast analytics systems. These systems exemplify a trend where queries that previously involved batch processing (e.g., running a MapReduce job) on a massive amount of data are increasingly expected to be answered in near real-time with low latency. This dissertation addresses the problem that existing designs for various components used in the software stack for DISC systems do not meet the requirements demanded by fast analytics applications. In this work, we focus specifically on two components: 1. Key-value storage: Recent work has focused primarily on supporting reads with high throughput and low latency. However, fast analytics applications require that new data entering the system (e.g., newly crawled web pages, currently trending topics) be quickly made available to queries and analysis codes. This means that along with supporting reads efficiently, these systems must also support writes with high throughput, which current systems fail to do. In the first part of this work, we solve this problem by proposing a new key-value storage system, called the WriteBuffer (WB) Tree, that provides up to 30× higher write performance and similar read performance compared to current high-performance systems. 2. GroupBy-Aggregate: Fast analytics systems require support for fast, incremental aggregation of data with low-latency access to results. Existing techniques are memory-inefficient and do not support incremental aggregation efficiently when aggregate data overflows to disk. In the second part of this dissertation, we propose a new data structure called the Compressed Buffer Tree (CBT) to implement memory-efficient in-memory aggregation. We also show how the WB Tree can be modified to support efficient disk-based aggregation.
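The buffered, incremental GroupBy-Aggregate idea can be illustrated with a toy sketch; the buffer size, spill format, and merge step are assumptions for illustration, not the actual WB Tree or CBT structures.

```python
# A toy sketch of memory-bounded, incremental GroupBy-Aggregate in the spirit
# of the buffer-tree approach above (not the actual WB/CBT structures): partial
# sums are kept in a small in-memory buffer and spilled as sorted runs.
import heapq

BUFFER_LIMIT = 4
buffer = {}          # group key -> partial aggregate
runs = []            # spilled runs: sorted lists of (key, partial_sum)

def insert(key, value):
    buffer[key] = buffer.get(key, 0) + value
    if len(buffer) > BUFFER_LIMIT:              # spill when the buffer fills up
        runs.append(sorted(buffer.items()))
        buffer.clear()

def results():
    """Merge the in-memory buffer with all spilled runs (k-way merge)."""
    merged = heapq.merge(sorted(buffer.items()), *runs)
    out, last_key, total = [], None, 0
    for key, partial in merged:
        if key != last_key and last_key is not None:
            out.append((last_key, total)); total = 0
        last_key, total = key, total + partial
    if last_key is not None:
        out.append((last_key, total))
    return out

for i in range(20):
    insert(f"topic{i % 6}", 1)                  # e.g. counting trending topics
print(results())                                # exact counts despite the spills
```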
18

VoloDB: High Performance and ACID Compliant Distributed Key Value Store with Scalable Prune Index Scans

Dar, Ali January 2015 (has links)
Relational databases provide an efficient mechanism to store and retrieve structured data with ACID properties, but they are not ideal for every scenario. Their scalability is limited by the huge data-processing requirements of modern systems. As an alternative, NoSQL is a different way of looking at a database: NoSQL systems generally hold unstructured data and relax some of the ACID properties in order to achieve massive scalability. There are many flavors of NoSQL system; one of them is the key-value store. Most key-value stores currently available on the market offer reasonable performance but compromise on many important features, such as transactions, strong consistency, and range queries. The stores that do offer these features lack good performance. The aim of this thesis is to design and implement VoloDB, a key-value store that provides high throughput for both reads and writes without compromising on ACID properties. VoloDB is built on top of MySQL Cluster and, instead of using high-level abstractions, communicates with the cluster using the highly efficient native low-level C++ asynchronous NDB API. VoloDB talks directly to the data nodes without going through MySQL Server, which further enhances performance. It exploits many of MySQL Cluster's features, such as primary and partition key lookups and prune index scans, to hit only one of the data nodes and achieve maximum performance. VoloDB offers a high-level abstraction that hides the complexity of the underlying system without requiring the user to think about internal details. Our key-value store also offers additional features such as multi-query transactions and bulk operation support. C++ client libraries are provided to allow developers to interface easily with our server. An extensive evaluation is performed that benchmarks various scenarios and compares them with another high-performance open-source key-value store.
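To make the described abstraction concrete, here is a hypothetical client-side sketch (in Python, purely for illustration; the real client libraries are C++, and the class and method names below are not VoloDB's actual API).

```python
# A hypothetical client-side view of the abstraction layer described above;
# names are illustrative, and the in-memory dict stands in for the NDB-backed
# cluster -- this is not VoloDB's or MySQL Cluster's real interface.
class KVClient:
    def __init__(self):
        self.store = {}                        # stand-in for the NDB-backed store

    def put(self, table, key, value):
        self.store[(table, key)] = value       # primary-key write

    def get(self, table, key):
        return self.store.get((table, key))    # primary-key lookup, one data node

    def scan_partition(self, table, partition_key):
        """'Prune index scan': restrict a scan to the single data node that owns
        this partition key, instead of fanning out to the whole cluster."""
        return {k[1]: v for k, v in self.store.items()
                if k[0] == table and k[1][0] == partition_key}

    def transaction(self, ops):
        """Multi-query transaction: apply all operations or none."""
        snapshot = dict(self.store)
        try:
            for op, *args in ops:
                getattr(self, op)(*args)
        except Exception:
            self.store = snapshot
            raise

db = KVClient()
db.transaction([("put", "orders", ("eu", 1), "item-a"),
                ("put", "orders", ("eu", 2), "item-b")])
print(db.scan_partition("orders", "eu"))
```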
19

Latency Tradeoffs in Distributed Storage Access

Ray, Madhurima January 2019 (has links)
The performance of storage systems is central to handling the huge amount of data being generated from a variety of sources, including scientific experiments, social media, crowdsourcing, and an increasing variety of cyber-physical systems. Emerging high-speed storage technologies enable the ingestion of and access to such large volumes of data efficiently. However, the combination of the high data-volume requirements of new applications, which largely generate unstructured and semi-structured streams of data, with emerging high-speed storage technologies poses a number of new challenges, including the low-latency handling of such data and ensuring that the network providing access to the data does not become the bottleneck. The traditional relational model is not well suited for efficiently storing and retrieving unstructured and semi-structured data. An alternate mechanism, popularly known as the Key-Value Store (KVS), has been investigated over the last decade to handle such data. A KVS only needs a 'key' to uniquely identify the data record, which may be of variable length and may or may not have further structure in the form of predefined fields. Most KVSs in existence have been designed for hard-disk-based storage (before SSDs gained popularity), where avoiding random accesses is crucial for good performance. Unfortunately, as modern solid-state drives become the norm for data center storage, the HDD-based KV structures result in high read, write, and space amplification, which is detrimental to both the SSD's performance and its endurance. Also note that regardless of how the storage systems are deployed, access to large amounts of storage by many nodes must necessarily go over the network. At the same time, emerging storage technologies such as flash, 3D-crosspoint, and phase change memory (PCM), coupled with highly efficient access protocols such as NVMe, are capable of ingesting and reading data at rates that challenge even leading-edge networking technologies such as 100 Gb/sec Ethernet. Moreover, some of the higher-end storage technologies (e.g., Intel Optane storage based on 3D-crosspoint technology, PCM, etc.), coupled with lean protocols like NVMe, are capable of providing storage access latencies in the 10-20 µs range, which means that the additional latency due to network congestion could become significant. The purpose of this thesis is to address some of the aforementioned issues. We propose a new hash-based and SSD-friendly key-value store (KVS) architecture called FlashKey, which is designed especially for SSDs to provide low access latencies, low read and write amplification, and the ability to easily trade off latency for sequential accesses, for example range queries. Through a detailed experimental evaluation of FlashKey against the two most popular KVSs, namely RocksDB and LevelDB, we demonstrate that even as an initial implementation we are able to achieve substantially better write amplification and average and tail latency, at similar or better space amplification. Next, we deal with network congestion by dynamically replicating data items that are heavily used. The tradeoff here is between latency and the replication or migration overhead. It is important to reverse the replication or migration as the congestion fades away, since our observations indicate that placing data and the applications that access it together in a consolidated fashion significantly reduces propagation delay and increases network energy-saving opportunities, which matters because data center networks nowadays are equipped with high-speed, power-hungry network infrastructure. Finally, we design a tradeoff between network consolidation and congestion, trading latency to save power. During quiet hours we consolidate traffic onto fewer links and use different sleep modes for the unused links to save power. However, as traffic increases, we reactively start to spread traffic out again to avoid congestion from the upcoming surge. There are numerous studies in the area of network energy management that use similar approaches; however, most of them perform energy management at a coarser time granularity (e.g., 24 hours or beyond). In contrast, our mechanism tries to exploit all the small-to-medium time gaps in traffic and invoke network energy management without causing a significant increase in latency. / Computer and Information Science
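The congestion-driven replication idea, and its reversal when congestion fades, can be sketched as a simple control loop; the thresholds and popularity metric below are assumed values, not the thesis's actual policy.

```python
# Toy sketch of the congestion-driven replication/consolidation tradeoff
# described above; thresholds and the popularity metric are assumed values.
replicas = {"hot-object": 1}          # current replica count per object
HIGH, LOW = 0.8, 0.3                  # link-utilization thresholds (assumed)

def adjust(obj, link_utilization, requests_per_s, max_replicas=4):
    """Replicate popular objects when their link is congested; when congestion
    fades, shed replicas again so data and apps stay consolidated (and idle
    links can be put to sleep to save energy)."""
    n = replicas[obj]
    if link_utilization > HIGH and requests_per_s > 1000 and n < max_replicas:
        n += 1            # spread load across another replica/link
    elif link_utilization < LOW and n > 1:
        n -= 1            # reverse replication: consolidate traffic again
    replicas[obj] = n
    return n

for util in [0.9, 0.95, 0.85, 0.5, 0.2, 0.1]:
    print(util, adjust("hot-object", util, requests_per_s=1500))
```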
20

Exploration of NoSQL technologies for managing hotel reservations

Coulombel, Sylvain January 2014 (has links)
During this project, NoSQL technologies for hotel IT have been evaluated. It has been determined that among NoSQL technologies, the document database fits this use case best. Couchbase and MongoDB, the two main document stores, have been evaluated, and their similarities and differences have been highlighted. This reveals that document-oriented features were more developed in MongoDB than in Couchbase, which has a direct impact on the search-of-reservations functionality. However, Couchbase offers a better way to replicate data across two remote data centers. As one of the goals was to provide powerful search functionality, it was decided to use MongoDB as the database for this project. A proof of concept has been developed; it enables searching reservations by property code, guest name, check-in date, and check-out date using a REST/JSON interface, and it confirms that MongoDB could work for storing hotel reservations in terms of functionality. Different experiments have then been conducted on this system, measuring throughput and response time using a specific hotel-reservation search query and data set. The results met our targets. We also performed a scalability test, using MongoDB's sharding functionality to distribute data across several machines (shards) using different strategies (shard keys) so as to provide configuration recommendations. Our main finding was that it is not always necessary to distribute the database. If sharding is needed, distributing the data according to the property code makes the database faster, because queries are sent directly to the right machine(s) in the cluster, avoiding "scatter-gather" queries. Finally, some search optimizations have been proposed, in particular how an advanced search by name could be implemented with MongoDB. / This thesis is submitted in the framework of a double degree between Compiègne University Of Technology (UTC) and Linköping University (LiU)
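For orientation, a sketch of the kind of MongoDB schema, query, and shard-key setup described above is given below; the database, collection, and field names are assumptions, the snippet requires a running MongoDB instance, and the commented sharding commands require a mongos in front of a sharded cluster.

```python
# A hedged sketch of the schema/queries the proof of concept describes;
# field names are assumptions, not the project's actual data model.
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
res = client.hotel.reservations

# Compound index supporting the search-by-property/date queries.
res.create_index([("propertyCode", ASCENDING), ("checkIn", ASCENDING)])

res.insert_one({"propertyCode": "PAR01", "guestName": "Doe",
                "checkIn": "2014-05-01", "checkOut": "2014-05-04"})

# Search reservations for one property and date range. With propertyCode as
# the shard key this is a "targeted" query that reaches only the owning shard,
# avoiding a scatter-gather across the cluster.
cursor = res.find({"propertyCode": "PAR01",
                   "checkIn": {"$gte": "2014-05-01", "$lt": "2014-06-01"}})
print(list(cursor))

# Sharding by property code, as recommended above (run against mongos):
# client.admin.command("enableSharding", "hotel")
# client.admin.command("shardCollection", "hotel.reservations",
#                      key={"propertyCode": 1})
```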
