Global ETD Search

1	Retina: Cross-Layered Key-Value Store using Computational Storage Bikonda, Naga Sanjana 10 March 2022 (has links) Modern SSDs are getting faster and smarter with near-data computing capabilities. Due to their design choices, traditional key-value stores do not fully leverage these new storage devices. These key-value stores become CPU-bound even before fully utilizing the IO bandwidth. LSM or B+ tree-based key-value stores involve complex garbage collection, sorted key storage, and complicated synchronization mechanisms. In this work, we propose a cross-layered key-value store named Retina that decouples the design to delegate control path manipulations to the host CPU and data path manipulations to the computational SSD to maximize performance and reduce compute bottlenecks. We employ many design choices not explored in other persistent key-value stores to achieve this goal. In addition to the cross-layered design paradigm, Retina introduces a new caching mechanism called Mirror cache, support for variable key-value pairs, and a novel version-based crash consistency model. By enabling all the design features, we equip Retina to reduce compute hotspots on the host CPU, take advantage of the on-storage accelerators to leverage the data locality on the computational storage, improve overall bandwidth, and reduce network latencies. When evaluated using YCSB, we observe CPU utilization reduced by 4x and a throughput improvement of 20.5% against the state-of-the-art for read-intensive workloads. / Master of Science / Modern secondary storage systems are providing an exponential increase in memory access speeds. In addition, new generation storage systems attach compute resources near data to offload computation to storage. Traditional datastore systems are lacking in performance when used with the new generation SSDs (Solid State Drives). The key reason is that the SSDs are underutilized due to CPU bottlenecks. 
Due to design choices, conventional datastores incur expensive CPU tasks that cause the CPU to bottleneck even before the storage speeds are fully utilized. Thus, when attached to a modern SSD, conventional datastores will underutilize the storage resources. In this work, we propose a cross-layered key-value store named Retina that decouples the design to delegate control path manipulations to the host CPU and data path manipulations to the computational SSD to maximize performance and reduce compute bottlenecks. In addition to the cross-layered design paradigm, Retina introduces a new caching mechanism called Mirror cache and a novel version-based crash consistency model. By enabling all the design features, we equip Retina to reduce compute hotspots on the host CPU, take advantage of the on-storage accelerators to leverage the data locality on the computational storage, and improve overall access speed. To evaluate Retina, we use throughput and CPU utilization as the comparison metrics. We test our implementation with the Yahoo Cloud Serving Benchmark, a popular datastore benchmark. We evaluate against RocksDB (the most widely adopted datastore) to enable a fair performance comparison. In conclusion, we show that the Retina key-value store improves throughput by offloading logic to computational storage to reduce the CPU bottlenecks. Computational storage Key-Value store Crash consistency
2	Towards Efficient and Flexible Object Storage Using Resource and Functional Partitioning Anwar, Ali 08 June 2018 (has links) Modern storage systems are designed to manage data without considering the dynamicity of user or resource requirements. This design approach does not consider the complexities of the dynamically changing runtime application behaviors as well as the unique features of underlying resources. To this end, this dissertation studies how resource and functional partitioning strategies can improve efficiency and flexibility of object stores. This dissertation presents a series of practical and efficient techniques, algorithms, and optimizations to realize efficient and flexible object stores. The experimental evaluation demonstrates the effectiveness of our design choices and strategies to make object stores flexible and resource-aware. / Ph. D. Object Stores Key-Value Stores Flexibility Efficiency
3	Creating a NoSQL database for the Internet of Things : Creating a key-value store on the SensibleThings platform Zhu, Sainan January 2015 (has links) Because of the requirements of Web 2.0 applications and the limited horizontal scalability of relational databases, NoSQL databases have become more and more popular in recent years. However, it is not easy to select a database that is suitable for a specific use. This thesis describes the detailed design, implementation and final performance evaluation of a key-value NoSQL database for the SensibleThings platform, which is an Internet of Things platform. The thesis starts by comparing the different types of NoSQL databases to select the most appropriate one. During the implementation of the database, the algorithms for data partitioning, data access, replication, addition and removal of nodes, and failure detection and handling are dealt with. The final results for the load distribution and the performance evaluation are also presented in this paper. At the end of the thesis, some problems and improvements that need to be taken into consideration in the future are discussed. NoSQL databases key-value Internet of Things SensibleThings platform
4	Accelerated Storage Systems Khasymski, Aleksandr Sergeev 11 March 2015 (has links) Today's large-scale, high-performance, data-intensive applications put a tremendous stress on data centers to store, index, and retrieve large amounts of data. Exemplified by technologies such as social media, photo and video sharing, and e-commerce, the rise of the real-time web demands data stores support minimal latencies, always-on availability and ever-growing capacity. These requirements have fostered the development of a large number of high-performance storage systems, arguably the most important of which are Key-Value (KV) stores. An emerging trend for achieving low latency and high throughput in this space is a solution, which utilizes both DRAM and flash by storing an efficient index for the data in memory and minimizing accesses to flash, where both keys and values are stored. Many proposals have examined how to improve KV store performance in this area. However, these systems have shortcomings, including expensive sorting and excessive read and write amplification, which is detrimental to the life of the flash. Another trend in recent years equips large scale deployments with energy-efficient, high performance co-processors, such as Graphics Processing Units (GPUs). Recent work has explored using GPUs to accelerate compute-intensive I/O workloads, including RAID parity generation, encryption, and compression. While this research has proven the viability of GPUs to accelerate these workloads, we argue that there are significant benefits to be had by developing methods and data structures for deep integration of GPUs inside the storage stack, in order to achieve better performance, scalability, and reliability. In this dissertation, we propose comprehensive frameworks that leverage emerging technologies, such as GPUs and flash-based SSDs, to accelerate modern storage systems. 
For our accelerator-based solution, we focus on developing a system that features deep integration of the GPU in a distributed parallel file system. We utilize a framework that builds on the resources available in the file system and coordinates the workload in such a way that minimizes data movement across the PCIe bus, while exposing data parallelism to maximize the potential for acceleration on the GPU. Our research aims to improve the overall reliability of a PFS by developing a distributed per-file parity generation that provides end-to-end data integrity and unprecedented flexibility. Finally, we design a high-performance KV store utilizing a novel data structure tailored to specific flash requirements; it arranges data on flash in such a way as to minimize write amplification, which is detrimental to the flash cells. The system delivers outstanding read amplification through the use of a trie index and false positive filter. / Ph. D. Key-Value Store Flash Software RAID Accelerator-based Computing
5	Caching of key-value stores in the data plane / Caching av nyckel-värde-databaser i dataplanet Larsson, Samuel January 2019 (has links) The performance of a distributed key-value store usually depends on its underlying network, so read/write latencies can potentially be improved by improving the performance of the network communication. We explore the potential performance increase by designing an experimental in-network cache based on NetCache in the switch data plane for the distributed key-value store DXRAM, and placing it on a programmable switch that connects the peers in a DXRAM storage cluster. To accommodate DXRAM, which uses TCP as its transport protocol, we also design a TCP flow state translator for the cache and implement an experimental version of this cache design. Benchmark runs with the cache show that best-case item read latency for DXRAM is reduced to approximately half, which indicates the potential performance gain that can be expected once a proper cache is designed and implemented. key-value store programmable data planes cache NetCache DXRAM Computer Sciences Datavetenskap (datalogi)
6	NoSQL: a análise da modelagem e consistência dos dados na era do Big Data Rodrigues, Wagner Braz 19 October 2017 (has links) Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / The new data storage models, known as NoSQL (Not Only SQL), arise to solve current data problems, defined by the properties volume, velocity and variety (the 3 V’s) of the Big Data concept. These new storage models develop with the support of distributed computing and horizontal scalability, which allow processing of the large volume of data required by the Big Data 3 V’s. This thesis uses the relational model as its theoretical framework, presenting its solutions and problems. The relational model made it possible to persist data structures in non-volatile secondary memory. Its modeling establishes rules for creating a well-founded data model, using concepts from formal logic and a representation that is comprehensible to human interpretation. The properties defined by the ACID (Atomicity, Consistency, Isolation, Durability) transactional model, implemented in relational DBMSs, guarantee that transacted data is persisted consistently in the database. The use of the relational model distanced the transient structures in primary memory, used at execution time by software applications, from those persisted in secondary memory, an effect known as impedance mismatch. The new models presented by the NoSQL categories bring back transient structures previously used in primary memory. However, they give up the strong structure of the relational model. Distributed computing makes it possible to carry out transactions and store data across several computers, known as nodes, in a cluster. This concept, known as partition tolerance, increases availability and decreases the likelihood of system failures. However, its use brings inconsistency to the data, according to the properties defined by the CAP Theorem (FOX; BREWER, 1999). This study was carried out through a bibliographic review, first analyzing the needs that led to the creation of the relational model. We then establish the state of the art of the theories and techniques involving NoSQL and data processing in distributed systems, as well as the different categories it introduces. One tool from each NoSQL category was chosen and analyzed in order to understand its structures, metamodels and operations. Besides establishing the state of the art regarding NoSQL, we demonstrate how the rapprochement of transient and persistent data structures becomes possible given current machine advances, as well as the possibilities for handling the effects on consistency demonstrated by the CAP Theorem. NoSQL Data storage Key-value Data warehousing CNPQ::ENGENHARIAS
7	Improving Table Scans for Trie Indexed Databases Toney, Ethan 01 January 2018 (has links) We consider a class of problems characterized by the need for a string-based identifier that reflects the ontology of the application domain. We present rules for string-based identifier schemas that facilitate fast filtering in databases used for this class of problems. We provide runtime analysis of our schema and experimentally compare it with another solution. We also discuss the performance of our solution when applied to a game engine. The string-based identifier schema can be used in additional scenarios such as cloud computing. An identifier schema adds metadata about an element, so the solution requires additional memory; but as long as queries operate only on the included metadata, there is no need to load the element from disk, which leads to large performance gains. key-value identifier schema pattern schema compressed prefix tree game engine resource manager Computational Engineering
8	Predicting Service Metrics from Device and Network Statistics Forte, Paolo January 2015 (has links) For an IT company that provides a service over the Internet, like Facebook or Spotify, it is very important to provide a high quality of service; however, predicting the quality of service is generally a hard task. The goal of this thesis is to investigate whether an approach that uses statistical learning to predict the quality of service can obtain accurate predictions for a Voldemort key-value store [1] in the presence of dynamic load patterns and network statistics. The approach follows the idea that the service-level metrics associated with the quality of service can be estimated from server-side statistical observations, like device and network statistics. The advantage of the approach analysed in this thesis is that it can work with virtually any kind of service, since it is based only on device and network statistics, which are unaware of the type of service provided. The approach is structured as follows. During service operation, a large number of device statistics from the Linux kernel of the operating system (e.g. CPU usage level, disk activity, interrupt rate) and some basic end-to-end network statistics (e.g. average round-trip time, packet loss rate) are periodically collected on the service platform. At the same time, some service-level metrics (e.g. average reading time, average writing time) are collected on the client machine as indicators of the store’s quality of service. To emulate network conditions, such as dynamic delay and packet loss, all the traffic is redirected to flow through a network emulator. Then, different types of statistical learning methods, based on linear and tree-based regression algorithms, are applied to the collected data to obtain a learning model able to accurately predict the service-level metrics from the device and network statistics. 
The results, obtained for different traffic scenarios and configurations, show that the thesis’ approach can find learning models that can accurately predict the service-level metrics for a single-node store with error rates lower than 20% (NMAE), even in presence of network impairments. Quality of service machine learning network statistics key-value store Voldemort Communication Systems Kommunikationssystem Computer Systems Datorsystem
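The abstract above reports accuracy as NMAE without defining it. As a rough illustration only, the sketch below assumes the common definition of normalized mean absolute error (the mean absolute error divided by the mean of the observed values); the thesis may normalize differently. The function name `nmae` and the sample latency values are hypothetical and not taken from the thesis.

```python
def nmae(actual, predicted):
    """Normalized mean absolute error.

    Assumed definition (not confirmed by the abstract):
    mean(|y_i - y_hat_i|) / mean(y_i), for positive-valued metrics.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average absolute gap between observed and predicted values.
    mean_abs_error = sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)
    # Normalize by the average observed value so the score is scale-free.
    mean_actual = sum(actual) / len(actual)
    return mean_abs_error / mean_actual


# Hypothetical read latencies in milliseconds (made-up illustration data).
observed_ms = [10.0, 12.0, 8.0, 10.0]
predicted_ms = [11.0, 10.0, 9.0, 10.0]
score = nmae(observed_ms, predicted_ms)
print(f"NMAE = {score:.2%}")
```

Under this assumed definition, a score below 0.20 corresponds to the "lower than 20%" figure quoted in the abstract.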
9	Automated Control of Elasticity for a Cloud-Based Key-Value Store Arman, Ala January 2012 (has links) “Pay-as-you-go” is one of the basic properties of Cloud computing. It means that people pay for the resources or services that they use. Moreover, the concept of load balancing has been a controversial issue in recent years. It is a method that is used to split a task into smaller tasks and allocate them fairly to different resources, resulting in better performance. Considering these two concepts, the idea of “Elasticity” comes to attention. An Elastic system is one which adds or releases resources based on changes in the system variables. In this thesis, we extended a distributed storage system called Voldemort by adding a controller to provide elasticity. Control theory was used to design this controller. In addition, we used the Yahoo! Cloud Serving Benchmark (YCSB), an open source framework that can be used to provide several load scenarios as well as to evaluate the controller. Automatic control is accomplished by adding or removing nodes in Voldemort in response to changes in the system, such as the average service time in our case. We will show that when the service time increases due to increasing load, as generated by the YCSB tool, the controller senses this change and adds an appropriate number of nodes to the storage. The number of nodes added is based on the controller parameters, in order to decrease the service time and meet Service Level Objectives (SLO). Similarly, when the average service time decreases, the controller removes some nodes to reduce the cost of using the resources and meet the “pay-as-you-go” property. Cloud Computing Elastic Computing Control Theory Voldemort YCSB Key-Value Store Engineering and Technology Teknik och teknologier
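The feedback loop described in the abstract above (add nodes when the average service time rises, remove them when it falls) can be sketched roughly as follows. This is a minimal proportional controller written for illustration; the abstract does not say which control law the thesis uses, and the function name, gain, target and step limit are all hypothetical. It does not call any Voldemort or YCSB API.

```python
def nodes_to_change(avg_service_ms, target_ms, gain=0.25, max_step=3):
    """Return how many nodes to add (positive) or remove (negative).

    Hypothetical proportional rule: the change is proportional to the
    gap between the measured average service time and the target (SLO).
    """
    error = avg_service_ms - target_ms
    # Truncating toward zero gives a small dead band: minor deviations
    # from the target do not trigger any scaling action.
    step = int(gain * error)
    # Clamp the step so a single control period never changes the
    # cluster size by more than max_step nodes.
    return max(-max_step, min(max_step, step))


# Made-up measurements against a hypothetical 10 ms target.
scale_out = nodes_to_change(25.0, 10.0)  # service time well above target
hold = nodes_to_change(12.0, 10.0)       # within the dead band
scale_in = nodes_to_change(2.0, 10.0)    # service time well below target
print(scale_out, hold, scale_in)
```

Clamping the step is a design choice made here to avoid large swings in cluster size; whether the thesis controller limits its output in this way is not stated in the abstract.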
10	NoSQL-databaser i socialt nätverk Persson, Ragnvald January 2018 (has links) The purpose of the study is to take a deeper look at NoSQL databases and investigate which tasks the different NoSQL groups are best suited to in a social network, such as Facebook and Twitter. The data concerns, for example, the storage of personal data or social networking. There are four different types of NoSQL databases: column databases, graph databases, key-value databases and document databases. The question is which NoSQL database should be chosen for a particular task in a given social network. When developing a social network that requires data storage, it is important to know what kind of database should be used for a certain type of task. To answer these questions, a survey of the findings of previous research was carried out. A practical study was also conducted with all four NoSQL groups, in an experiment that stored user information, messages and friends. NoSQL Social network Document Column Key-value Graph Engineering and Technology Teknik och teknologier