21. Separating data from metadata for robustness and scalability. Wang, Yang. 09 February 2015.
When building storage systems that aim to provide robustness, scalability, and efficiency simultaneously, one faces a fundamental tension: higher robustness typically incurs higher cost and thus hurts both efficiency and scalability. My research shows that an approach to storage system design based on a simple principle, separating data from metadata, can yield systems that resolve that tension elegantly and effectively in a variety of settings. One observation motivates our approach: much of the cost paid by many strong protection techniques is incurred to detect errors. This observation suggests an opportunity: if we can build a low-cost oracle to detect errors and identify correct data, it may be possible to reduce the cost of protection without weakening its guarantees. This dissertation shows that metadata, if carefully designed, can serve as such an oracle and help a storage system protect its data at minimal cost. It demonstrates how to apply this idea effectively in three very different systems: Gnothi, a replication protocol that combines the high availability of asynchronous replication with the low cost of synchronous replication for small-scale block storage; Salus, a large-scale block store with unprecedented guarantees of consistency, availability, and durability in the face of a wide range of server failures; and Exalt, a tool to emulate a large storage system with 100 times fewer machines.
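To make the oracle idea concrete, the following is a minimal sketch, not the actual Gnothi or Salus protocol: cheap, synchronously maintained metadata (a version number and checksum per block) lets a reader identify the correct copy among lazily replicated data.

```python
import hashlib

class MetadataOracle:
    """Per-block (version, checksum) records kept separate from the bulk
    data, so they can be replicated cheaply and synchronously."""
    def __init__(self):
        self.meta = {}  # block_id -> (version, sha256 hex digest)

    def commit(self, block_id, version, data):
        self.meta[block_id] = (version, hashlib.sha256(data).hexdigest())

    def identify_correct(self, block_id, candidates):
        """Return a candidate replica that matches the authoritative
        metadata, or None if every copy is stale or corrupt."""
        _version, digest = self.meta[block_id]
        for data in candidates:
            if hashlib.sha256(data).hexdigest() == digest:
                return data
        return None

# The data itself can be replicated lazily (asynchronously); the oracle
# still lets a reader reject stale or corrupted copies.
oracle = MetadataOracle()
oracle.commit("blk0", version=2, data=b"new contents")
assert oracle.identify_correct("blk0", [b"old contents", b"new contents"]) \
    == b"new contents"
```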
22. Entity resolution for large relational datasets. Guo, Zhaochen. Date unknown.
No description available.
23. Scalability of RAID systems. Li, Yan. January 2010.
RAID systems (Redundant Arrays of Inexpensive Disks) have dominated back-end storage systems for more than two decades and have grown continuously in size and complexity. They currently face unprecedented challenges from data-intensive applications such as image processing, transaction processing and data warehousing. As the size of RAID systems increases, designers face both performance and reliability challenges, including limited back-end network bandwidth, physical interconnect failures, correlated disk failures and long disk reconstruction times. This thesis studies the scalability of RAID systems in terms of both performance and reliability through simulation, using a discrete-event-driven simulator for RAID systems (SIMRAID) developed as part of this project. SIMRAID incorporates two benchmark workload generators, based on the SPC-1 and Iometer benchmark specifications. Each component of SIMRAID is highly parameterised, enabling it to explore a large design space. To improve simulation speed, SIMRAID employs a set of abstraction techniques that extract the behaviour of the interconnection protocol without losing accuracy. Finally, to follow the technology trend toward heterogeneous storage architectures, SIMRAID provides a framework that allows easy modelling of different types of device and interconnection technique.

Simulation experiments were first carried out on the performance aspects of scalability. They were designed to answer two questions: (1) given a number of disks, which factors affect back-end network bandwidth requirements; and (2) given an interconnection network, how many disks can be connected to the system. The results show that the bandwidth requirement per disk is primarily determined by workload features and stripe unit size (a smaller stripe unit size scales better than a larger one), with cache size and RAID algorithm having very little effect on this value. The maximum number of disks is limited, as would be expected, by the back-end network bandwidth.

Studies of reliability have led to three proposals to improve the reliability and scalability of RAID systems. Firstly, a novel data layout called PCDSDF is proposed. PCDSDF combines the advantages of orthogonal data layouts and parity-declustering data layouts, so that it can not only survive multiple disk failures caused by physical interconnect failures or correlated disk failures, but also achieves good degraded and rebuild performance. The generating process of PCDSDF is deterministic and time-efficient, and the number of stripes per rotation (namely the number of stripes needed to achieve rebuild workload balance) is small. Analysis shows that the PCDSDF data layout can significantly improve system reliability, and simulations performed on SIMRAID confirm its good performance, comparable to other parity-declustering data layouts such as RELPR. Secondly, a system architecture and rebuilding mechanism have been designed, aimed at fast disk reconstruction. This architecture is based on parity-declustering data layouts and a disk-oriented reconstruction algorithm. It uses stripe groups instead of stripes as the basic distribution unit so that it can exploit the sequential nature of the rebuild workload. The design space of system factors such as parity declustering ratio, chunk size, private buffer size of surviving disks and free buffer size is explored to provide guidelines for storage system design.
Thirdly, an efficient distributed hot-spare allocation and assignment algorithm for general parity-declustering data layouts has been developed. This algorithm avoids conflicts when assigning distributed spare space for the units on the failed disk. Simulation results show that it effectively solves the write-bottleneck problem while adding only a small increase in the average response time to user requests.
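The PCDSDF construction itself is given in the thesis and is not reproduced here; as background, a minimal sketch of the classic complete-block-design approach to parity declustering, in which each stripe-width subset of the disks hosts exactly one stripe per rotation, so rebuild load spreads evenly over the survivors:

```python
from itertools import combinations

def declustered_rotation(n_disks, stripe_width):
    """One rotation of a complete-block-design layout: every
    stripe_width-subset of the n disks hosts exactly one stripe."""
    return list(combinations(range(n_disks), stripe_width))

# With 5 disks and 3 units per stripe there are C(5,3) = 10 stripes per
# rotation, and each disk appears in C(4,2) = 6 of them, so rebuilding a
# failed disk touches all four survivors equally.
for stripe_id, disks in enumerate(declustered_rotation(5, 3)):
    print(stripe_id, disks)
```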
24. Distributed databases for Multi Mediation: Scalability, Availability & Performance. Kuruganti, NSR Sankaran. January 2015.
Context: Multi Mediation is the process of collecting data from networks and network elements, pre-processing this data and distributing it to various systems such as Big Data analysis, billing systems, network monitoring systems and service assurance. With the growing demand for networks and the emergence of new services, the data collected from networks is growing. This data needs to be organized efficiently, which can be done using databases. Although RDBMSs offer scale-up solutions to handle voluminous data and concurrent requests, this approach is expensive, so alternatives like distributed databases are attractive. A suitable distributed database for Multi Mediation needs to be investigated.

Objectives: In this research we analyze two distributed databases in terms of performance, scalability and availability, as well as the inter-relations between these properties. The databases analyzed are MySQL Cluster 7.4.4 and Apache Cassandra 2.0.13. Performance, scalability and availability are quantified, with measurements made in the context of a Multi Mediation system.

Methods: The methods used in this research are both qualitative and quantitative. A qualitative study was made to select the databases for evaluation. A benchmarking harness application was designed to quantitatively evaluate the performance of each distributed database in the context of Multi Mediation, and several experiments were designed and performed using the harness on the database cluster.

Results: The results collected include the average response time and average throughput of the distributed databases in various scenarios. The average throughput and average INSERT response time results favor the Apache Cassandra low-availability configuration, while MySQL Cluster's average SELECT response time is better than Apache Cassandra's for larger numbers of client threads, in both high-availability and low-availability configurations.

Conclusions: Although Apache Cassandra outperforms MySQL Cluster, support for transactions and ACID compliance should not be forgotten when selecting a database. Apart from contextual benchmarks, organizational choices, development costs, resource utilization and similar factors are more influential parameters for the selection of a database within an organization. There is still a need for further evaluation of distributed databases.

Acknowledgements: I am indebted to my advisor Prof. Lars Lundberg and his valuable ideas, which helped in the completion of this work; he has guided every crucial stage of this research. I sincerely thank Prof. Markus Fiedler and Prof. Kurt Tutschku for their endless support during the work. I am grateful to Neeraj Garg, Sourab, Saket and Kulbir at Ericsson for providing me with the necessary equipment and helping me financially during my work. To my family members and friends who shared their support in one way or another: thank you. Above all I would like to thank the Supreme Personality of Godhead, the author of everything.
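The thesis's benchmarking harness is not reproduced here, but the measurement idea can be sketched against Cassandra using the standard Python driver; the contact point, `mediation` keyspace and `records` table are hypothetical placeholders.

```python
import time
from statistics import mean
from cassandra.cluster import Cluster  # pip install cassandra-driver

cluster = Cluster(["127.0.0.1"])            # contact point is illustrative
session = cluster.connect("mediation")      # hypothetical keyspace
insert = session.prepare("INSERT INTO records (id, payload) VALUES (?, ?)")

def timed_inserts(n):
    """Issue n INSERTs, returning per-request latency in milliseconds."""
    latencies = []
    for i in range(n):
        start = time.perf_counter()
        session.execute(insert, (i, "sample-payload"))
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

lat = timed_inserts(1000)
print(f"avg response time: {mean(lat):.2f} ms")
print(f"avg throughput:    {len(lat) / (sum(lat) / 1000):.0f} requests/s")
```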
25. Dynamic Scale-out Mechanisms for Partitioned Shared-Nothing Databases. Karyakin, Alexey. January 2011.
For a database system used in pay-per-use cloud environments, elastic scaling becomes an essential feature, allowing costs to be minimized while accommodating fluctuations in load. One approach to scalability involves horizontal database partitioning and dynamic migration of partitions between servers. We define a scale-out operation as the combination of provisioning a new server followed by the migration of one or more partitions to the newly allocated server.
In this thesis we study the efficiency of different implementations of the scale-out operation in the context of online transaction processing (OLTP) workloads. We designed and implemented three migration mechanisms featuring different strategies for data transfer. The first one is based on a modification of the Xen hypervisor, Snowflock, and uses on-demand block transfers for both server provisioning and partition migration. The second one is implemented in a database management system (DBMS) and uses bulk transfers for partition migration, optimized for higher bandwidth utilization. The third one is a conventional application, using SQL commands to copy partitions between servers.
We perform an experimental comparison of those scale-out mechanisms for disk-bound and CPU-bound configurations. When comparing the mechanisms we analyze their impact on whole-system performance and on the experience of individual clients.
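A toy, self-contained model of the scale-out operation as defined above (names are hypothetical; the transfer step marked in the comments is exactly where the three mechanisms studied differ):

```python
class Server:
    def __init__(self, name):
        self.name, self.partitions = name, set()

class Cluster:
    """Scale-out = provision a new server, then migrate partitions to it."""
    def __init__(self):
        self.servers, self.routing = [], {}

    def provision(self):
        server = Server(f"server-{len(self.servers)}")
        self.servers.append(server)
        return server

    def scale_out(self, partitions):
        dest = self.provision()
        for p in partitions:
            src = self.routing[p]
            src.partitions.remove(p)   # data transfer happens here: on-demand
            dest.partitions.add(p)     # blocks, DBMS bulk copy, or SQL copy
            self.routing[p] = dest     # repoint clients to the new server
        return dest

# Usage: one server owns p0..p3; move p2 and p3 to a freshly provisioned one.
c = Cluster()
first = c.provision()
for p in ["p0", "p1", "p2", "p3"]:
    first.partitions.add(p)
    c.routing[p] = first
c.scale_out(["p2", "p3"])
print({s.name: sorted(s.partitions) for s in c.servers})
```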
26. Resource Discovery and Fair Intelligent Admission Control over Scalable Internet. January 2004.
The Internet currently supports a best-effort connectivity service. There is increasing demand for the Internet to support Quality of Service (QoS), both to satisfy the stringent service requirements of many emerging networking applications and to utilize network resources efficiently. However, it has been found that even with an augmented QoS architecture the Internet cannot achieve the desired QoS, and there are concerns about the scalability of the available QoS solutions. If the network is not provisioned adequately, the Internet cannot handle congestion: because the Internet is unaware of its internal network QoS state, it cannot provide QoS when the network state changes dynamically. This thesis addresses the following question: is it possible to deliver applications with QoS in the Internet fairly and efficiently while preserving scalability? This dissertation answers the question affirmatively by proposing an innovative service architecture: Resource Discovery (RD) and Fair Intelligent Admission Control (FIAC) over a scalable Internet. The main contributions of this dissertation are as follows:

1. To detect the network QoS state, we propose the Resource Discovery (RD) framework, which provides the network QoS state dynamically. RD adopts a feedback-loop mechanism to collect the network QoS state and report it to the Fair Intelligent Admission Control module, so that FIAC can control resources efficiently and fairly.

2. To facilitate network resource management and flow admission control, two scalable Fair Intelligent Admission Control architectures are designed and analyzed at two levels: per-class and per-flow. Per-class FIAC handles aggregate admission control for certain pre-defined aggregates; per-flow FIAC handles flow admission control in terms of fairness within the class.

3. To further improve scalability, Edge-Aware Resource Discovery and Fair Intelligent Admission Control is proposed, which does not require the involvement of core routers.

We devise and analyze implementations of the proposed solutions and demonstrate the effectiveness of the approach. For Resource Discovery, two closed-loop feedback solutions are designed and investigated. The first is a core-aware solution based on direct QoS state information. To further improve scalability, an edge-aware solution is designed in which only the edges (not the core) are involved in the feedback QoS state estimation. For admission control, the FIAC module bridges the gap between 'external' traffic requirements and the 'internal' network ability. Utilizing the QoS state information from RD, FIAC intelligently allocates resources via per-class admission control and per-flow fairness control. We study the performance and robustness of RD-FIAC through extensive simulations. Our results show that RD can obtain the internal network QoS state and that FIAC can adjust resource allocation efficiently and fairly.
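As a sketch of the per-class admission test (the state shape and optimistic reservation are assumptions for illustration, not taken from the dissertation): a new flow is admitted only if the load reported by the RD feedback loop, plus the flow's rate, stays within the class's share.

```python
def admit(flow_rate, class_state, class_capacity):
    """Admit a flow into a class if the feedback-reported load plus the
    flow's rate stays within the class's provisioned capacity."""
    if class_state["measured_load"] + flow_rate <= class_capacity:
        class_state["measured_load"] += flow_rate  # optimistic reservation
        return True
    return False

# A class provisioned for 100 Mb/s, currently carrying 90 Mb/s of load.
state = {"measured_load": 90.0}
print(admit(8.0, state, class_capacity=100.0))  # True  -> admitted (98 <= 100)
print(admit(5.0, state, class_capacity=100.0))  # False -> rejected (103 > 100)
```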
27. Enhancing OpenStack clouds using P2P technologies. Joseph, Robin. January 2017.
OpenStack has long been known to have issues with scalability. Peer-to-peer (P2P) systems, on the other hand, have proven to scale well without significant reduction in performance. The objectives of this thesis are to study the challenges associated with P2P-enhanced clouds and to present solutions for overcoming them. As a case study, we take the architecture of the P2P-enhanced OpenStack implemented at Ericsson, which uses the CYCLON P2P protocol. We study the OpenStack architecture and P2P technologies, and finally propose solutions and possibilities for addressing the challenges faced by P2P-enhanced OpenStack clouds, focusing mainly on a decentralized identity service and the management of virtual machine images. This work also investigates the characterization of P2P architectures for use in P2P-enhanced OpenStack clouds. The results show that the proposed solution enables the existing P2P system to scale beyond what was originally possible, and that the P2P-enhanced system performs better than standard OpenStack.

Acknowledgements: Ericsson Cloud Research supported this work through the guidance of Dr. Fetahi Wuhib, Dr. Joao Monteiro Soares and Vinay Yadav, Experienced Researchers, Ericsson Cloud Research, Kista, Stockholm.
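CYCLON is a published gossip protocol for peer sampling; the following is a simplified sketch of its periodic view shuffle (the real protocol also injects a fresh self-descriptor into the exchange, omitted here for brevity).

```python
import random

VIEW_SIZE, SHUFFLE_LEN = 4, 2

def cyclon_shuffle(views, node):
    """One simplified shuffle: age the view, contact the oldest
    neighbour, and swap random subsets of the two views."""
    view = views[node]
    if not view:
        return
    for entry in view:
        entry["age"] += 1
    peer = max(view, key=lambda e: e["age"])["addr"]
    sent = random.sample(view, min(SHUFFLE_LEN, len(view)))
    recv = random.sample(views[peer], min(SHUFFLE_LEN, len(views[peer])))
    keep = [e for e in view if e not in sent]
    known = {e["addr"] for e in keep}
    for e in recv:  # merge what we received, avoiding self and duplicates
        if e["addr"] != node and e["addr"] not in known:
            known.add(e["addr"])
            keep.append({"addr": e["addr"], "age": 0})
    views[node] = keep[:VIEW_SIZE]

# Usage: 6 nodes that initially know their 3 ring successors; repeated
# shuffles turn each view into a changing random sample of the overlay.
views = {n: [{"addr": (n + k) % 6, "age": 0} for k in (1, 2, 3)]
         for n in range(6)}
for _ in range(20):
    cyclon_shuffle(views, random.randrange(6))
print(views[0])
```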
28. Scaling Geospatial Searches in Large Spatial Databases. Cary, Ariel. 08 November 2011.
Modern geographical databases store a rich set of aspatial attributes in addition to geographic data. Retrieving spatial records constrained on both spatial and aspatial attributes lets users perform more interesting spatial analyses via composite spatial searches; e.g., in a real-estate database, "Find the homes nearest to my current location that have a backyard and whose prices are between $50,000 and $80,000". Efficient processing of such composite searches requires combined indexing strategies over multiple types of data. Existing spatial query engines commonly apply a two-filter approach (a spatial filter followed by a non-spatial filter, or vice versa), which can incur large performance overheads. At the same time, the amount of geolocation data in databases is rapidly increasing, due in part to advances in geolocation technologies (e.g., GPS-enabled mobile devices) that allow location data to be associated with nearly every object or event. Hence, practical spatial databases may face data-ingestion challenges at large data volumes. In this dissertation, we first show how indexing spatial data with R-trees (a typical data pre-processing task) can be scaled in MapReduce, a well-adopted parallel programming model developed by Google for data-intensive problems. Close to linear scalability was observed in index-construction tasks over large spatial datasets. Subsequently, we develop novel techniques for simultaneously indexing spatial data with textual and numeric data to process k-nearest-neighbor searches with aspatial Boolean selection constraints. In particular, numeric ranges are compactly encoded and explicitly indexed. Experimental evaluations with real spatial databases showed query response times within acceptable ranges for interactive search systems.
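The dissertation's MapReduce R-tree construction is not reproduced here; the toy sketch below shows only the general shape of the map/partition step, with a uniform grid standing in for the partitioning function and a sort standing in for R-tree bulk-load packing.

```python
from collections import defaultdict

def map_record(record, grid=4, extent=100.0):
    """Map step: assign a record to a coarse grid cell so a single
    reducer can bulk-load one local R-tree per cell."""
    x, y, _payload = record
    return (int(x * grid / extent), int(y * grid / extent)), record

def reduce_cell(cell, records):
    """Reduce step: placeholder for a local R-tree bulk load."""
    return cell, sorted(records)

# Simulate MapReduce's shuffle phase locally on a few point records.
records = [(12.0, 7.0, "a"), (88.0, 91.0, "b"), (15.0, 9.0, "c")]
groups = defaultdict(list)
for r in records:
    cell, rec = map_record(r)
    groups[cell].append(rec)
print([reduce_cell(cell, recs) for cell, recs in groups.items()])
```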
29. Scaling blockchain for the energy sector. Dahlquist, Olivia; Hagström, Louise. January 2017.
Blockchain is a distributed-ledger technology enabling digital transactions without the need for central governance. Once transactions are added to the blockchain, they cannot be altered. One of the main challenges of blockchain implementation is creating a scalable network, meaning one that can verify many transactions per second. The goal of this thesis is to survey different approaches to scaling blockchain technologies. Scalability is one of the main drivers in blockchain development and an important factor in understanding the future progress of blockchain. The energy sector is in need of further digitalisation, and blockchain is therefore of interest for enhancing the digital development of smart grids and the Internet of Things. The focus of this work is a case study in the energy sector concerning a payment system for electrified roads. To research these questions, a qualitative method based on interviews with blockchain experts and actors in electrified-road projects was applied. The interviews were processed and summarised, and then related to current developments and needs in blockchain technology. This thesis points to the importance of considering the trilemma, which states that a blockchain can have only two of three properties: scalability, decentralisation and security. Further, Greenspan's criteria are applied in order to recognise the value of blockchain. These criteria, together with the trilemma and an understanding of blockchain's placement in the hype cycle, are of value when implementing blockchain. The study shows that blockchain technology is at an early stage and questions remain regarding future business use. Scalability solutions are both technical and case-specific, and future solutions for scaling blockchain are emerging.
30. A scalability evaluation on CockroachDB. Lifhjelm, Tobias. January 2021.
Databases are a cornerstone of data storage: they store and organize large amounts of data while allowing users to access specific parts of it easily. Databases must, however, adapt to an increasing number of users without negatively affecting end-users. CockroachDB (CRDB) is a distributed SQL database that combines the consistency associated with relational database management systems with the scalability to handle more user requests simultaneously while remaining consistent. This paper presents a study that evaluates the scalability properties of CRDB by measuring how latency is affected by the addition of more nodes to a CRDB cluster. The findings show that latency can decrease with the addition of nodes to a cluster; however, there are cases in which more nodes increase latency.
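A minimal sketch of such a latency measurement: CockroachDB speaks the PostgreSQL wire protocol (default port 26257), so a standard driver works, though the connection parameters and table here are illustrative. Rerunning after adding nodes to the cluster shows how latency responds to scale-out.

```python
import time
from statistics import median
import psycopg2  # any PostgreSQL driver can talk to CockroachDB

conn = psycopg2.connect(host="localhost", port=26257,
                        user="root", dbname="defaultdb")
conn.autocommit = True
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS kv (k INT PRIMARY KEY, v STRING)")

def measure_latency(n=500):
    """Time n single-row UPSERTs against the cluster, in milliseconds."""
    samples = []
    for i in range(n):
        start = time.perf_counter()
        cur.execute("UPSERT INTO kv (k, v) VALUES (%s, %s)", (i, "x"))
        samples.append((time.perf_counter() - start) * 1000)
    return samples

print(f"median latency: {median(measure_latency()):.2f} ms")
```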