Global ETD Search

1	Supporting fault-tolerant communication in networks Kanjani, Khushboo 15 May 2009 (has links) We address two problems dealing with fault-tolerant communication in networks. The first one is designing a distributed storage protocol tolerant to Byzantine failure of servers. The protocol implements a multi-writer multi-reader register which satisfies a weaker consistency condition called MWReg. Most of the earlier work gives multiwriter implementations by simulating m copies of a single-writer protocol where m is the number of writers. Our solution gives a direct multi-writer implementation and thus has bounded message and time complexity independent of the number of writers. We have simulated the complete protocol to test its performance and also proved its correctness theoretically. The second problem we address is of providing a reliable communication link between two nodes in a network. We present a capacity reservation algorithm in the case for upper bounds on edge capacities and costs associated with using per unit capacity on any edge. We give a flow based approximation algorithm with cost at most four times optimal. To conclude, we design a distributed storage protocol and a capacity reservation algorithm which are tolerant to network failures. Distributed Storage fault-tolerance
2	RDSS: A Reliable and Efficient Distributed Storage System Li, Xiaodong January 2004 (has links) No description available. RDSS Distributed Storage System
3	Performance Evaluation of Gluster and Compuverde Storage Systems : Comparative analysis Rajana, Poojitha January 2016 (has links) Context. Big Data and Cloud Computing nowadays require large amounts of storage that are accessible by many servers. To overcome the performance bottlenecks and single point of failure distributed storage systems came into force. So, our main aim in this thesis is evaluating the performance of these storage systems. A file coding technique is used that is the erasure coding which will help in data protection for the storage systems. Objectives. In this study, we investigate the performance evaluation of distributed storage system and understand the effect on performance for various patterns of I/O operations that is the read and write and also different measurement approaches for storage performance. Methods. The method is to use synthetic workload generator by streaming and transcoding video data as well as benchmark tool which generates the workload like SPECsfs2014 is used to evaluate the performance of distributed storage systems of GlusterFS and Compuverde which are file based storage. Results. In terms of throughput results, Gluster and Compuverde perform similar for both NFS and SMB server. The average latency results for both NFS and SMB shares indicate that Compuverde has lower latency. When comparing results of both Compuverde and Gluster, Compuverde delivers 100% IOPS with NFS server and Gluster delivers relatively near to the requested OP rate and with SMB server Gluster delivers 100% IOPS and Compuverde delivers more than the requested OP rate. Distributed storage Erasure coding Performance Streaming
4	Evaluating Energy Consumption of Distributed Storage Systems : Comparative analysis Kolli, Samuel Sushanth January 2016 (has links) Context : Big Data and Cloud Computing nowadays require large amounts of storage that are accessible by many servers. The Energy consumed by these servers as well as that consumed by hosts providing the storage has been growing rapidly over the recent years. There are various approaches to save energy both at the hardware and software level, respectively. In the context of software, this challenge requires identification of new development methodologies that can help reduce the energy footprint of the Distributed Storage System. Until recently, reducing the energy footprint of Distributed Storage Systems is a challenge because there is no new methodology implemented to reduce the energy footprint of the Distributed Storage Systems. To tackle this challenge, we evaluate the energy consumption of Distributed Storage Systems by using a Power Application Programming Interface (PowerAPI) that monitors, in real-time, the energy consumed at the granularity of a system process. Objectives : In this study we investigate the Energy Consumption of distributed storage system. We also attempt to understand the effect on energy consumption for various patters of video streams. Also we have observed different measurement approaches for energy performance. Methods : The method is to use a power measuring software library while a synthetic load generator generates the load i.e., video data streams. The Tool which generates the workload is Standard Performance Evaluation Corporation Solution File Server (SPECsfs 2014) and PowerAPI is the software power monitoring library to evaluate the energy consumption of distributed storage systems of GlusterFS and Compuverde. Results : The mean and median values of power samples in mill watts for Compuverde higher than Gluster. For Compuverde the mean and median values until the load increment of three streams was around a 400 milliwatt value. The values of mean and median for the Gluster system were gradually increasing. Conclusions : The results show Compuverde having a higher consumption of energy than Gluster as it has a higher number of running processes that implement additional features that do not exist in Gluster. Also we have concluded that the conpuverde performed better for higher values of Load i.e., video data streams. / <p>Topic : Evaluating Energy Consumption of Distributed Storage Systems</p><p>Advisor: Dr. Dragos Ilie, Senior Lecturer, BTH</p><p>External Advisor: Stefan Bernbo,CEO, Compuverde AB</p><p>Student: Samuel Sushanth Kolli</p><p>The report gives a clear description of Distributed Storage Sytems and their Energy consumption with Performance Evaluation.</p><p>The report also includes the complete description and working of SpecSFS 2014 and PowerAPI Tool.</p> / Performance Evaluation of Distributed Storage Systems Distributed storage Energy Consumption Erasure coding Streaming.
5	Distributed Storage and Processing of Image Data / Distribuerad lagring och bearbeting av bilddata Dahlberg, Tobias January 2012 (has links) Systems operating in a medical environment need to maintain high standards regarding availability and performance. Large amounts of images are stored and studied to determine what is wrong with a patient. This puts hard requirements on the storage of the images. In this thesis, ways of incorporating distributed storage into a medical system are explored. Products, inspired by the success of Google, Amazon and others, are experimented with and compared to the current storage solutions. Several “non-relational databases” (NoSQL) are investigated for storing medically relevant metadata of images, while a set of distributed file systems are considered for storing the actual images. Distributed processing of the stored data is investigated by using Hadoop MapReduce to generate a useful model of the images' metadata. Distributed Storage NoSQL Distributed Processing Hadoop MapReduce
6	Distributed large-scale data storage and processing Papailiopoulos, Dimitrios 16 March 2015 (has links) This thesis makes progress towards the fundamental understanding of heterogeneous and dynamic information systems and the way that we store and process massive data-sets. Reliable large-scale data storage: Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of three-replicated systems. However, traditional erasure codes are associated with high repair cost that is often considered an unavoidable price to pay. In this thesis, we show how to overcome these limitations. We construct novel families of erasure codes that are optimal under various repair cost metrics, while achieving the best possible reliability. We show how these modern storage codes significantly outperform traditional erasure codes. Low-rank approximations for large-scale data processing: A central goal in data analytics is extracting useful and interpretable information from massive data-sets. A challenge that arises from the distributed and large-scale nature of the data at hand, is having algorithms that are good in theory but can also scale up gracefully to large problem sizes. Using ideas from prior work, we develop a scalable lowrank optimization framework with provable guarantees for problems like the densest k-subgraph (DkS) and sparse PCA. Our experimental findings indicate that this low-rank framework can outperform the state-of-the art, by offering higher quality and more interpretable solutions, and by scaling up to problem inputs with billions of entries. / text Codes for distributed storage Big-graph analytics
7	Amélioration de la prédictibilité des performances pour les environnements de stockage de données dans les nuages / Improving Performance Predictability in Cloud Data Stores Jaiman, Vikas 30 April 2019 (has links) De nos jours, les utilisateurs de services interactifs comme le e-commerce, ou les moteurs de recherche, ont de grandes attentes sur la performance et la réactivité de ces services. En effet, les études ont montré que des lenteurs (même pendant une courte durée) impacte directement le chiffre d'affaire. Avoir des performances prédictives est donc devenu une priorité pour ces fournisseurs de services depuis une dizaine d'années.Mais empêcher la variabilité dans les systèmes de stockage distribué est un challenge car les requêtes des utilisateurs finaux transitent par des centaines de servers et les problèmes de performances engendrés par chacun de ces serveurs peuvent influencer sur la latence observée. Même dans les environnements correctement dimensionnés, des problèmes comme de la contention sur les ressources partagés ou un déséquilibre de charge entre les serveurs influent sur les latences des requêtes et en particulier sur la queue de leur distribution (95ème et 99ème centile).L’objectif de cette thèse est de développer des mécanises permettant de réduire les latences et d’obtenir des performances prédictives dans les environnements de stockage de données dans les nuages. Une contre-mesure efficace pour réduire la latence de queue dans les environnements de stockage de données dans les nuages est de fournir des algorithmes efficaces pour la sélection de réplique. Dans la sélection de réplique, une requête tentant d’accéder à une information donnée (aussi appelé valeur) identifiée par une clé unique est dirigée vers la meilleure réplique présumée. Cependant, sous des charges de travail hétérogènes, ces algorithmes entraînent des latences accrues pour les requêtes ayant un court temps d'exécution et qui sont planifiées à la suite de requêtes ayant des long temps d’exécution. Nous proposons Héron, un algorithme de sélection de répliques qui gère des charges de travail avec des requêtes ayant un temps d’exécution hétérogène. Nous évaluons Héron dans un cluster de machines en utilisant un jeu de données synthétique inspiré du jeu de données de Facebook ainsi que deux jeux de données réels provenant de Flickr et WikiMedia. Nos résultats montrent que Héron surpasse les algorithmes de l’état de l’art en réduisant jusqu’à 41% la latence médiane et la latence de queue.Dans la deuxième contribution de cette thèse, nous nous sommes concentrés sur les charges de travail multi-GET afin de réduire la latence dans les environnements de stockage de données dans les nuages Le défi consiste à estimer les opérations limitantes et à les planifier sur des serveurs non-coordonnés avec un minimum de surcoût. Pour atteindre cet objectif, nous présentons TailX, un algorithme d’ordonnancement de tâches multi-GET qui réduit les temps de latence de queue sous des charges de travail hétérogènes. Nous implémentons TailX dans Cassandra, une base de données clé-valeur largement utilisée. Il en résulte une amélioration des performances globales des environnements de stockage de données dans les nuages pour une grande variété de charges de travail hétérogènes. / Today, users of interactive services such as e-commerce, web search have increasingly high expectations on the performance and responsiveness of these services. Indeed, studies have shown that a slow service (even for short periods of time) directly impacts the revenue. Enforcing predictable performance has thus been a priority of major service providers in the last decade. But avoiding latency variability in distributed storage systems is challenging since end user requests go through hundreds of servers and performance hiccups at any of these servers may inflate the observed latency. Even in well-provisioned systems, factors such as the contention on shared resources or the unbalanced load between servers affect the latencies of requests and in particular the tail (95th and 99th percentile) of their distribution.The goal of this thesis to develop mechanisms for reducing latencies and achieve performance predictability in cloud data stores. One effective countermeasure for reducing tail latency in cloud data stores is to provide efficient replica selection algorithms. In replica selection, a request attempting to access a given piece of data (also called value) identified by a unique key is directed to the presumably best replica. However, under heterogeneous workloads, these algorithms lead to increased latencies for requests with a short execution time that get scheduled behind requests with large execution times. We propose Héron, a replica selection algorithm that supports workloads of heterogeneous request execution times. We evaluate Héron in a cluster of machines using a synthetic dataset inspired from the Facebook dataset as well as two real datasets from Flickr and WikiMedia. Our results show that Héron outperforms state-of-the-art algorithms by reducing both median and tail latency by up to 41%.In the second contribution of the thesis, we focus on multiget workloads to reduce the latency in cloud data stores. The challenge is to estimate the bottleneck operations and schedule them on uncoordinated backend servers with minimal overhead. To reach this objective, we present TailX, a task aware multiget scheduling algorithm that reduces tail latencies under heterogeneous workloads. We implement TailX in Cassandra, a widely used key-value store. The result is an improved overall performance of the cloud data stores for a wide variety of heterogeneous workloads. Stockage distribué Performance Planification Distributed storage Performance Scheduling 004
8	Secure Store : A Secure Distributed Storage Service Lakshmanan, Subramanian 12 August 2004 (has links) As computers become pervasive in environments that include the home and community, new applications are emerging that will create and manipulate sensitive and private information. These applications span systems ranging from personal to mobile and hand held devices. They would benefit from a data storage service that protects the integrity and confidentiality of the stored data and is highly available. Such a data repository would have to meet the needs of a variety of applications, handling data with varying security and performance requirements. Providing simultaneously both high levels of security and high levels of performance may not be possible when many nodes in the system are under attack. The agility approach to building secure distributed services advocates the principle that the overhead of providing strong security guarantees should be incurred only by those applications that require such high levels of security and only at times when it is necessary to defend against high threat levels. A storage service that is designed for a variety of applications must follow the principles of agility, offering applications a range of options to choose from for their security and performance requirements. This research presents secure store, a secure and highly available distributed store to meet the performance and security needs of a variety of applications. Secure store is designed to guarantee integrity, confidentiality and availability of stored data even in the face of limited number of compromised servers. Secure store is designed based on the principles of agility. Secure store integrates two well known techniques, namely replication and secret-sharing, and exploits the tradeoffs that exist between security and performance to offer applications a range of options to choose from to suit their needs. This thesis makes several contributions, including (1) illustration of the the principles of agility, (2) a novel gossip-style secure dissemination protocol whose performance is comparable to the best-possible benign-case protocol in the absence of any malicious activity, (3) demonstration of the performance benefits of using weaker consistency models for data access, and (4) a technique called collective endorsement that can be used in other secure distributed applications. Consistency Secure dissemination Byzantine fault tolerance Storage security Distributed storage
9	A Distributed Pool Architecture for Genetic Algorithms Roy, Gautam 2009 December 1900 (has links) The genetic algorithm paradigm is a well-known heuristic for solving many problems in science and engineering in which candidate solutions, or “individuals”, are manipulated in ways analogous to biological evolution, to produce new solutions until one with the desired quality is found. As problem sizes increase, a natural question is how to exploit advances in distributed and parallel computing to speed up the execution of genetic algorithms. This thesis proposes a new distributed architecture for genetic algorithms, based on distributed storage of the individuals in a persistent pool. Processors extract individuals from the pool in order to perform the computations and then insert the resulting individuals back into the pool. Unlike previously proposed approaches, the new approach is tailored for distributed systems in which processors are loosely coupled, failure-prone and can run at different speeds. Proof-of-concept simulation results are presented for four benchmark functions and for a real-world Product Lifecycle Design problem. We have experimented with both the crash failure model and the Byzantine failure model. The results indicate that the approach can deliver improved performance due to the distribution and tolerates a large fraction of processor failures subject to both models. genetic algorithms distributed algorithms distributed storage product lifecycle design
10	On the design and optimization of heterogeneous distributed storage systems Pàmies Juárez, Lluís 19 July 2011 (has links) Durant la última dècada, la demanda d’emmagatzematge de dades ha anat creixent exponencialment any rere any. Apart de demanar més capacitat d’emmagatzematge, el usuaris actualment també demanen poder accedir a les seves dades des de qualsevol lloc i des de qualsevol dispositiu. Degut a aquests nous requeriments, els usuaris estan actualment movent les seves dades personals (correus electrònics, documents, fotografies, etc.) cap a serveis d’emmagatzematge en línia com ara Gmail, Facebook, Flickr o Dropbox. Malauradament, aquests serveis d’emmagatzematge en línia estan sostinguts per unes grans infraestructures informàtiques que poques empreses poden finançar. Per tal de reduir el costs d’aquestes grans infraestructures informàtiques, ha sorgit una nova onada de serveis d’emmagatzematge en línia que obtenen grans infraestructures d’emmagatzematge a base d’integrar els recursos petits centres de dades, o fins i tot a base d’integrar els recursos d’emmagatzematge del usuaris finals. No obstant això, els recursos que formen aquestes noves infraestructures d’emmagatzematge són molt heterogenis, cosa que planteja un repte per al dissenyadors d’aquests sistemes: Com es poden dissenyar sistemes d’emmagatzematge en línia, fiables i eficients, quan la infraestructura emprada és tan heterogènia? Aquesta tesis presenta un estudi dels principals problemes que sorgeixen quan un vol respondre a aquesta pregunta. A més proporciona diferents eines per tal d’optimitzar el disseny de sistemes d’emmagatzematge distribuïts i heterogenis. Les principals contribucions són: Primer, creem un marc d’anàlisis per estudiar els efectes de la redundància de dades en el cost dels sistemes d’emmagatzematge distribuïts. Donat un esquema de redundància específic, el marc d’anàlisis presentat permet predir el cost mitjà d’emmagatzematge i el cost mitjà de comunicació d’un sistema d’emmagatzematge implementat sobre qualsevol infraestructura informàtica distribuïda. Segon, analitzem els impactes que la redundància de dades té en la disponibilitat de les dades, i en els temps de recuperació. Donada una redundància, i donat un sistema d’emmagatzematge heterogeni, creem un grup d’algorismes per a determinar la disponibilitat de les dades esperada, i els temps de recuperació esperats. Tercer, dissenyem diferents polítiques d’assignació de dades per a diferents sistemes d’emmagatzematge. Diferenciem entre aquells escenaris on la totalitat de la infraestructura està administrada per una sola organització, i els escenaris on diferents parts auto administrades contribueixen els seus recursos. Els objectius de les nostres polítiques d’assignació de dades són: (i) minimitzar la redundància necessària, (ii) garantir la equitat entre totes les parts que participen al sistema, i (iii) incentivar a les parts perquè contribueixin els seus recursos al sistema. / Over the last decade, users’ storage demands have been growing exponentially year over year. Besides demanding more storage capacity and more data reliability, today users also demand the possibility to access their data from any location and from any device. These new needs encourage users to move their personal data (e.g., E-mails, documents, pictures, etc.) to online storage services such as Gmail, Facebook, Flickr or Dropbox. Unfortunately, these online storage services are built upon expensive large datacenters that only a few big enterprises can afford. To reduce the costs of these large datacenters, a new wave of online storage services has recently emerged integrating storage resources from different small datacenters, or even integrating user storage resources into the provider’s storage infrastructure. However, the storage resources that compose these new storage infrastructures are highly heterogeneous, which poses a challenging problem to storage systems designers: How to design reliable and efficient distributed storage systems over heterogeneous storage infrastructures? This thesis provides an analysis of the main problems that arise when one aims to answer this question. Besides that, this thesis provides different tools to optimize the design of heterogeneous distributed storage systems. The contribution of this thesis is threefold: First, we provide a novel framework to analyze the effects that data redundancy has on the storage and communication costs of distributed storage systems. Given a generic redundancy scheme, the presented framework can predict the average storage costs and the average communication costs of a storage system deployed over a specific storage infrastructure. Second, we analyze the impacts that data redundancy has on data availability and retrieval times. For a given redundancy and a heterogeneous storage infrastructure, we provide a set of algorithms that allow to determine the expected data availability and expected retrieval times. Third, we design different data assignment policies for different storage scenarios. We differentiate between scenarios where the entire storage infrastructure is managed by the same organization, and scenarios where different parties contribute their storage resources. The aims of our assignment policies are: (i) to minimize the required redundancy, (ii) to guarantee fairness among all parties, and (iii) to encourage different parties to contribute their local storage resources to the system. Distributed storage Peer-to-peer Distributed systems Heterogeneity 004

Search results