Spelling suggestions: "subject:"distributed storage systems"" "subject:"eistributed storage systems""
1 |
ImplementingDistributed Storage System by Network Coding in Presence of Link FailureChareonvisal, Tanakorn January 2012 (has links)
Nowadays increasing multimedia applications e.g., video and voice over IP, social networks and emails poses higher demands for sever storages and bandwidth in the networks. There is a concern that existing resource may not able to support higher demands and reliability. Network coding was introduced to improve distributed storage system. This thesis proposes the way to improve distributed storage system such as increase a chance to recover data in case there is a fail storage node or link fail in a network. In this thesis, we study the concept of network coding in distributed storage systems. We start our description from easy code which is replication coding then follow with higher complex code such as erasure coding. After that we implement these concepts in our test bed and measure performance by the probability of success in download and repair criteria. Moreover we compare success probability for reconstruction of original data between minimum storage regenerating (MSR) and minimum bandwidth regenerating (MBR) method. We also increase field size to increase probability of success. Finally, link failure was added in the test bed for measure reliability in a network. The results are analyzed and it shows that using maximum distance separable and increasing field size can improve the performance of a network. Moreover it also improves reliability of network in case there is a link failure in the repair process.
|
2 |
Exploitation du contenu pour l'optimisation du stockage distribué / Leveraging content properties to optimize distributed storage systemsKloudas, Konstantinos 06 March 2013 (has links)
Les fournisseurs de services de cloud computing, les réseaux sociaux et les entreprises de gestion des données ont assisté à une augmentation considérable du volume de données qu'ils reçoivent chaque jour. Toutes ces données créent des nouvelles opportunités pour étendre la connaissance humaine dans des domaines comme la santé, l'urbanisme et le comportement humain et permettent d'améliorer les services offerts comme la recherche, la recommandation, et bien d'autres. Ce n'est pas par accident que plusieurs universitaires mais aussi les médias publics se référent à notre époque comme l'époque “Big Data”. Mais ces énormes opportunités ne peuvent être exploitées que grâce à de meilleurs systèmes de gestion de données. D'une part, ces derniers doivent accueillir en toute sécurité ce volume énorme de données et, d'autre part, être capable de les restituer rapidement afin que les applications puissent bénéficier de leur traite- ment. Ce document se concentre sur ces deux défis relatifs aux “Big Data”. Dans notre étude, nous nous concentrons sur le stockage de sauvegarde (i) comme un moyen de protéger les données contre un certain nombre de facteurs qui peuvent les rendre indisponibles et (ii) sur le placement des données sur des systèmes de stockage répartis géographiquement, afin que les temps de latence perçue par l'utilisateur soient minimisés tout en utilisant les ressources de stockage et du réseau efficacement. Tout au long de notre étude, les données sont placées au centre de nos choix de conception dont nous essayons de tirer parti des propriétés de contenu à la fois pour le placement et le stockage efficace. / Cloud service providers, social networks and data-management companies are witnessing a tremendous increase in the amount of data they receive every day. All this data creates new opportunities to expand human knowledge in fields like healthcare and human behavior and improve offered services like search, recommendation, and many others. It is not by accident that many academics but also public media refer to our era as the “Big Data” era. But these huge opportunities come with the requirement for better data management systems that, on one hand, can safely accommodate this huge and constantly increasing volume of data and, on the other, serve them in a timely and useful manner so that applications can benefit from processing them. This document focuses on the above two challenges that come with “Big Data”. In more detail, we study (i) backup storage systems as a means to safeguard data against a number of factors that may render them unavailable and (ii) data placement strategies on geographically distributed storage systems, with the goal to reduce the user perceived latencies and the network and storage resources are efficiently utilized. Throughout our study, data are placed in the centre of our design choices as we try to leverage content properties for both placement and efficient storage.
|
3 |
Towards Malleable Distributed Storage Systems˸ From Models to Practice / Malléabilité des Systèmes de Stockage Distribués ˸ Des Modèles à la PratiqueCheriere, Nathanaël 05 November 2019 (has links)
Le Cloud, avec son modèle économique, offre la possibilité d’un gestion élastique des ressources; les utilisateurs peuvent louer des ressources selon leurs besoins. Cette élasticité permet de réduire les coûts énergétiques et financiers, et aide les applications à s’adapter aux charges de travail variables.Les applications manipulant de grandes quantités de données exécutées dans le Cloud ou sur des supercalculateurs sont souvent colocalisées avec un système de stockage distribué pour garantir un accès rapide aux données. Bien que de nombreux travaux aient été proposés pour redimensionner dynamiquement les capacités de calcul pour s’ajuster à la charge de travail, le stockage n’est pas considéré comme malléable (capable d’être redimensionné dynamiquement) puisque les transferts de grandes quantités de données nécessaires sont considérés trop lents. Cependant, le matériel et les techniques de stockage ont évolué et cette hypothèse doit être réévaluée.Dans cette thèse, nous présentons une étude sous différents angles des opérations de redimensionnement des systèmes de stockage distribués.Nous commençons par modéliser la durée minimale de ces opérations pour évaluer leur vitesse potentielle. Puis, nous développons un benchmark conçu pour mesurer la viabilité de la malléabilité d’un système de stockage sur une plateforme donnée. Finalement, nous implémentons un gestionnaire d’opérations de redimensionnement pour systèmes de stockage distribués qui décide et organise les transferts de données requis par ces opérations. / The Cloud, with its pay-as-you-go model, gives the possibility of elastic resource management; users can claim and release resources as needed. This elasticity leads to financial and energetical cost reductions, and helps applications to cope with varying workloads.Distributed cloud and HPC applications processing large amounts of data are often co-located with a distributed storage system in order to ensure fast data accesses. Although many works have been proposed to dynamically rescale the processing part of such systems to match their workload, the storage is never considered as malleable (able to be dynamically rescaled) since moving massive amounts of data around is assumed to be too slow in practice. However, in recent years hardware and storage techniques have evolved and this assumption needs to be revisited.In this thesis, we present a study of the rescaling operations in distributed storage systems approached from different angles. We start by modeling the minimal duration of rescaling operations to estimate their potential speed. Then, we develop a benchmark to measure the viability of distributed storage system malleability on a given platform. Last, we implement a rescaling manager for distributed storage systems that decides and organizes the data transfers required during a rescaling operation.
|
4 |
Towards a Flexible High-efficiency Storage System for Containerized ApplicationsZhao, Nannan 08 October 2020 (has links)
Due to their tight isolation, low overhead, and efficient packaging of the execution environment, Docker containers have become a prominent solution for deploying modern applications. Consequently, a large amount of Docker images are created and this massive image dataset presents challenges to the registry and container storage infrastructure and so far has remained a largely unexplored area. Hence, there is a need of docker image characterization that can help optimize and improve the storage systems for containerized applications. Moreover, existing deduplication techniques significantly degrade the performance of registries, which will slow down the container startup time. Therefore, there is growing demand for high storage efficiency and high-performance registry storage systems. Last but not least, different storage systems can be integrated with containers as backend storage systems and provide persistent storage for containerized applications. So, it is important to analyze the performance of different backend storage systems and storage drivers and draw out the implications for container storage system design. These above observations and challenges motivate my dissertation.
In this dissertation, we aim to improve the flexibility, performance, and efficiency of the storage systems for containerized applications. To this end, we focus on the following three important aspects: Docker images, Docker registry storage system, and Docker container storage drivers with their backend storage systems. Specifically, this dissertation adopts three steps: (1) analyzing the Docker image dataset; (2) deriving the design implications; (3) designing a new storage framework for Docker registries and propose different optimizations for container storage systems.
In the first part of this dissertation (Chapter 3), we analyze over 167TB of uncompressed Docker Hub images, characterize them using multiple metrics and evaluate the potential of le level deduplication in Docker Hub. In the second part of this dissertation (Chapter 4), we conduct a comprehensive performance analysis of container storage systems based on the key insights from our image characterizations, and derive several design implications. In the third part of this dissertation (Chapter 5), we propose DupHunter, a new Docker registry architecture, which not only natively deduplicates layers for space savings but also reduces layer restore overhead. DupHunter supports several configurable deduplication modes, which provide different levels of storage efficiency, durability, and performance, to support a range of uses. In the fourth part of this dissertation (Chapter 6), we explore an innovative holistic approach, Chameleon, that employs data redundancy techniques such as replication and erasure-coding, coupled with endurance-aware write offloading, to mitigate wear level imbalance in distributed SSD-based storage systems. This high-performance fash cluster can be used for registries to speedup performance. / Doctor of Philosophy / The amount of Docker images stored in Docker registries is increasing rapidly and present challenges for the underlying storage infrastructures. Before we do any optimizations for the storage system, we should first analyze this big Docker image dataset. To this end, in this dissertation we perform the first large-scale characterization and redundancy analysis of the images and layers stored in the Docker Hub registry. Based on the findings, this dissertation presents a series of practical and efficient techniques, algorithms, optimizations to achieve high performance and flexibility, and space-efficient storage system for containerized applications. The experimental evaluation demonstrates the effectiveness of our optimizations and techniques to make storage systems flexible and space-efficacy.
|
5 |
Information-Theoretically Secure Communication Under Channel UncertaintyLy, Hung Dinh 2012 May 1900 (has links)
Secure communication under channel uncertainty is an important and challenging problem in physical-layer security and cryptography. In this dissertation, we take a
fundamental information-theoretic view at three concrete settings and use them to shed insight into efficient secure communication techniques for different scenarios under channel uncertainty.
First, a multi-input multi-output (MIMO) Gaussian broadcast channel with two receivers and two messages: a common message intended for both receivers (i.e., channel
uncertainty for decoding the common message at the receivers) and a confidential message intended for one of the receivers but needing to be kept asymptotically perfectly secret from the other is considered. A matrix characterization of the secrecy capacity region is established via a channel-enhancement argument and an extremal entropy inequality previously established for characterizing the capacity region of a degraded compound MIMO Gaussian broadcast channel.
Second, a multilevel security wiretap channel where there is one possible realization for the legitimate receiver channel but multiple possible realizations for the eavesdropper channel (i.e., channel uncertainty at the eavesdropper) is considered. A coding scheme is designed such that the number of secure bits delivered to the legitimate receiver depends on the actual realization of the eavesdropper channel. More specifically, when the eavesdropper channel realization is weak, all bits delivered to the legitimate receiver need to be secure. In addition, when the eavesdropper channel realization is strong, a prescribed part of the bits needs to remain secure. We call such codes security embedding codes, referring to the fact that high-security bits are now embedded into the low-security ones. We show that the key to achieving efficient security embedding is to jointly encode the low-security and high-security bits. In particular, the low-security bits can be used as (part of) the transmitter randomness to protect the high-security ones.
Finally, motivated by the recent interest in building secure, robust and efficient distributed information storage systems, the problem of secure symmetrical multilevel diversity coding (S-SMDC) is considered. This is a setting where there are channel uncertainties at both the legitimate receiver and the eavesdropper. The problem of encoding individual sources is first studied. A precise characterization of the entire admissible rate region is established via a connection to the problem of secure coding over a three-layer wiretap network and utilizing some basic polyhedral structure of the admissible rate region. Building on this result, it is then shown that the simple coding strategy of separately encoding individual sources at the encoders can achieve the minimum sum rate for the general S-SMDC problem.
|
6 |
Autonomic management in a distributed storage systemTauber, Markus January 2010 (has links)
This thesis investigates the application of autonomic management to a distributed storage system. Effects on performance and resource consumption were measured in experiments, which were carried out in a local area test-bed. The experiments were conducted with components of one specific distributed storage system, but seek to be applicable to a wide range of such systems, in particular those exposed to varying conditions. The perceived characteristics of distributed storage systems depend on their configuration parameters and on various dynamic conditions. For a given set of conditions, one specific configuration may be better than another with respect to measures such as resource consumption and performance. Here, configuration parameter values were set dynamically and the results compared with a static configuration. It was hypothesised that under non-changing conditions this would allow the system to converge on a configuration that was more suitable than any that could be set a priori. Furthermore, the system could react to a change in conditions by adopting a more appropriate configuration. Autonomic management was applied to the peer-to-peer (P2P) and data retrieval components of ASA, a distributed storage system. The effects were measured experimentally for various workload and churn patterns. The management policies and mechanisms were implemented using a generic autonomic management framework developed during this work. The motivation for both groups of experiments was to test management policies with the objective to avoid unsatisfactory situations with respect to resource consumption and performance. Such unsatisfactory situations occur when either the P2P layer or the data retrieval mechanism is configured statically. In a statically configured P2P system two unsatisfactory situations can be identified. The first arises when the frequency with which P2P node states are verified is low and membership churn is high. The P2P node state becomes inaccurate due to a high membership churn, leading to errors during the routing process and a reduction in performance. In this situation it is desirable to increase the frequency to increase P2P state accuracy. The converse situation arises when the frequency is high and churn is low. In this situation network resources are used unnecessarily, which may also reduce performance, making it desirable to decrease the frequency. In ASA’s data retrieval mechanism similar unsatisfactory situations can be identified with respect to the degree of concurrency (DOC). The DOC controls the eagerness with which multiple redundant replicas are retrieved. An unsatisfactory situation arises when the DOC is low and there is a large variation in the times taken to retrieve replicas. In this situation it is desirable to increase the DOC, because by retrieving more replicas in parallel a result can be returned to the user sooner. The converse situation arises when the DOC is high, there is little variation in retrieval time and there is a network bottleneck close to the requesting client. In this situation it is desirable to decrease the DOC, since the low variation removes any benefit in parallel retrieval, and the bottleneck means that decreasing parallelism reduces both bandwidth consumption and elapsed time for the user. The experimental evaluations of autonomic management show promising results, and suggest several future research topics. These include optimisations of the managed mechanisms, alternative management policies, different evaluation methods, and the application of developed management mechanisms to other facets of a distributed storage system. The findings of this thesis could be exploited in building other distributed storage systems that focus on harnessing storage on user workstations, since these are particularly likely to be exposed to varying, unpredictable conditions.
|
7 |
Etude des codes en graphes pour le stockage de données / Study of Sparse-Graph for Distributed Storage SystemsJule, Alan 07 March 2014 (has links)
Depuis deux décennies, la révolution technologique est avant tout numérique entrainant une forte croissance de la quantité de données à stocker. Le rythme de cette croissance est trop importante pour les solutions de stockage matérielles, provoquant une augmentation du coût de l'octet. Il est donc nécessaire d'apporter une amélioration des solutions de stockage ce qui passera par une augmentation de la taille des réseaux et par la diminution des copies de sauvegarde dans les centres de stockage de données. L'objet de cette thèse est d'étudier l'utilisation des codes en graphe dans les réseaux de stockage de donnée. Nous proposons un nouvel algorithme combinant construction de codes en graphe et allocation des noeuds de ce code sur le réseau. Cet algorithme permet d'atteindre les hautes performances des codes MDS en termes de rapport entre le nombre de disques de parité et le nombre de défaillances simultanées pouvant être corrigées sans pertes (noté R). Il bénéficie également des propriétés de faible complexité des codes en graphe pour l'encodage et la reconstruction des données. De plus, nous présentons une étude des codes LDPC Spatiallement-Couplés permettant d'anticiper le comportement de leur décodage pour les applications de stockage de données.Il est généralement nécessaire de faire des compromis entre différents paramètres lors du choix du code correcteur d'effacement. Afin que ce choix se fasse avec un maximum de connaissances, nous avons réalisé deux études théoriques comparatives pour compléter l'état de l'art. La première étude s'intéresse à la complexité de la mise à jour des données dans un réseau dynamique établi et déterminons si les codes linéaires utilisés ont une complexité de mise à jour optimale. Dans notre seconde étude, nous nous sommes intéressés à l'impact sur la charge du réseau de la modification des paramètres du code correcteur utilisé. Cette opération peut être réalisée lors d'un changement du statut du fichier (passage d'un caractère hot à cold par exemple) ou lors de la modification de la taille du réseau. L'ensemble de ces études, associé au nouvel algorithme de construction et d'allocation des codes en graphe, pourrait mener à la construction de réseaux de stockage dynamiques, flexibles avec des algorithmes d'encodage et de décodage peu complexes. / For two decades, the numerical revolution has been amplified. The spread of digital solutions associated with the improvement of the quality of these products tends to create a growth of the amount of data stored. The cost per Byte reveals that the evolution of hardware storage solutions cannot follow this expansion. Therefore, data storage solutions need deep improvement. This is feasible by increasing the storage network size and by reducing data duplication in the data center. In this thesis, we introduce a new algorithm that combines sparse graph code construction and node allocation. This algorithm may achieve the highest performance of MDS codes in terms of the ratio R between the number of parity disks and the number of failures that can be simultaneously reconstructed. In addition, encoding and decoding with sparse graph codes helps lower the complexity. By this algorithm, we allow to generalize coding in the data center, in order to reduce the amount of copies of original data. We also study Spatially-Coupled LDPC (SC-LDPC) codes which are known to have optimal asymptotic performance over the binary erasure channel, to anticipate the behavior of these codes decoding for distributed storage applications. It is usually necessary to compromise between different parameters for a distributed storage system. To complete the state of the art, we include two theoretical studies. The first study deals with the computation complexity of data update and we determine whether linear code used for data storage are update efficient or not. In the second study, we examine the impact on the network load when the code parameters are changed. This can be done when the file status changes (from a hot status to a cold status for example) or when the size of the network is modified by adding disks. All these studies, combined with the new algorithm for sparse graph codes, could lead to the construction of new flexible and dynamical networks with low encoding and decoding complexities.
|
Page generated in 0.0734 seconds