1 |
RDSS: A Reliable and Efficient Distributed Storage System
Li, Xiaodong (January 2004)
No description available.
|
2 |
On Codes for Private Information Retrieval and Ceph Implementation of a High-Rate Regenerating Code
Vinayak, R (January 2017)
Error-control codes, which are used extensively in communication systems, have also proven very useful in data storage over the past decade. This thesis deals with two types of codes for data storage, one pertaining to privacy and the other to reliability.
In many scenarios, a user accessing critical data from a server would not want the server to learn the identity of the data retrieved. This problem, called Private Information Retrieval (PIR), was first formally introduced by Chor et al., who gave PIR protocols for the case where multiple copies of the same data are stored on non-communicating servers. The PIR protocols that followed also used this replication model. The drawback of data replication is its high storage overhead, which leads to large storage costs. Later, Fazeli, Vardy and Yaakobi introduced the notion of a PIR code, which enables information-theoretic PIR with low storage overhead. The first part of this thesis presents constructions of PIR codes for certain parameter values. These constructions are based on a variant of conventional Reed-Muller (RM) codes called binary Projective Reed-Muller (PRM) codes. A lower bound on the block length of systematic PIR codes is derived, and the PRM-based PIR codes are shown to be optimal with respect to this bound in some special cases. The codes constructed here have smaller block lengths than the short-block-length PIR codes known in the literature. The generalized Hamming weights of binary PRM codes are also studied.
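The replication model mentioned above can be illustrated with the classic two-server XOR scheme. The following is a minimal sketch, not code from the thesis: the function names (`make_queries`, `answer`, `retrieve`) and the single-bit database are assumptions made for illustration. Each server sees only a uniformly random subset of indices, so neither learns which bit the user wants.

```python
import secrets

def make_queries(n, i):
    """Return the index subsets sent to server 1 and server 2 (hypothetical helper)."""
    # Random subset S of {0, ..., n-1}: each index is included with probability 1/2.
    s1 = {j for j in range(n) if secrets.randbits(1)}
    # Server 2 receives S with index i toggled (symmetric difference with {i}).
    s2 = s1 ^ {i}
    return s1, s2

def answer(database, subset):
    """Server side: XOR of the database bits at the requested indices."""
    bit = 0
    for j in subset:
        bit ^= database[j]
    return bit

def retrieve(database, i):
    """User side: query two non-communicating replicas and combine the answers."""
    s1, s2 = make_queries(len(database), i)
    # The two subsets differ only in index i, so every other bit cancels out.
    return answer(database, s1) ^ answer(database, s2)
```

The storage cost of this sketch is two full copies of the database, which is the overhead that PIR codes aim to reduce.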
The other work described here is the implementation and evaluation of an erasure code called the Coupled Layer (CL) code in the Ceph distributed storage system. Erasure codes are used in distributed storage to ensure reliability. An additional desirable feature for codes used in this setting is the ability to handle node repair efficiently. The Minimum Storage Regenerating (MSR) version of the CL code downloads the optimal amount of data from other nodes during repair of a failed node, and even the disk reads during this process are optimal for that storage overhead. The CL-Near-MSR code, a variant of CL-MSR, can also efficiently handle a restricted set of multiple node failures. Four example CL codes were evaluated on a 26-node Amazon cluster, and performance metrics such as network bandwidth, disk reads and repair time were measured. For one of those codes, repair time was reduced by a factor on the order of 3 compared with a Reed-Solomon code having the same parameters. To the best of our knowledge, such large gains in repair performance have never been demonstrated before.
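To make the repair-download comparison concrete, the sketch below computes how many chunks must be downloaded to rebuild one lost chunk. It uses the standard MSR repair-bandwidth expression, in which each of d helper nodes sends 1/(d-k+1) of a chunk. The function names and the parameters (k = 10, d = 13) are hypothetical and are not the configurations evaluated in the thesis.

```python
from fractions import Fraction

def rs_repair_download(k):
    """Chunks downloaded to rebuild one chunk with conventional Reed-Solomon decoding."""
    return Fraction(k)

def msr_repair_download(k, d):
    """Chunks downloaded at the MSR point when d helpers each send 1/(d-k+1) of a chunk."""
    if d < k:
        raise ValueError("need at least k helper nodes")
    return Fraction(d, d - k + 1)

# Hypothetical parameters for illustration only: k = 10 data chunks, d = 13 helpers.
rs = rs_repair_download(10)
msr = msr_repair_download(10, 13)
print(f"RS downloads {rs} chunks, MSR downloads {msr} chunks")
```

Download volume is only one component of repair time, so a ratio computed this way should not be read as a prediction of the measured speed-up.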
|
3 |
Efficient Usage Of Flash Memories In High Performance Scenarios
Srimugunthan, * 10 1900
New PCI-e flash cards and SSDs supporting over 100,000 IOPS are now available, with several use cases in the design of high-performance storage systems. Large capacities are achieved by using an array of flash chips arranged in multiple banks. Such multi-banked architectures allow parallel read, write and erase operations. In a raw PCI-e flash card, this parallelism is directly available to the software layer. The devices also have restrictions: for example, pages within a block can only be written sequentially, and the minimum write size is larger (>4KB). Current flash translation layers (FTLs) in Linux are not well suited to such devices because of the high device speeds, the architectural restrictions, and other factors such as high lock contention. We present an FTL for Linux that takes the hardware restrictions into account and exploits the parallelism to achieve high speeds. We also consider leveraging the parallelism for garbage collection by scheduling garbage collection activities on idle banks. We propose and evaluate an adaptive method that varies the amount of garbage collection according to the current I/O load on the device.
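One way to picture load-adaptive garbage collection on idle banks is the sketch below. It is an illustrative assumption, not the method from the thesis: the linear budget rule, the bank dictionaries, and the names `gc_budget` and `pick_gc_banks` are all hypothetical.

```python
def gc_budget(io_load, max_banks):
    """Banks to collect this interval: the full budget when idle, none when saturated.

    io_load is a utilisation estimate in [0, 1]; the linear scaling is an assumption.
    """
    load = min(max(io_load, 0.0), 1.0)  # clamp to [0, 1]
    return int((1.0 - load) * max_banks)

def pick_gc_banks(banks, io_load, max_banks):
    """Choose idle banks for garbage collection, most invalid pages first."""
    idle = [bank for bank in banks if not bank["busy"]]
    idle.sort(key=lambda bank: bank["invalid_pages"], reverse=True)
    return [bank["id"] for bank in idle[:gc_budget(io_load, max_banks)]]

# Hypothetical device state: bank 0 is serving I/O, the rest are idle.
banks = [
    {"id": 0, "busy": True, "invalid_pages": 90},
    {"id": 1, "busy": False, "invalid_pages": 10},
    {"id": 2, "busy": False, "invalid_pages": 70},
    {"id": 3, "busy": False, "invalid_pages": 40},
]
print(pick_gc_banks(banks, io_load=0.5, max_banks=4))
```

Busy banks are never selected, so in this sketch garbage collection does not compete with foreground I/O on the same bank.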
For large-scale distributed storage systems, flash memories are an excellent choice because they consume less power, take less floor space for a target throughput, and provide faster access to data. In a traditional distributed filesystem, even distribution is required to ensure load balancing, balanced space utilisation and failure tolerance. With flash memories, we should additionally ensure that the number of writes is evenly distributed across the flash storage nodes, so that the nodes wear evenly and unpredictable node failures are avoided. This requires distributing updates and performing garbage collection across the flash storage nodes. We motivate the distributed wear-levelling problem by considering the replica placement algorithm for HDFS. Viewing wear-levelling across flash storage nodes as a distributed coordination problem, we present an alternative design that reduces the message communication cost across participating nodes. We demonstrate the effectiveness of our design through simulation.
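The goal of even write distribution can be sketched with a centralised greedy placement rule that always sends replicas to the least-written nodes. This is only a baseline for intuition, assuming a single coordinator that knows every node's write count; it is not the distributed coordination design the thesis proposes, and `place_replicas` is a hypothetical name.

```python
import heapq

def place_replicas(write_counts, replicas=3):
    """Pick the `replicas` nodes with the fewest recorded writes and charge each one write.

    write_counts maps node name -> writes so far and is updated in place.
    Ties are broken by node name so that the result is deterministic.
    """
    if replicas > len(write_counts):
        raise ValueError("not enough nodes for the requested replication factor")
    chosen = heapq.nsmallest(
        replicas, write_counts, key=lambda node: (write_counts[node], node)
    )
    for node in chosen:
        write_counts[node] += 1
    return chosen

# Hypothetical four-node cluster, every node starting with zero writes.
counts = {"n1": 0, "n2": 0, "n3": 0, "n4": 0}
for _ in range(10):
    place_replicas(counts, replicas=3)
print(counts)
```

A central counter like this requires every write to consult global state, which is the kind of message cost a distributed design would try to reduce.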
|
4 |
Étude des problèmes d’ordonnancement sur des plates-formes hétérogènes en modèle multi-port [Study of scheduling problems on heterogeneous platforms under the multi-port model]
Rejeb, Hejer (30 August 2011)
The work presented in this thesis deals with scheduling problems on dynamic and heterogeneous computing platforms under the "multi-port" communication model. We considered the problem of scheduling independent tasks on master-slave platforms, in both static (offline) and dynamic (online) contexts. We also studied the redistribution of replicated files for load balancing and proposed algorithms for it. Finally, we studied the importance of bandwidth-sharing mechanisms for achieving better system efficiency.
|