Global ETD Search

1	On Codes for Private Information Retrieval and Ceph Implementation of a High-Rate Regenerating Code Vinayak, R January 2017 (has links) (PDF) Error-control codes, which are being extensively used in communication systems, have found themselves very useful in data storage as well during the past decade. This thesis deals with two types of codes for data storage, one pertaining to the issue of privacy and the other to reliability. In many scenarios, user accessing some critical data from a server would not want the server to learn the identity of data retrieved. This problem, called Private Information Retrieval (PIR) was rst formally introduced by Chor et al and they gave protocols for PIR in the case where multiple copies of the same data is stored in non-communicating servers. The PIR protocols that came up later also followed this replication model. The problem with data replication is the high storage overhead involved, which will lead to large storage costs. Later, Fazeli, Vardy and Yaakobi, came up with the notion of PIR code that enables information-theoretic PIR with low storage overhead. In the rst part of this thesis, construction of PIR codes for certain parameter values is presented. These constructions are based on a variant of conventional Reed-Muller (RM) codes called binary Projective Reed-Muller (PRM) codes. A lower bound on block length of systematic PIR codes is derived and the PRM based PIR codes are shown to be optimal with respect to this bound in some special cases. The codes constructed here have smaller block lengths than the short block length PIR codes known in the literature. The generalized Hamming weights of binary PRM codes are also studied. Another work described here is the implementation and evaluation of an erasure code called Coupled Layer (CL) code in Ceph distributed storage system. Erasure codes are used in distributed storage to ensure reliability. An additional desirable feature required for codes used in this setting is the ability to handle node repair efficiently. The Minimum Storage Regenerating (MSR) version of CL code downloads optimal amount of data from other nodes during repair of a failed node and even disk reads during this process is optimum, for that storage overhead. The CL-Near-MSR code, which is a variant of CL-MSR, can efficiently handle a restricted set of multiple node failures also. Four example CL codes were evaluated using a 26 node Amazon cluster and performance metrics like network bandwidth, disk read and repair time were measured. Repair time reduction of the order of 3 was observed for one of those codes, in comparison with Reed Solomon code having same parameters. To the best of our knowledge, such large gains in repair performance have never been demonstrated before. Error Control Codes Regenerating Code Ceph Implementation Private Information Retrieval Distributed Storage Reliability PIR Codes PRM Codes Coupled Layer Code Ceph Distributed Storage System Minimum Storage Regenerating (MSR) Codes CL-MSR Code CL-NMSR Code Electrical Communication Engineering
2	Codes With Locality For Distributed Data Storage Moorthy, Prakash Narayana 03 1900 (has links) (PDF) This thesis deals with the problem of code design in the setting of distributed storage systems consisting of multiple storage nodes, storing many different data les. A primary goal in such systems is the efficient repair of a failed node. Regenerating codes and codes with locality are two classes of coding schemes that have recently been proposed in literature to address this goal. While regenerating codes aim to minimize the amount of data-download needed to carry out node repair, codes with locality seek to minimize the number of nodes accessed during node repair. Our focus here is on linear codes with locality, which is a concept originally introduced by Gopalan et al. in the context of recovering from a single node failure. A code-symbol of a linear code C is said to have locality r, if it can be recovered via a linear combination of r other code-symbols of C. The code C is said to have (i) information-symbol locality r, if all of its message symbols have locality r, and (ii) all-symbol locality r, if all the code-symbols have locality r. We make the following three contributions to the area of codes with locality. Firstly, we extend the notion of locality, in two directions, so as to permit local recovery even in the presence of multiple node failures. In the first direction, we consider codes with \local error correction" in which a code-symbol is protected by a local-error-correcting code having local-minimum-distance 3, and thus allowing local recovery of the code-symbol even in the presence of 2 other code-symbol erasures. In the second direction, we study codes with all-symbol locality that can recover from two erasures via a sequence of two local, parity-check computations. When restricted to the case of all-symbol locality and two erasures, the second approach allows, in general, for design of codes having larger minimum distance than what is possible via the rst approach. Under both approaches, by studying the generalized Hamming weights of the dual codes, we derive tight upper bounds on their respective minimum distances. Optimal code constructions are identified under both approaches, for a class of code parameters. A few interesting corollaries result from this part of our work. Firstly, we obtain a new upper bound on the minimum distance of concatenated codes and secondly, we show how it is always possible to construct the best-possible code (having largest minimum distance) of a given dimension when the code's parity check matrix is partially specified. In a third corollary, we obtain a new upper bound for the minimum distance of codes with all-symbol locality in the single erasure case. Secondly, we introduce the notion of codes with local regeneration that seek to combine the advantages of both codes with locality as well as regenerating codes. These are vector-alphabet analogues of codes with local error correction in which the local codes themselves are regenerating codes. An upper bound on the minimum distance is derived when the constituent local codes have a certain uniform rank accumulation (URA) property. This property is possessed by both the minimum storage regenerating (MSR) and the minimum bandwidth regenerating (MBR) codes. We provide several optimal constructions of codes with local regeneration, where the local codes are either the MSR or the MBR codes. The discussion here is also extended to the case of general vector-linear codes with locality, in which the local codes do not necessarily have the URA property. Finally, we evaluate the efficacy of two specific coding solutions, both possessing an inherent double replication of data, in a practical distributed storage setting known as Hadoop. Hadoop is an open-source platform dealing with distributed storage of data in which the primary aim is to perform distributed computation on the stored data via a paradigm known as Map Reduce. Our evaluation shows that while these codes have efficient repair properties, their vector-alphabet-nature can negatively a affect Map Reduce performance, if they are implemented under the current Hadoop architecture. Specifically, we see that under the current architecture, the choice of number processor cores per node and Map-task scheduling algorithm play a major role in determining their performance. The performance evaluation is carried out via a combination of simulations and actual experiments in Hadoop clusters. As a remedy to the problem, we also pro-pose a modified architecture in which one allows erasure coding across blocks belonging to different les. Under the modified architecture, the new coding solutions will not suffer from any Map Reduce performance-loss as seen in the original architecture, while retaining all of their desired repair properties Distributed Storage Coding Regeneration Codes Local Repair Codes Linear Codes Information Theory Error Correcting Codes Encoding Vector Codes Minimum Storage Regenerating (MSR) Codes Coding Theory Hadoop Distributed Data Storage Computer Science
3	Coding Schemes For Distributed Subspace Computation, Distributed Storage And Local Correctability Vadlamani, Lalitha 02 1900 (has links) (PDF) In this thesis, three problems have been considered and new coding schemes have been devised for each of them. The first is related to distributed function computation, the second to coding for distributed storage and the final problem is based on locally correctable codes. A common theme of the first two problems considered is distributed computation. The first problem is motivated by the problem of distributed function computation considered by Korner and Marton, where the goal is to compute XOR of two binary sources at the receiver. It has been shown that linear encoders give better sum rates for some source distributions as compared to the usual Slepian-Wolf scheme. We generalize this distributed function computation setting to the case of more than two sources and the receiver is interested in computing multiple linear combinations of the sources. Consider `m' random variables each of which takes values from a finite field and are associated with a certain joint probability distribution. The receiver is interested in the lossless computation of `s' linear combinations of the m random variables. By considering the set of all linear combinations of m random variables as a vector space V , this problem can be interpreted as a subspace-computation problem. For this problem, we develop three increasingly refined approaches, all based on linear encoders. The first two approaches which are termed as common code approach and selected subspace approach, use a common matrix to encode all the sources. In the common code approach, the desired subspace W is computed at the receiver, whereas in the selected subspace approach, possibly a larger subspace U which contains the desired subspace is computed. The larger subspace U which gives the minimum sum rate itself is based on a decomposition of vector space V into a chain of subspaces. The chain of subspaces is determined by the joint probability distribution of m random variables and a notion of normalized measure of entropy. The third approach is a nested code approach, where all the encoding matrices are nested and the same subspace U which is identified in the selected subspace approach is computed. We characterize the sum rates under all the three approaches. The sum rate under nested code approach is no larger than both selected subspace approach and Slepian-Wolf approach. For a large class of joint distributions and subspaces W , the nested code scheme is shown to improve upon Slepian-Wolf scheme. Additionally, a class of source distributions and subspaces are identified, for which the nested code approach is sum-rate optimal. In the second problem, we consider a distributed storage network, where data is stored across nodes in a network which are failure-prone. The goal is to store data reliably and efficiently. For a required level of reliability, it is of interest to minimise storage overhead and also of interest to perform node repair efficiently. Conventionally replication and maximum distance separable (MDS) codes are employed in such systems. Though replication is very efficient in terms of node repair, the storage overhead is high. MDS codes have low storage overhead but even the repair of a single failed node requires contacting a large number of nodes and downloading all their data. We consider two coding solutions that have recently been proposed, which enable efficient node repair in case of single node failure. The first solution called regenerating codes seeks to minimize the amount of data downloaded for node repair, while codes with locality attempt to minimize the number of helper nodes accessed. We extend these results in two directions. In the first one, we introduce the notion of codes with locality where the local codes have minimum distance more than 2 and hence can recover a code symbol locally even in the presence of multiple erasures. These codes are termed as codes with local erasure correction. We say that a code has information locality if there exists a set of message symbols, each of which is covered by local codes. A code is said to have all-symbol locality if all the code symbols are covered by local codes. An upper bound on the minimum distance of codes with information locality is presented and codes that are optimal with respect to this bound are constructed. We make a connection between codes with local erasure correction and concatenated codes. The second direction seeks to build codes that combine the advantages of both codes with locality as well as regenerating codes. These codes, termed here as codes with local regeneration, are codes with locality over a vector alphabet, in which the local codes themselves are regenerating codes. There are two well known classes of regenerating codes known as minimum storage regenerating (MSR) codes and minimum bandwidth regenerating (MBR) codes. We derive two upper bounds on the minimum distance of vector-alphabet codes with locality, one for the case when the local codes are MSR codes and the second for the case when the local codes are MBR codes. We also provide several optimal constructions of both classes of codes which achieve their respective minimum distance bounds with equality. The third problem deals with locally correctable codes. A block code of length `n' is said to be locally correctable, if there exists a randomized algorithm such that any one of the coordinates of the codeword can be recovered by querying at most `r' coordinates, even in presence of some fraction of errors. We study the local correctability of linear codes whose duals contain 4-designs. We also derive a bound relating `r' and fraction of errors that can be tolerated, when each instance of the randomized algorithm is `t'-error correcting instead of simple parity computation. Coding Theory Distributed Function Computation Distributed Storage Coding Locally Correctable Codes Linear Codes Encoding Regeneration Codes Error Correcting Codes Information Theory Local Erasure Correction Minimum Storage Regenerating (MSR) Codes Distributed Storage Network Distributed Subspace Computation Computer Science

1

Page generated in 0.1061 seconds