Global ETD Search

1	Policy architecture for distributed storage systems
Belaramani, Nalini Moti, 15 October 2009
Distributed data storage is a building block for many distributed systems, such as mobile file systems, web service replication systems, and enterprise file systems. New distributed data storage systems are frequently built as new environments, requirements, or workloads emerge. The goal of this dissertation is to develop the science of distributed storage systems by making it easier to build new systems. To achieve this goal, it proposes a new policy architecture, PADS, that is based on two key ideas: first, by providing a set of common mechanisms in an underlying layer, new systems can be implemented by defining policies that orchestrate these mechanisms; second, policy can be separated into routing policy and blocking policy, each addressing a different part of the system design. Routing policy specifies how data flows among nodes in order to meet performance, availability, and resource usage goals, whereas blocking policy specifies when it is safe to access data in order to meet consistency and durability goals. This dissertation presents a PADS prototype that defines a set of distributed storage mechanisms that are sufficiently flexible and general to support a large range of systems, a small policy API that is easy to use and captures the right abstractions for distributed storage, and a declarative language for specifying policy that enables quick, concise implementations of complex systems. We demonstrate that PADS significantly reduces development effort by constructing, on top of the prototype, a dozen significant distributed storage systems spanning a large portion of the design space. Each system required only a couple of weeks of implementation effort and a few dozen lines of policy code.
Keywords: Distributed data storage; Policy architecture; PADS; Routing policy; Blocking policy; Common mechanisms; Distributed systems
2	"Armazenamento distribuído de dados e checkpointing de aplicações paralelas em grades oportunistas" / Distributed data storage and checkpointing of parallel applications in opportunistic grids
Camargo, Raphael Yokoingawa de, 04 May 2007
Opportunistic computational grids use idle resources of shared machines to execute applications that need large amounts of computational power and/or deal with large amounts of data. But executing computationally intensive parallel applications in dynamic and heterogeneous environments, such as opportunistic grids, is a daunting task. Machines may fail, become inaccessible, or change from idle to busy unexpectedly, compromising application execution. A fault tolerance mechanism that supports heterogeneous architectures is an important requirement for such systems. In this work, we analyze, implement, and evaluate a checkpointing-based fault tolerance mechanism for parallel applications running on opportunistic grids. The mechanism monitors application execution and allows the migration of applications between heterogeneous nodes of the grid. Besides application execution, it is also necessary to manage and store the data generated and used by those applications. We want a low-cost data storage infrastructure that uses the free disk space of shared grid machines. The system should use these machines to store and retrieve data only during their idle periods, which requires it to be redundant and fault-tolerant. To solve the data storage problem in opportunistic grids, we designed, implemented, and evaluated the OppStore middleware. This middleware provides reliable distributed storage for application data, which can be accessed from any machine in the grid. The machines are organized in clusters, connected by a self-organizing and fault-tolerant peer-to-peer network. Before storage, data is encoded into redundant fragments, so that the original file can be reconstructed from only a subset of those fragments. Finally, to deal with resource heterogeneity, we developed an extension to the Pastry peer-to-peer routing substrate that adds load balancing and support for machine heterogeneity.
Keywords: BSP; checkpointing; computational grids; distributed data storage; distributed storage; fault tolerance; grid computing; peer-to-peer
3	"Armazenamento distribuído de dados e checkpointing de aplicações paralelas em grades oportunistas" / Distributed data storage and checkpointing of parallel applications in opportunistic grids
Raphael Yokoingawa de Camargo, 04 May 2007
Opportunistic computational grids use idle resources of shared machines to execute applications that need large amounts of computational power and/or deal with large amounts of data. But executing computationally intensive parallel applications in dynamic and heterogeneous environments, such as opportunistic grids, is a daunting task. Machines may fail, become inaccessible, or change from idle to busy unexpectedly, compromising application execution. A fault tolerance mechanism that supports heterogeneous architectures is an important requirement for such systems. In this work, we analyze, implement, and evaluate a checkpointing-based fault tolerance mechanism for parallel applications running on opportunistic grids. The mechanism monitors application execution and allows the migration of applications between heterogeneous nodes of the grid. Besides application execution, it is also necessary to manage and store the data generated and used by those applications. We want a low-cost data storage infrastructure that uses the free disk space of shared grid machines. The system should use these machines to store and retrieve data only during their idle periods, which requires it to be redundant and fault-tolerant. To solve the data storage problem in opportunistic grids, we designed, implemented, and evaluated the OppStore middleware. This middleware provides reliable distributed storage for application data, which can be accessed from any machine in the grid. The machines are organized in clusters, connected by a self-organizing and fault-tolerant peer-to-peer network. Before storage, data is encoded into redundant fragments, so that the original file can be reconstructed from only a subset of those fragments. Finally, to deal with resource heterogeneity, we developed an extension to the Pastry peer-to-peer routing substrate that adds load balancing and support for machine heterogeneity.
Keywords: BSP; checkpointing; computational grids; distributed data storage; distributed storage; fault tolerance; grid computing; peer-to-peer
4	Codes With Locality For Distributed Data Storage
Moorthy, Prakash Narayana, 03 1900
This thesis deals with the problem of code design in the setting of distributed storage systems consisting of multiple storage nodes storing many different data files. A primary goal in such systems is the efficient repair of a failed node. Regenerating codes and codes with locality are two classes of coding schemes that have recently been proposed in the literature to address this goal. While regenerating codes aim to minimize the amount of data downloaded to carry out node repair, codes with locality seek to minimize the number of nodes accessed during node repair. Our focus here is on linear codes with locality, a concept originally introduced by Gopalan et al. in the context of recovering from a single node failure. A code-symbol of a linear code C is said to have locality r if it can be recovered via a linear combination of r other code-symbols of C. The code C is said to have (i) information-symbol locality r if all of its message symbols have locality r, and (ii) all-symbol locality r if all of its code-symbols have locality r. We make the following three contributions to the area of codes with locality. Firstly, we extend the notion of locality in two directions, so as to permit local recovery even in the presence of multiple node failures. In the first direction, we consider codes with "local error correction", in which a code-symbol is protected by a local error-correcting code having local minimum distance 3, thus allowing local recovery of the code-symbol even in the presence of 2 other code-symbol erasures. In the second direction, we study codes with all-symbol locality that can recover from two erasures via a sequence of two local parity-check computations. 
When restricted to the case of all-symbol locality and two erasures, the second approach allows, in general, for the design of codes having larger minimum distance than is possible via the first approach. Under both approaches, by studying the generalized Hamming weights of the dual codes, we derive tight upper bounds on their respective minimum distances. Optimal code constructions are identified under both approaches for a class of code parameters. A few interesting corollaries result from this part of our work. Firstly, we obtain a new upper bound on the minimum distance of concatenated codes, and secondly, we show how it is always possible to construct the best-possible code (having the largest minimum distance) of a given dimension when the code's parity-check matrix is partially specified. In a third corollary, we obtain a new upper bound on the minimum distance of codes with all-symbol locality in the single-erasure case. Secondly, we introduce the notion of codes with local regeneration, which seek to combine the advantages of codes with locality and regenerating codes. These are vector-alphabet analogues of codes with local error correction in which the local codes themselves are regenerating codes. An upper bound on the minimum distance is derived when the constituent local codes have a certain uniform rank accumulation (URA) property. This property is possessed by both the minimum storage regenerating (MSR) and the minimum bandwidth regenerating (MBR) codes. We provide several optimal constructions of codes with local regeneration, where the local codes are either MSR or MBR codes. The discussion here is also extended to the case of general vector-linear codes with locality, in which the local codes do not necessarily have the URA property. Finally, we evaluate the efficacy of two specific coding solutions, both possessing an inherent double replication of data, in a practical distributed storage setting known as Hadoop. 
Hadoop is an open-source platform for distributed storage of data in which the primary aim is to perform distributed computation on the stored data via a paradigm known as MapReduce. Our evaluation shows that while these codes have efficient repair properties, their vector-alphabet nature can negatively affect MapReduce performance if they are implemented under the current Hadoop architecture. Specifically, we see that under the current architecture, the number of processor cores per node and the Map-task scheduling algorithm play a major role in determining their performance. The performance evaluation is carried out via a combination of simulations and actual experiments in Hadoop clusters. As a remedy to the problem, we also propose a modified architecture in which one allows erasure coding across blocks belonging to different files. Under the modified architecture, the new coding solutions will not suffer from the MapReduce performance loss seen in the original architecture, while retaining all of their desired repair properties.
Keywords: Distributed Storage Coding; Regeneration Codes; Local Repair Codes; Linear Codes; Information Theory; Error Correcting Codes; Encoding; Vector Codes; Minimum Storage Regenerating (MSR) Codes; Coding Theory; Hadoop; Distributed Data Storage; Computer Science
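
The locality definition in the abstract of entry 4 can be illustrated with a toy example. The following is a minimal sketch, not taken from the thesis: a hypothetical 6-symbol code over bytes (where addition is XOR) with two local groups, each holding two message symbols and their XOR parity, so that every code-symbol has locality r = 2. The names GROUPS, encode, and repair are assumptions made for illustration.

```python
from functools import reduce
from operator import xor

# Hypothetical toy code, not from the thesis: symbols are bytes, addition is XOR.
# Codeword layout: [m0, m1, m2, m3, p0, p1] with p0 = m0 ^ m1 and p1 = m2 ^ m3.
# Each tuple lists the indices of one local group; the symbols in a group XOR to zero.
GROUPS = [(0, 1, 4), (2, 3, 5)]


def encode(message):
    """Append one XOR parity per local group to a 4-symbol message."""
    m0, m1, m2, m3 = message
    return [m0, m1, m2, m3, m0 ^ m1, m2 ^ m3]


def repair(codeword, lost):
    """Recover the symbol at index `lost` from the r = 2 other symbols of its group.

    Returns the recovered value and the indices of the symbols that were read.
    """
    group = next(g for g in GROUPS if lost in g)
    helpers = [i for i in group if i != lost]
    value = reduce(xor, (codeword[i] for i in helpers))
    return value, helpers


codeword = encode([7, 42, 99, 200])
recovered, helpers = repair(codeword, lost=1)
```

In this sketch, repairing any single lost symbol reads only the two other symbols of its local group rather than the whole codeword, which is all-symbol locality with r = 2. The toy code tolerates only one erasure per group; the constructions described in the abstract handle multiple erasures.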
