Global ETD Search

31	Energy savings and performance improvements with SSDs in the Hadoop Distributed File System / Economia de energia e aumento de desempenho usando SSDs no Hadoop Distributed File System
Ivanilton Polato, 29 August 2016
Energy issues have gathered strong attention over the past decade, reaching IT data processing infrastructures. These infrastructures now need to cope with that responsibility, adjusting existing platforms to reach acceptable performance while reducing energy consumption. As the de facto platform for Big Data, Apache Hadoop has evolved significantly over the past few years, with more than 60 releases bringing new features. By implementing the MapReduce programming paradigm and leveraging HDFS, its distributed file system, Hadoop has become a reliable and fault-tolerant middleware for parallel and distributed computing over large datasets. Nevertheless, Hadoop may struggle under certain workloads, resulting in poor performance and high energy consumption. Users increasingly demand that high performance computing solutions address sustainability and limit energy consumption. In this thesis, we introduce HDFSH, a hybrid storage mechanism for HDFS, which uses a combination of hard disks (HDs) and solid-state disks (SSDs) to achieve higher performance while saving power in Hadoop computations. HDFSH brings to the middleware the best of HDs (affordable cost per GB and high storage capacity) and SSDs (high throughput and low energy consumption) in a configurable fashion, using dedicated storage zones for each storage device type. We implemented our mechanism as a block placement policy for HDFS and assessed it over six recent releases of Hadoop with different architectural properties. Results indicate that our approach increases overall job performance while decreasing energy consumption under most of the hybrid configurations evaluated. Our results also show that, in many cases, storing only part of the data on SSDs results in significant energy savings and execution speedups.
Keywords: Distributed file systems; Energy efficiency; Green computing; Hadoop; HDFS; Hybrid storage; Parallel file systems; Solid-state disk; SSDs
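The abstract of entry 31 describes dedicated HD and SSD storage zones with a configurable split, implemented as an HDFS block placement policy. The following Python sketch only illustrates the general idea of spreading blocks over two zones according to a target SSD share. The function name `assign_zones`, the `ssd_fraction` parameter, and the spreading rule are assumptions made for illustration; they are not taken from the thesis or from the Hadoop API.

```python
def assign_zones(num_blocks: int, ssd_fraction: float) -> list[str]:
    """Assign each block to the "SSD" or "HDD" zone (hypothetical helper).

    Blocks are spread so that, after every block, the number of SSD blocks
    equals floor(blocks_placed * ssd_fraction).
    """
    if not 0.0 <= ssd_fraction <= 1.0:
        raise ValueError("ssd_fraction must be between 0 and 1")
    zones = []
    ssd_count = 0
    for i in range(num_blocks):
        # Number of SSD blocks the target share allows after placing block i.
        allowed = int((i + 1) * ssd_fraction)
        if ssd_count < allowed:
            zones.append("SSD")
            ssd_count += 1
        else:
            zones.append("HDD")
    return zones
```

With `ssd_fraction=0.25`, one block in four goes to the SSD zone, which mirrors the abstract's observation that storing only part of the data on SSDs can already pay off.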
32	Exploring atomicity on memory mapped files based on non-volatile memory file systems
Puglia, Gianlucca Oliveira, 21 March 2017
Upcoming non-volatile memory (NVM) technologies hold great promise in computer architecture and are expected to be powerful tools for addressing today's issues regarding efficient data manipulation. They provide high performance and byte granularity while also having the distinct advantage of being persistent. However, to exploit these technologies to their full potential, existing systems and architectures must adapt to this new way of working with data and work around the challenges that come with it. Existing work in the area already proposes methods to adapt existing architectures to NVM as well as innovative ways to employ these memories in future applications. However, operating system support for such NVM-enabled solutions, although existent, is still very limited. In this work, we present two variations of the existing mmap system call (the Portuguese abstract names msync instead), designed both to exploit NVM characteristics and to provide user data consistency. Both are very simple solutions that allow users to control persistence and define checkpoints for their files while using the common mapped file syntax. We have implemented and tested these methods on Linux using an NVM file system as our base. Our results show that these mechanisms can ensure file integrity in the presence of system failures while also providing reasonable performance.
Keywords: Non-Volatile Memory (NVM); Operating Systems (OS); Systematic Mapping Study (SMS); File Systems
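For readers unfamiliar with the "common mapped file syntax" mentioned in entry 32, here is a minimal Python sketch of the standard mechanism: a file is mapped, updated through ordinary memory stores, and then flushed explicitly. It uses only the standard `mmap` module, whose `flush()` corresponds to `msync` on POSIX systems. The helper names are illustrative, and the sketch does not reproduce the two variants proposed in the thesis; a plain flush gives no atomicity guarantee if the system fails midway through an update.

```python
import mmap
import os
import tempfile


def write_and_checkpoint(path: str, offset: int, data: bytes) -> None:
    """Update a mapped file in place, then flush the mapping to the file."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[offset:offset + len(data)] = data  # ordinary memory store
            mm.flush()  # explicit persistence point (msync on POSIX)


def demo() -> bytes:
    """Create a scratch file, checkpoint one update, and read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * mmap.PAGESIZE)
        os.close(fd)
        write_and_checkpoint(path, 0, b"checkpoint")
        with open(path, "rb") as f:
            return f.read(10)
    finally:
        os.remove(path)
```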
33	Reintegração de servidores em sistemas distribuídos / Reintegration of failed server in distributed systems
Pasin, Marcia, January 1998
Distributed systems are an ideal platform for developing highly reliable and available computer applications because of the redundancy supplied by a large number of interconnected workstations. The failure of a server can be masked by reconfiguring the system. However, successive faults that affect multiple stations not only decrease the performance of the system but also affect the continuity of the service and its reliability. Failed servers that have been isolated from the system should therefore be reintegrated as soon as possible so as not to impair system availability. This work deals with methods for updating the state of failed servers and with the exchange of information that the recovered server carries out in order to rejoin the other members of the service group, a procedure called server reintegration. A distributed environment is assumed that guarantees high reliability in conventional applications through file replication. The server to be reintegrated is part of a replication group and participates actively in the group again as soon as it is reintegrated. The approach is based on primary-copy replication and is applied to an experimental distributed system, compatible with NFS, developed at UFRGS. The file update methods for server reintegration were implemented in a UNIX environment. To illustrate the approach, the work describes how the reintegration of a repaired server can be carried out in a fault-tolerant system.
Keywords: Computer reliability; Distributed operating systems; Distributed file systems; Fault tolerance; Reintegration of failed server
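The state update described in entry 33 can be pictured with a small Python sketch of primary-copy catch-up: a recovered replica compares per-file version numbers with the primary and copies whatever is stale. The data layout (a dictionary mapping file names to `(version, content)` pairs), the function name `reintegrate`, and the rule that files absent from the primary are deleted are all assumptions made for illustration, not details taken from the dissertation.

```python
Replica = dict[str, tuple[int, bytes]]  # file name -> (version, content)


def reintegrate(primary: Replica, recovered: Replica) -> list[str]:
    """Bring a recovered replica up to date with the primary copy.

    Returns the sorted names of the files that were copied from the primary.
    """
    updated = []
    for name, (version, content) in primary.items():
        local = recovered.get(name)
        # Copy files that are missing or older on the recovered server.
        if local is None or local[0] < version:
            recovered[name] = (version, content)
            updated.append(name)
    # Assumption: files the primary no longer has were deleted during the outage.
    for name in list(recovered):
        if name not in primary:
            del recovered[name]
    return sorted(updated)
```

After the call, the recovered replica holds the same files and versions as the primary and can rejoin the replication group.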
34	Extreme scale data management in high performance computing
Lofstead, Gerald Fredrick, 15 November 2010
Extreme scale data management in high performance computing requires consideration of the end-to-end scientific workflow process. Of particular importance for runtime performance, the write-read cycle must be addressed as a complete unit. Any optimization made to enhance writing performance must consider the subsequent impact on reading performance. Only by addressing the full write-read cycle can scientific productivity be enhanced. The ADIOS middleware developed as part of this thesis provides an API nearly as simple as the standard POSIX interface, but with the flexibility to choose what transport mechanism(s) to employ at or during runtime. The accompanying BP file format is designed for high performance parallel output with limited coordination overheads while incorporating features to accelerate subsequent use of the output for reading operations. This pair of optimizations, of the output mechanism and of the output format, is designed so that it either does not negatively impact or greatly improves subsequent reading performance when compared to popular self-describing file formats. This end-to-end advantage of the ADIOS architecture is further enhanced through techniques that better enable asynchronous data transports, affording the incorporation of 'in flight' data processing operations and pseudo-transport mechanisms that can trigger workflows or other operations.
Keywords: Adaptive; File systems; HPC; IO; Storage; File organization (Computer science)
35	Reducing Size and Complexity of the Security-Critical Code Base of File Systems
Weinhold, Carsten, 09 July 2014
Desktop and mobile computing devices increasingly store critical data, both personal and professional in nature. Yet, the enormous code bases of their monolithic operating systems (hundreds of thousands to millions of lines of code) are likely to contain exploitable weaknesses that jeopardize the security of this data in the file system. Using a highly componentized system architecture based on a microkernel (or a very small hypervisor) can significantly improve security. The individual operating system components have smaller code bases running in isolated address spaces so as to provide better fault containment. Their isolation also allows for smaller trusted computing bases (TCBs) of applications that comprise only a subset of all components. In my thesis, I built VPFS, a virtual private file system that is designed for such a componentized system architecture. It aims at reducing the amount of code and complexity that a file system implementation adds to the TCB of an application. The basic idea behind VPFS is similar to that of a VPN, which securely reuses an untrusted network: the core component of VPFS implements all functionality and cryptographic algorithms that an application needs to rely upon for confidentiality and integrity of file system contents. This security-critical core reuses a much more complex, and therefore untrusted, file system stack for non-critical functionality and access to the storage device. Additional trusted components ensure recoverability.
Keywords: Security; file systems; trusted computing base; system architecture; ddc:004; rvk:ST 260; rvk:ST 277
38	Rethinking I/O in High-Performance Computing Environments
Ali, Nawab, January 2009
No description available.
Keywords: Computer Science; High-performance I/O; Parallel file systems; Leadership-class machines; Remote I/O; High-performance computing
39	An Application-Attuned Framework for Optimizing HPC Storage Systems
Paul, Arnab Kumar, 19 August 2020
High performance computing (HPC) is routinely employed in diverse domains, such as life sciences and geology, to simulate and understand the behavior of complex phenomena. Big data driven scientific simulations are resource intensive and require both computing and I/O capabilities at scale. There is a crucial need to revisit the HPC I/O subsystem to better optimize for and manage the increased pressure that big data processing places on the underlying storage systems. Extant HPC storage systems are designed and tuned for a specific set of applications targeting a range of workload characteristics, but they lack the flexibility to adapt to ever-changing application behaviors. The complex nature of modern HPC storage systems, along with these ever-changing application behaviors, presents unique opportunities and engineering challenges. In this dissertation, we design and develop a framework for optimizing HPC storage systems by making them application-attuned. We select three different kinds of HPC storage systems: in-memory data analytics frameworks, parallel file systems, and object storage. We first analyze HPC application I/O behavior by studying real-world I/O traces. Next, we optimize parallelism for applications running in memory, then we design data management techniques for HPC storage systems, and finally we focus on low-level I/O load balance to improve the efficiency of modern HPC storage systems. / Doctor of Philosophy / Clusters of multiple computers connected through the internet are often deployed in industry and laboratories for large scale data processing or computation that cannot be handled by standalone computers. In such a cluster, resources such as CPU, memory, and disks are integrated to work together. With the increase in popularity of applications that read and write a tremendous amount of data, such clusters need a large number of disks that can interact effectively. These disks form part of high performance computing (HPC) storage systems. Such HPC storage systems are used by a diverse set of applications from organizations in a vast range of domains, from earth sciences, financial services, and telecommunication to life sciences. The HPC storage system should therefore perform well for the different read and write (I/O) requirements of all these applications. But current HPC storage systems do not cater to such varied I/O requirements. To this end, this dissertation designs and develops a framework for HPC storage systems that is application-attuned and thus provides much better performance than other state-of-the-art HPC storage systems without such optimizations.
Keywords: Parallel File Systems; Object-Based Storage; Data Management; Load Balancing; File System Indexing; Metadata Management; High Performance Computing
40	Snapshots in large-scale distributed file systems
Stender, Jan, 21 January 2013
Snapshots are present in many modern file systems, where they make it possible to create consistent online backups, to roll back corruptions or inadvertent changes of files, and to keep a record of changes to files and directories. While most previous work on file system snapshots refers to local file systems, modern trends like cloud and cluster computing have shifted the focus towards distributed storage infrastructures. Such infrastructures often comprise large numbers of storage servers, which presents particular challenges in terms of scalability, availability and failure tolerance. This thesis describes a snapshot algorithm for large-scale distributed file systems and its integration in XtreemFS, a scalable object-based file system for grid and cloud computing environments. The two building blocks of the algorithm are a version management scheme, which efficiently records versions of file content and metadata, and a scalable and failure-tolerant mechanism that aggregates specific versions in a snapshot. To overcome the lack of a global time in a distributed system, the algorithm implements a relaxed consistency model for snapshots, which is based on timestamps assigned by loosely synchronized server clocks. The main contributions of the thesis are: 1) a formal model of snapshots and snapshot consistency in distributed file systems; 2) the description of efficient schemes for the management of metadata and file content versions in object-based file systems; 3) the formal presentation of a scalable, fault-tolerant snapshot algorithm for large-scale object-based file systems; 4) a detailed description of the implementation of the algorithm as part of XtreemFS. An extensive evaluation shows that the proposed algorithm has no severe impact on user I/O, and that it scales to large numbers of snapshots and versions.
Keywords: Snapshots; distributed file systems; scalability; XtreemFS; 004 Computer science; 28 Computer science, data processing; ST 265; ddc:004
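Entry 40 combines per-file version management with a snapshot defined by timestamps. A much simplified Python sketch of that combination follows: each file keeps a timestamped version list, and a snapshot at time `t` selects, for every file, the latest version written at or before `t`. The class and function names are hypothetical, timestamps are plain numbers, and the sketch ignores clock skew between servers, which the thesis handles through its relaxed consistency model.

```python
from bisect import bisect_right


class VersionedFile:
    """Keeps every written version of a file, ordered by timestamp."""

    def __init__(self) -> None:
        self._timestamps: list[float] = []
        self._contents: list[str] = []

    def write(self, timestamp: float, content: str) -> None:
        # Assumption: writes to one file arrive in timestamp order.
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("timestamps must not decrease")
        self._timestamps.append(timestamp)
        self._contents.append(content)

    def read_at(self, snapshot_time: float):
        """Return the latest version written at or before snapshot_time."""
        index = bisect_right(self._timestamps, snapshot_time)
        return self._contents[index - 1] if index else None


def take_snapshot(files: dict[str, VersionedFile], snapshot_time: float) -> dict[str, str]:
    """Collect, per file, the version visible at snapshot_time."""
    snapshot = {}
    for name, versioned in files.items():
        content = versioned.read_at(snapshot_time)
        if content is not None:  # the file did not exist yet otherwise
            snapshot[name] = content
    return snapshot
```

Because a snapshot is only a timestamp plus a lookup rule, taking one does not copy any data in this sketch; later writes simply append new versions.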
