Global ETD Search

11	Evaluating I/O scheduling techniques at the forwarding layer and coordinating data server accesses / Avaliação de técnicas de escalonamento de E/S na camada de encaminhamento e coordenação de acesso aos servidores de dados. Bez, Jean Luca. January 2016.
In High Performance Computing (HPC) environments, scientific applications rely on Parallel File Systems (PFS) to obtain Input/Output (I/O) performance, especially when handling large amounts of data. However, I/O is still a bottleneck for an increasing number of applications, due to the historical gap between processing and data access speed. To alleviate the concurrency caused by thousands of nodes accessing a significantly smaller number of PFS servers, intermediate I/O nodes are typically employed between processing nodes and the file system. Each intermediate node forwards requests from multiple clients to the parallel file system, a setup which gives this component the opportunity to perform optimizations like I/O scheduling. The objective of this dissertation is to evaluate different scheduling algorithms, at the I/O forwarding layer, that work to improve concurrent access patterns by aggregating and reordering requests to avoid patterns known to harm performance. We demonstrate that the FIFO (First In, First Out), HBRR (Handle-Based Round-Robin), TO (Time Order), SJF (Shortest Job First) and MLF (Multilevel Feedback) schedulers are only partially effective because the access pattern is not the main factor that affects performance in the I/O forwarding layer, especially for read requests. A new scheduling algorithm, TWINS, is proposed to coordinate the access of intermediate I/O nodes to the parallel file system data servers. Our approach decreases concurrency at the data servers, a factor previously proven to negatively affect performance. The proposed algorithm is able to improve read performance from shared files by up to 28% over other scheduling algorithms and by up to 50% over not forwarding I/O requests.
Parallel processing; Scientific computing: high performance; High performance I/O; Parallel file systems; Parallel I/O; I/O forwarding; I/O scheduling; Access coordination
13	Energy savings and performance improvements with SSDs in the Hadoop Distributed File System / Economia de energia e aumento de desempenho usando SSDs no Hadoop Distributed File System. Polato, Ivanilton. 29 August 2016.
Energy issues have attracted strong attention over the past decade, reaching IT data processing infrastructures. These infrastructures must now cope with this responsibility, adjusting existing platforms to reach acceptable performance while reducing energy consumption. As the de facto platform for Big Data, Apache Hadoop has evolved significantly over the last years, with more than 60 releases bringing new features. By implementing the MapReduce programming paradigm and leveraging HDFS, its distributed file system, Hadoop has become a reliable and fault-tolerant middleware for parallel and distributed computing over large datasets. Nevertheless, Hadoop may struggle under certain workloads, resulting in poor performance and high energy consumption. Users increasingly demand that high-performance computing solutions address sustainability and limit energy consumption. In this thesis, we introduce HDFSH, a hybrid storage mechanism for HDFS, which uses a combination of Hard Disks and Solid-State Disks to achieve higher performance while saving power in Hadoop computations. HDFSH brings to the middleware the best of HDs (affordable cost per GB and high storage capacity) and SSDs (high throughput and low energy consumption) in a configurable fashion, using dedicated storage zones for each storage device type. We implemented our mechanism as a block placement policy for HDFS, and assessed it over six recent releases of Hadoop with different architectural properties. Results indicate that our approach increases overall job performance while decreasing energy consumption under most of the hybrid configurations evaluated. Our results also showed that, in many cases, storing only part of the data in SSDs results in significant energy savings and execution speedups.
Hybrid storage; Green computing; Solid-state disks; Distributed file systems; Energy efficiency; Hadoop; HDFS; Parallel file systems; SSDs
15	Data Transfer and Management through the IKAROS framework: Adopting an asynchronous non-blocking event driven approach to implement the Elastic-Transfer's IMAP client-server connection. Gkikas, Nikolaos. January 2015.
Given the current state of input/output (I/O) and storage devices in petascale systems, incremental solutions would be ineffective when implemented in exascale environments. According to "The International Exascale Software Roadmap" by Dongarra et al., existing I/O architectures are not sufficiently scalable, especially because current shared file systems have limitations when used in large-scale environments. These limitations are: bandwidth does not scale economically to large-scale systems; I/O traffic on the high-speed network can affect and be affected by other unrelated jobs; and I/O traffic on the storage server can affect and be affected by other unrelated jobs. Future applications on exascale computers will require I/O bandwidth proportional to their computational capabilities. To avoid these limitations, C. Filippidis, C. Markou, and Y. Cotronis proposed the IKAROS framework. In this thesis project, the capabilities of the publicly available elastic-transfer (eT) module, which was directly derived from IKAROS, are expanded. The eT uses Google's Gmail service as a utility for efficient meta-data management. Gmail is based on the Internet Message Access Protocol (IMAP), and the existing version of the eT framework implements the IMAP client-server connection through the "Inbox" module from the Node Package Manager (NPM) for Node.js. This module was used as a proof of concept, but in a production environment this implementation undermines the system's scalability: each client individually requests meta-data information from the IMAP server, so the system's resources are allocated inefficiently when a large number of concurrent requests arrive at the eT's meta-data server (MDS) at the same time. This thesis solves this problem by adopting an asynchronous, non-blocking, event-driven approach to implement the IMAP client-server connection. This was done by integrating and modifying the "Imap" module from the NPM repository to suit the eT framework. Additionally, since the JavaScript Object Notation (JSON) format has become one of the most widespread data-interchange formats, the eT's meta-data scheme is modified so that the system's meta-data can easily be parsed as JSON objects. This feature creates a framework with wider compatibility and interoperability with external systems. The operational behavior of the new module was evaluated through a set of data transfer experiments over a wide area network environment. These experiments were performed to ensure that the changes in the system's architecture did not affect its performance.
parallel file systems; distributed file systems; IKAROS file system; elastic-transfer; grid computing; storage systems; I/O limitations; exascale; low power consumption; low cost devices; synchronous blocking; asynchronous non-blocking; event-driven; JSON; Communication Systems
