Global ETD Search

11	Energy savings and performance improvements with SSDs in the Hadoop Distributed File System / Economia de energia e aumento de desempenho usando SSDs no Hadoop Distributed File System
Ivanilton Polato, 29 August 2016
Energy issues have gathered strong attention over the past decade, reaching IT data processing infrastructures. These infrastructures must now cope with that responsibility, adjusting existing platforms to reach acceptable performance while reducing energy consumption. As the de facto platform for Big Data, Apache Hadoop has evolved significantly over the last years, with more than 60 releases bringing new features. By implementing the MapReduce programming paradigm and leveraging HDFS, its distributed file system, Hadoop has become a reliable and fault-tolerant middleware for parallel and distributed computing over large datasets. Nevertheless, Hadoop may struggle under certain workloads, resulting in poor performance and high energy consumption. Users increasingly demand that high-performance computing solutions address sustainability and limit energy consumption. In this thesis, we introduce HDFSH, a hybrid storage mechanism for HDFS that combines hard disks (HDs) and solid-state disks (SSDs) to achieve higher performance while saving power in Hadoop computations. HDFSH brings to the middleware the best of HDs (affordable cost per GB and high storage capacity) and SSDs (high throughput and low energy consumption) in a configurable fashion, using dedicated storage zones for each storage device type. We implemented our mechanism as a block placement policy for HDFS and assessed it over six recent releases of Hadoop with different architectural properties. Results indicate that our approach increases overall job performance while decreasing energy consumption under most of the hybrid configurations evaluated. Our results also show that, in many cases, storing only part of the data on SSDs results in significant energy savings and execution speedups.
Distributed file systems, Energy efficiency, Green computing, Hadoop, HDFS, Hybrid storage, Parallel file systems, Solid-state disk, SSDs
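The idea of configurable, dedicated storage zones described in the abstract above can be illustrated with a small sketch. This is not the HDFSH implementation and does not use the HDFS block placement API; the names `Zone` and `place_blocks`, the integer `weight` parameter, and the deficit-based selection rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str    # hypothetical zone label, e.g. "SSD" or "HD"
    weight: int  # relative share of blocks this zone should receive


def place_blocks(num_blocks: int, zones: list[Zone]) -> list[str]:
    """Assign each block to a zone so per-zone counts track the configured weights."""
    total = sum(z.weight for z in zones)
    if total <= 0:
        raise ValueError("at least one zone needs a positive weight")
    counts = {z.name: 0 for z in zones}
    placement = []
    for i in range(1, num_blocks + 1):
        # Choose the zone that is furthest below its target share after i blocks.
        # Integer arithmetic keeps the result deterministic.
        target = max(zones, key=lambda z: z.weight * i - counts[z.name] * total)
        counts[target.name] += 1
        placement.append(target.name)
    return placement


# Hypothetical configuration: 30% of blocks in the SSD zone, 70% in the HD zone.
layout = place_blocks(10, [Zone("SSD", 3), Zone("HD", 7)])
```

Changing the weights changes how much of the data lands in each zone, which is the kind of knob the abstract refers to when it mentions storing only part of the data on SSDs.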
12	Distribuerade datalagringssystem för tjänsteleverantörer : Undersökning av olika användningsfall för distribuerade datalagringssystem / Distributed Data Storage Systems for Service Providers : Investigation of different use cases for distributed data storage systems
Ahmed, Tanvir Saif; Markovic, Bratislav, January 2016
This thesis studies three use cases within the field of data storage: Cold Storage, High Performance Storage and Virtual Machine Storage. The purpose of the study is to give an overview of commercial distributed file systems and a deeper examination of open-source distributed file systems, in order to find the most suitable solution for these use cases. Within the study, previous works comparing performance measurements, data protection and costs were analyzed and compared, and the functionalities that characterize modern distributed file systems (snapshotting, multi-tenancy, data deduplication and data replication) were highlighted. Both commercial and open distributed file systems were examined. A cost estimation for commercial and open distributed file systems was also made to determine the profitability of these two types of distributed file systems. After comparing and analyzing previous works, it was clear that the open-source distributed file system Ceph was a suitable solution given the requirements set for High Performance Storage and Virtual Machine Storage. The cost estimation showed that it was more profitable to implement an open distributed file system. This study can be used as guidance when choosing between different distributed file systems.
Cold Storage, High Performance Storage, Virtual Machine Storage, use cases, snapshotting, multi-tenancy, data deduplication, data replication, distributed file systems, Computer Systems, Computer Engineering
13	Data Transfer and Management through the IKAROS framework : Adopting an asynchronous non-blocking event driven approach to implement the Elastic-Transfer's IMAP client-server connection
Gkikas, Nikolaos, January 2015
Given the current state of input/output (I/O) and storage devices in petascale systems, incremental solutions would be ineffective when implemented in exascale environments. According to "The International Exascale Software Roadmap" by Dongarra et al., existing I/O architectures are not sufficiently scalable, especially because current shared file systems have limitations when used in large-scale environments. These limitations are: bandwidth does not scale economically to large-scale systems; I/O traffic on the high-speed network can affect and be affected by other, unrelated jobs; and I/O traffic on the storage server can affect and be affected by other, unrelated jobs. Future applications on exascale computers will require I/O bandwidth proportional to their computational capabilities. To avoid these limitations, C. Filippidis, C. Markou, and Y. Cotronis proposed the IKAROS framework. This thesis project expands the capabilities of the publicly available elastic-transfer (eT) module, which was directly derived from IKAROS. The eT uses Google's Gmail service as a utility for efficient metadata management. Gmail is based on the Internet Message Access Protocol (IMAP), and the existing version of the eT framework implements the IMAP client-server connection through the "Inbox" module from the Node Package Manager (NPM) of Node.js. This module was used as a proof of concept, but in a production environment this implementation undermines the system's scalability: each client individually requests metadata-related information from the IMAP server, so the system's resources are allocated inefficiently when a large number of concurrent requests arrive at the eT's metadata server (MDS). This thesis solves this problem by adopting an asynchronous, non-blocking, event-driven approach to implement the IMAP client-server connection. This was done by integrating the "Imap" module from the NPM repository and modifying it to suit the eT framework. Additionally, since the JavaScript Object Notation (JSON) format has become one of the most widespread data-interchange formats, the eT's metadata scheme is modified so that the system's metadata can be easily parsed as JSON objects. This feature gives the framework wider compatibility and interoperability with external systems. The operational behavior of the new module was evaluated through a set of data transfer experiments over a wide area network environment. These experiments were performed to confirm that the changes in the system's architecture did not affect its performance.
parallel file systems, distributed file systems, IKAROS file system, elastic-transfer, grid computing, storage systems, I/O limitations, exascale, low power consumption, low cost devices, synchronous blocking, asynchronous non-blocking, event-driven, JSON, Communication Systems
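The asynchronous, non-blocking, event-driven style described in the abstract above, together with JSON-parsable metadata, can be illustrated with a minimal sketch. The thesis itself works with Node.js and an NPM IMAP module; the Python `asyncio` code below is only an analogy, and `fetch_metadata`, `handle_requests`, and the metadata fields `id` and `name` are hypothetical stand-ins rather than part of the eT framework.

```python
import asyncio
import json


async def fetch_metadata(file_id: int) -> str:
    """Stand-in for one metadata lookup; the sleep simulates waiting on the network."""
    await asyncio.sleep(0.01)  # yields control to the event loop instead of blocking
    # Hypothetical metadata record, serialized as JSON.
    return json.dumps({"id": file_id, "name": f"file-{file_id}"})


async def handle_requests(file_ids: list[int]) -> list[dict]:
    """Serve many concurrent lookups on a single thread."""
    # All lookups are started together; the event loop interleaves them while
    # each one waits, and gather returns results in request order.
    raw = await asyncio.gather(*(fetch_metadata(i) for i in file_ids))
    return [json.loads(r) for r in raw]


results = asyncio.run(handle_requests(list(range(100))))
```

Because no request blocks the thread while it waits, the hundred simulated lookups overlap instead of running one after another, which is the property the abstract relies on for scalability under many concurrent requests.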
