Global ETD Search

1	Implementation of the Hadoop MapReduce algorithm on virtualized shared storage systems
Nethula, Shravya. January 2016

Context. Hadoop is an open-source software framework for distributed storage and distributed processing of large data sets. Implementing the Hadoop MapReduce algorithm on virtualized shared storage, without the Hadoop Distributed File System (HDFS), is a challenging task. In this study, the Hadoop MapReduce algorithm is implemented on the Compuverde software, which provides virtualized shared storage of data.

Objectives. This study identifies the effect of using virtualized shared storage with the Hadoop framework. The main objective is to design a method for implementing the Hadoop MapReduce algorithm on the Compuverde software, which provides virtualized shared storage of big data. Finally, the performance of the MapReduce algorithm on Compuverde shared storage (Compuverde File System, CVFS) is evaluated and compared with its performance on HDFS.

Methods. First, a literature study is conducted to identify the effect of implementing Hadoop on virtualized shared storage. During this study, the Compuverde software is analyzed in detail, and the concepts of the MapReduce algorithm and the functioning of HDFS are examined closely. The next research method adopted is the implementation of a method in which the Hadoop MapReduce algorithm is applied on the Compuverde software, using virtualized shared storage in place of HDFS. The final step is an experiment in which the performance of the MapReduce algorithm on Compuverde shared storage (CVFS) is compared with its performance on HDFS.

Results. The experiment is conducted in two scenarios: a CPU-bound scenario and an I/O-bound scenario. In the CPU-bound scenario, the average execution time of the WordCount program grows linearly with the size of the data set. This linear growth is observed for both file systems, HDFS and CVFS. The same holds in the I/O-bound scenario: growth is linear for both file systems. When the average execution times are plotted, both file systems perform similarly in the CPU-bound scenario (multi-node environment). In the I/O-bound scenario (multi-node environment), HDFS slightly outperforms CVFS when the data set size is 1.0 GB, and the two file systems show little difference when the data set size is 0.5 GB or 1.5 GB.

Conclusions. The MapReduce algorithm can be run on live data in virtualized shared storage systems without copying the data into HDFS. In a single-node environment, distributed storage systems perform better than shared storage systems. In a multi-node environment, HDFS and CVFS perform similarly in the CPU-bound scenario. In the I/O-bound scenario, HDFS performs slightly better than CVFS for a 1.0 GB data set. Hence we can conclude that distributed storage systems perform similarly to shared storage systems in both CPU-bound and I/O-bound scenarios in a multi-node environment.

Keywords: Hadoop; virtualized systems; shared storage; MapReduce; Hadoop Distributed File System; Computer Sciences; Datavetenskap (datalogi)

2	Maîtrise énergétique des centres de données virtualisés : D'un scénario de charge à l'optimisation du placement des calculs / Power management in virtualized data centers: From a load scenario to the optimization of the tasks placement
Le Louët, Guillaume. 12 May 2014

This thesis addresses the hosting of virtualized IT services and makes two contributions. First, it proposes a modular management support system that moves the virtual machines of the data center in order to keep it in a satisfactory state. In particular, this system can take server power consumption into account, along with rules specific to that consumption. Its modularity also allows its components to be adapted to large-scale problems. Second, the thesis proposes a tool for comparing different managers of virtualized data centers. This tool injects a reproducible load increase scenario into a virtualized infrastructure. Injecting such a scenario makes it possible to evaluate the performance of the data center management system using performance probes. The language used for this injection is extensible and allows the creation of parameterized scenarios. The contributions of this thesis were presented at two international conferences and one French conference.

Keywords: Virtual machine placement; Data center manager; Constraint programming; Load injection; Activity scenario; Performance of virtualized systems
