Global ETD Search


Contribution to High Performance Computing and Big Data Infrastructure Convergence / Contribution à la convergence d'infrastructure entre le calcul haute performance et le traitement de données à large échelle

The amount of data produced in both the scientific and commercial worlds is constantly increasing. The field of large-scale data processing, known as “Big Data”, emerged to process data on large distributed computing infrastructures. But integrating Big Data systems on high-performance computing machines raises many problems. Indeed, the resource managers and file systems of supercomputers are not designed for this type of work. The subject of this thesis is to find the best approach for making these two resource managers interact and to address the various problems raised by data movement and scheduling. / The amount of data produced, both in the scientific community and in the commercial world, is constantly growing. The field of Big Data has emerged to handle large amounts of data on distributed computing infrastructures. High-Performance Computing (HPC) infrastructures are made for intensive parallel computations. The HPC community is also facing more and more data because of new high-definition sensors and large physics apparatus. The convergence of the two fields is currently happening. In fact, the HPC community is already using Big Data tools, but they are not integrated correctly, especially at the level of the file system and the Resources and Job Management System (RJMS). In order to understand how we can leverage HPC clusters for Big Data usage, and what the challenges are for HPC infrastructures, we have studied multiple aspects of the convergence: we have surveyed software provisioning methods, with a focus on data-intensive applications. We also propose a new RJMS collaboration technique called BeBiDa, which is based on 50 lines of code whereas similar solutions use at least 1000x more. We evaluate this mechanism under real conditions and in simulation with our simulator Batsim.

http://www.theses.fr/2019GREAM031/document

Supercomputer

Data management

Resource management

Computing infrastructure

Convergence

Simulation

High performance computing

Identifier	oai:union.ndltd.org:theses.fr/2019GREAM031
Date	01 July 2019
Creators	Mercier, Michael
Contributors	Grenoble Alpes; Raffin, Bruno; Richard, Olivier
Source Sets	Dépôt national des thèses électroniques françaises
Language	English
Detected Language	English
Type	Electronic Thesis or Dissertation, Text


