Global ETD Search

1	Scheduling Tasks over Multicore machines enhanced with Accelerators : a Runtime System’s Perspective / Vers des supports exécutifs capables d'exploiter des machines multicœurs hétérogènes Augonnet, Cédric 09 December 2011 (has links) Multicore machines equipped with accelerators are becoming increasingly popular in the High Performance Computing ecosystem. Hybrid architectures provide significantly improved energy efficiency, so they are likely to become widespread in the manycore era. However, the complexity introduced by these architectures has a direct impact on programmability, so it is crucial to provide portable abstractions in order to fully tap into the potential of these machines. Pure offloading approaches, which consist in running an application on regular processors while offloading predetermined parts of the code to accelerators, are not sufficient.
The real challenge is to build systems where the application is spread across the entire machine, that is, where computation is dynamically scheduled over the full set of available processing units. In this thesis, we thus propose a new task-based model of runtime system specifically designed to address the numerous challenges introduced by hybrid architectures, especially in terms of task scheduling and data management. In order to demonstrate the relevance of this model, we designed the StarPU platform. It provides an expressive interface along with flexible task scheduling capabilities tightly coupled to efficient data management. Using these facilities, together with a database of auto-tuned per-task performance models, it becomes straightforward, for instance, to develop efficient scheduling policies that take into account both computation and communication costs. We show that our task-based model is powerful enough not only to provide support for clusters, but also to scale on hybrid manycore architectures. We analyze the performance of our approach on both synthetic and real-life workloads, and show that we obtain significant speedups and very high efficiency on various types of multicore platforms enhanced with accelerators. HPC Scheduling StarPU Accelerator Runtime system
2	Ordonnancement dynamique, adapté aux architectures hétérogènes, de la méthode multipôle pour les équations de Maxwell, en électromagnétisme Bordage, Cyril 20 December 2013 (has links) The Fast Multipole Method speeds up the matrix-vector products used by iterative solvers to compute the electromagnetic response of an object subjected to an incident wave. Our work aims to adapt this method to make it efficient on heterogeneous architectures containing GPUs. For this purpose, we use a dynamic scheduler, StarPU, which distributes the computational tasks within a node. For parallelization in distributed memory, we schedule the boxes statically and the near interactions dynamically. Fast multipole method FMM Dynamic scheduling Electromagnetics Helmholtz MPI StarPU CUDA
3	Méthodes numériques pour les plasmas sur architectures multicoeurs / Numerical methods for plasmas on massively parallel architectures Massaro, Michel 16 December 2016 (has links) This thesis deals with solving the Magneto-Hydro-Dynamics (MHD) system on massively parallel architectures. This system is a hyperbolic system of conservation laws. For reasons of cost in time and memory, we use the finite volume method. These criteria are particularly important in the case of MHD because the solutions obtained may contain many shock waves and be highly turbulent.
Approximating a physical phenomenon therefore requires a fine mesh, which entails a large amount of computation. To reduce the execution time of the proposed algorithms, we present several optimization methods for CPU execution, such as the use of OpenMP for automatic parallelization and an optimized grid traversal that benefits from cache effects. A GPU implementation using the OpenCL library is also proposed. To keep memory accesses as coalesced as possible, we propose a method that combines directional splitting with an optimized transposition method for parallel implementations. In the last part, we present the SCHNAPS library. This solver, based on the Discontinuous Galerkin (DG) method, uses OpenCL and StarPU implementations in order to maximize the benefits of hybrid programming. Magneto-Hydro-Dynamics Finite volume Discontinuous Galerkin Parallelization OpenCL StarPU 005.4 518 538.6
4	Using Task Parallelism for Distributed Parallel Skeleton Programming : Implementing a StarPU Back-End to SkePU 2 / Distribuerade parallellprogrammeringsskelett genom uppgiftsparallellism : Implementation av en StarPU-baserad SkePU 2 backend Henrik, Henriksson January 2024 (has links) We extended the parallel skeleton programming framework SkePU 2 with a new back-end utilizing StarPU, a task programming framework for hybrid and distributed architectures. The aim was to allow SkePU to run on distributed clusters, using MPI through StarPU. The implemented back-end distributes data and work across participating ranks. While we did not implement the full SkePU API, the Map and Reduce1D skeletons were successfully implemented. During the implementation, we discovered some differences in API design between SkePU and StarPU. We combine the type-safe templates used in the SkePU API with the C-style, void*-heavy API of StarPU. This requires the implementation to use more complex templates than normally desired. While we could preserve most of the SkePU 2 API when moving to a distributed-memory setting, some parts had to change. In particular, we needed to change the semantics of SkePU 2 containers with regard to iterators and random access. We benchmarked the performance of the implemented back-end against an MPI+OpenMP reference implementation on two problems, n-body and a simple reduction. While the n-body problem demonstrates promising scaling properties, reductions do not scale well to larger numbers of ranks. A performance comparison against the MPI+OpenMP reference implementation reveals that, aside from the higher communication overhead, there may also be some overhead in the work performed between communications, potentially performing at below 60-70% of the reference. In most cases, the new back-end to SkePU exhibits significantly lower performance than the reference.
Extending the implemented solution to cover the full API and improving performance could provide a high-level interface to distributed programming for application programmers. Indeed, subsequent developments of SkePU 3 extend and improve our StarPU back-end. HPC StarPU SkePU parallel programming skeleton programming distributed systems MPI Computer Engineering
5	Ordonnancement dynamique, adapté aux architectures hétérogènes, de la méthode multipôle pour les équations de Maxwell, en électromagnétisme Bordage, Cyril 20 December 2013 (has links) (PDF) The Fast Multipole Method speeds up the matrix-vector products used by iterative solvers to determine the electromagnetic behavior of an object subjected to an incident wave. Our work aims to adapt this method to make it efficient on heterogeneous architectures containing GPUs. For this purpose, we use a dynamic scheduler, StarPU, which distributes the computational tasks within a node. For parallelization in distributed memory, we use static scheduling of the boxes, coupled with dynamic scheduling of the near interactions. [INFO:INFO_OH] Computer Science/Other Fast multipole method FMM Dynamic scheduling Electromagnetics Helmholtz MPI StarPU CUDA
6	Optimisation de code Galerkin discontinu sur ordinateur hybride : application à la simulation numérique en électromagnétisme / Discontinuous Galerkin code optimization on hybrid computer : application to the numerical simulation in electromagnetism Weber, Bruno 26 November 2018 (has links) In this thesis, we present the improvements made to the Discontinuous Galerkin solver Teta-CLAC, which results from the IRMA-AxesSim collaboration, during the HOROCH project (2015-2018). This solver solves the Maxwell equations in 3D, in parallel on a large number of OpenCL accelerators. The goal of the HOROCH project was to perform large-scale simulations on a complete numerical model of the human body. This model comprises 24 million hexahedral cells, for computations in the frequency band of connected objects, from 1 to 3 GHz (Bluetooth).
The applications are numerous: telephony and accessories, sport (connected shirts), medicine (probes: capsules, patches), etc. The improvements include, among others: optimization of the OpenCL kernels for CPUs in order to make the best use of hybrid architectures; experimentation with the StarPU runtime; the design of a local time-stepping integration scheme; and many optimizations that allow the solver to handle simulations of several million cells. Solver, Maxwell Electromagnetism Hyperbolic system Discontinuous Galerkin DG DGTD Mesh Hexahedrons GPU CPU OpenCL MPI StarPU Local time step Adaptive spatial order Complete human body model Connected objects Bluetooth 005.4 621.38
