Spelling suggestions: "subject:"performancecomputing"" "subject:"performance.comparing""
311 |
Unified system of code transformation and execution for heterogeneous multi-core architectures. / Système unifié de transformation de code et d'éxécution pour un passage aux architectures multi-coeurs hétérogènesLi, Pei 17 December 2015 (has links)
Architectures hétérogènes sont largement utilisées dans le domaine de calcul haute performance. Cependant, le développement d'applications sur des architectures hétérogènes est indéniablement fastidieuse et sujette à erreur pour un programmeur même expérimenté. Pour passer une application aux architectures multi-cœurs hétérogènes, les développeurs doivent décomposer les données de l'entrée, gérer les échanges de valeur intermédiaire au moment d’exécution et garantir l'équilibre de charge de système. L'objectif de cette thèse est de proposer une solution de programmation parallèle pour les programmeurs novices, qui permet de faciliter le processus de codage et garantir la qualité de code. Nous avons comparé et analysé les défauts de solutions existantes, puis nous proposons un nouvel outil de programmation STEPOCL avec un nouveau langage de domaine spécifique qui est conçu pour simplifier la programmation sur les architectures hétérogènes. Nous avons évalué la performance de STEPOCL sur trois cas d'application classiques : un stencil 2D, une multiplication de matrices et un problème à N corps. Le résultat montre que : (i) avec l'aide de STEPOCL, la performance d'application varie linéairement selon le nombre d'accélérateurs, (ii) la performance de code généré par STEPOCL est comparable à celle de la version manuscrite. (iii) les charges de travail, qui sont trop grandes pour la mémoire d'un seul accélérateur, peuvent être exécutées en utilisant plusieurs accélérateurs. (iv) grâce à STEPOCL, le nombre de lignes de code manuscrite est considérablement réduit. / Heterogeneous architectures have been widely used in the domain of high performance computing. However developing applications on heterogeneous architectures is time consuming and error-prone because going from a single accelerator to multiple ones indeed requires to deal with potentially non-uniform domain decomposition, inter-accelerator data movements, and dynamic load balancing. The aim of this thesis is to propose a solution of parallel programming for novice developers, to ease the complex coding process and guarantee the quality of code. We lighted and analysed the shortcomings of existing solutions and proposed a new programming tool called STEPOCL along with a new domain specific language designed to simplify the development of an application for heterogeneous architectures. We evaluated both the performance and the usefulness of STEPOCL. The result show that: (i) the performance of an application written with STEPOCL scales linearly with the number of accelerators, (ii) the performance of an application written using STEPOCL competes with an handwritten version, (iii) larger workloads run on multiple devices that do not fit in the memory of a single device, (iv) thanks to STEPOCL, the number of lines of code required to write an application for multiple accelerators is roughly divided by ten.
|
312 |
Parallel Generation of Tetrahedral Meshes with Cracks by Spatial Binary Decomposition / GeraÃÃo em Paralelo de Malhas TetraÃdricas com Fraturas por DecomposiÃÃo Espacial BinÃriaMarkos Oliveira Freitas 08 May 2015 (has links)
CoordenaÃÃo de AperfeÃoamento de Pessoal de NÃvel Superior / This work describes a technique for generating three-dimensional tetrahedral meshes using parallel computing,
with shared, distributed, or hybrid memory processors. The input for the algorithm is a triangular mesh that models the surface of one of several objects, that might have holes in its interior or internal or boundary cracks. A binary tree structure for spatial partitioning is proposed in this work to recursively decompose the domain in as many subdomains as processes or threads in the parallel system, in which every subdomain has the geometry of a rectangular parallelepiped. This decomposition attempts to balance the amount of work in all the subdomains. The amount of work, known as load, of any mesh generator is usually given as a function of its output size, i.e., the size of the generated mesh. Therefore, a technique to estimate the size of this mesh, the total load of the domain, is needed beforehand. This
work uses a refined octree, generated from the surface mesh, to estimate this load, and the decomposition is performed on top of this octree. Once the domain is decomposed, each process/thread generates the mesh in its subdomain by means of an advancing front technique, in such a way that it does not overpass the limits defined by its subdomain, and applies an improvement on it. Some of the processes/threads are responsible for generating the meshes connecting the subdomains, i.e., the interface meshes, in order to generate the whole mesh. This technique presented good speed-up results, keeping the quality of the mesh comparable to the quality of the serially generated mesh. / Este trabalho descreve uma tÃcnica para gerar malhas tridimensionais tetraÃdricas utilizando computaÃÃo paralela, com processadores de memÃria compartilhada, memÃria distribuÃda ou memÃria hÃbrida. A entrada para o algoritmo à uma malha triangular que modela a superfÃcie de um ou vÃrios objetos, que podem conter buracos no interior ou fraturas internas ou na borda. Uma estrutura em forma de Ãrvore binÃria de partiÃÃo espacial à proposta neste trabalho para, recursivamente, decompor o domÃnio em tantos subdomÃnios quantos forem os processos ou threads no sistema paralelo, em que cada subdomÃnio tem a geometria de um paralelepÃpedo retangular. Esta decomposiÃÃo tenta equilibrar a quantidade de trabalho em todos os subdomÃnios. A quantidade de trabalho, conhecida como carga, de qualquer gerador de malha à geralmente dada em funÃÃo do tamanho da saÃda do algoritmo, ou seja, do tamanho da malha gerada. Assim, faz-se necessÃria uma tÃcnica para estimar previamente o tamanho dessa malha, que à carga total do domÃnio. Este trabalho faz uso de uma octree refinada, gerada a partir da malha de superfÃcie dada como entrada, para estimar esta carga, e a decomposiÃÃo à feita a partir dessa octree. Uma vez decomposto o domÃnio, cada processo/thread gera a malha em seu subdomÃnio por uma tÃcnica de avanÃo de fronteira, de forma que ela nÃo ultrapasse os limites definidos pelo seu subdomÃnio, e aplica um melhoramento nela. Alguns dos processos/threads ficam responsÃveis por gerar as malhas conectando os subdomÃnios, ou seja, as malhas de interface, atà que toda a malha tenha sido gerada. Esta tÃcnica apresentou bons resultados de speed-up, mantendo a qualidade da malha comparÃvel à qualidade da malha gerada sequencialmente.
|
313 |
Metodologia para execução de aplicações paralelas baseadas no modelo BSP com tarefas heterogêneas. / Methodology for parallel application execution based on BSP model with heterogeneous tasks.Fernando Henrique e Paula da Luz 21 September 2015 (has links)
A computação paralela permite uma série de vantagens para a execução de aplicações de grande porte, sendo que o uso efetivo dos recursos computacionais paralelos é um aspecto relevante da computação de alto desempenho. Este trabalho apresenta uma metodologia que provê a execução, de forma automatizada, de aplicações paralelas baseadas no modelo BSP com tarefas heterogêneas. É considerado no modelo adotado, que o tempo de computação de cada tarefa secundária não possui uma alta variância entre uma iteração e outra. A metodologia é denominada de ASE e é composta por três etapas: Aquisição (Acquisition), Escalonamento (Scheduling) e Execução (Execution). Na etapa de Aquisição, os tempos de processamento das tarefas são obtidos; na etapa de Escalonamento a metodologia busca encontrar a distribuição de tarefas que maximize a velocidade de execução da aplicação paralela, mas minimizando o uso de recursos, por meio de um algoritmo desenvolvido neste trabalho; e por fim a etapa de Execução executa a aplicação paralela com a distribuição definida na etapa anterior. Ferramentas que são aplicadas na metodologia foram implementadas. Um conjunto de testes aplicando a metodologia foi realizado e os resultados apresentados mostram que os objetivos da proposta foram alcançados. / Parallel computing allows for a series of advantages on the execution of large applications and the effective use of parallel resources is an important aspect in the High Performance Computing. This work presents a methodology to provide the execution, in an automated way, of parallel applications based on BSP model with heterogeneous tasks. In this model it is assumed that the computation time between iterations does not have a high variance. The methodology is entitled ASE and it is composed by three stages: Acquisition, Scheduling and Execution. In the Acquisition step, the tasks\' processing time are obtained; In the Scheduling step, the methodology finds the ideal arrangement to distribute the tasks to maximize the execution speed and, simultaneously, minimize the use of resources. This is made using an algorithm developed in this work; and lastly the Execution step, where the parallel application is executed in the distribution defined in the previous step. The tools used in the methodology were implemented. A set of tests to apply the methodology were made and the results shown that the objectives were reached.
|
314 |
Um ambiente de programação e processamento de aplicações paralelas para grades computacionais. / A programming and prrocessing environment of parallel applications to grid computing.Augusto Mendes Gomes Júnior 28 November 2011 (has links)
A execução de uma aplicação paralela, utilizando grades computacionais, necessita de um ambiente que permita a sua execução, além de realizar o seu gerenciamento, escalonamento e monitoramento. O ambiente de execução deve prover um modelo de processamento, composto pelos modelos de programação e de execução, no qual o objetivo é a exploração adequada das características das grades computacionais. Este trabalho objetiva a proposta de um modelo de processamento paralelo, baseado em variáveis compartilhadas, para grades computacionais, sendo composto por um modelo de execução apropriado para grades e pelo modelo de programação da linguagem paralela CPAR. O ambiente CPAR-Grid foi desenvolvido para executar aplicações paralelas em grades computacionais, abstraindo do usuário todas as características presentes em uma grade computacional. Os resultados obtidos mostram que este ambiente é uma solução eficiente para a execução de aplicações paralelas. / The execution of parallel applications, using grid computing, requires an environment that enables them to be executed, managed, scheduled and monitored. The execution environment must provide a processing model, consisting of programming and execution models, with the objective appropriately exploiting grid computing characteristics. This paper proposes a parallel processing model, based on shared variables for grid computing, consisting of an execution model that is appropriate for the grid and a CPAR parallel language programming model. The CPAR-Grid environment is designed to execute parallel applications in grid computing, where all the characteristics present in grid computing are transparent to users. The results show that this environment is an efficient solution for the execution of parallel applications.
|
315 |
The management of multiple submissions in parallel systems : the fair scheduling approach / La gestion de plusieurs soumissions dans les systèmes parallèles : l'approche d'ordonnancement équitableGama Pinheiro, Vinicius 14 February 2014 (has links)
Le problème étudié est celui de l'ordonnancement d'applications dans lessystèmes parallèles et distribués avec plusieurs utilisateurs. Les nouvellesplates-formes de calcul parallèle et distribué offrent des puissances trèsgrandes qui permettent d'envisager la résolution d'applications complexesinteractives. Aujourd'hui, il reste encore difficile d'utiliser efficacementcette puissance par manque d'outils de gestion de ressources. Le travaileffectué dans cette thèse se place dans cette perspective d'analyser etdévelopper des algorithmes efficaces pour gérer efficacement des ressources decalcul partagées entre plusieurs utilisateurs. On analyse les scénarios avecplusieurs soumissions lancées par multiples utilisateurs au cours du temps. Cessoumissions ont un ou plus de processus et l'ensemble de soumissions estorganisé en successifs campagnes. Les processus d'une seule campagnesont séquentiels et indépendants, mais les processus d'une campagne ne peuventpas commencer leur exécution avant que tous les processus provenant de ladernière campagne sont completés. Chaque utilisateur est intéressé à minimiserla somme des temps de réponses des campagnes. On définit un modèle théorique pour l'ordonnancement des campagnes et on montreque, dans le cas général, c'est NP-difficile. Pour le cas avec un utilisateur,on démontre qu'un algorithme d'ordonnancement $ho$-approximation pour le(classique) problème d'ordonnancement de tâches parallèles est aussi un$ho$-approximation pour le problème d'ordonnancement de campagnes. Pour lecas général avec $k$ utilisateurs, on établis un critère de emph{fairness}inspiré par partage de temps. On propose FairCamp, un algorithmed'ordonnancement qu'utilise dates limite pour réaliser emph{fairness} parmiles utilisateurs entre consécutifes campagnes. On prouve que FairCamp augmentele temps de réponse de chaque utilisateur par a facteur maximum de $kho$ parrapport un processeur dédiée à l'utilisateur. On prouve aussi que FairCamp estun algorithme $ho$-approximation pour le maximum emph{stretch}.On compare FairCamp contre emph{First-Come-First-Served} (FCFS) parsimulation. On démontre que, comparativement à FCFS, FairCamp réduit le maximal{em stretch} a la limite de $3.4$ fois. La différence est significative dansles systèmes utilisé pour plusieurs ($k>5$) utilisateurs.Les résultats montrent que, plutôt que juste des tâches individuelle etindépendants, campagnes de tâches peuvent être manipulées d'une manièreefficace et équitable. / We study the problem of scheduling in parallel and distributedsystems with multiple users. New platforms for parallel and distributedcomputing offers very large power which allows to contemplate the resolution ofcomplex interactive applications. Nowadays, it is still difficult to use thispower efficiently due to lack of resource management tools. The work done inthis thesis lies in this context: to analyse and develop efficient algorithmsfor manage computing resources shared among multiple users. We analyzescenarios with many submissions issued from multiple users over time. Thesesubmissions contain one or more jobs and the set of submissions are organizedin successive campaigns. Any job from a campaign can not start until allthe jobs from the previous campaign are completed. Each user is interested inminimizing the sum of flow times of the campaigns.In the first part of this work, we define a theoretical model for Campaign Scheduling under restrictive assumptions andwe show that, in the general case, it is NP-hard. For the single-user case, we show that an$ho$-approximation scheduling algorithm for the (classic) parallel jobscheduling problem is also an $ho$-approximation for the Campaign Schedulingproblem. For the general case with $k$ users, we establish a fairness criteriainspired by time sharing. Then, we propose FairCamp, a scheduling algorithm whichuses campaign deadlines to achieve fairness among users between consecutivecampaigns. We prove that FairCamp increases the flow time of each user by afactor of at most $kho$ compared with a machine dedicated to the user. Wealso prove that FairCamp is an $ho$-approximation algorithm for the maximumstretch.We compare FairCamp to {em First-Come-First-Served} (FCFS) by simulation. We showthat, compared with FCFS, FairCamp reduces the maximum stretch by up to $3.4$times. The difference is significant in systems used by many ($k>5$) users.Our results show that, rather than just individual, independent jobs, campaignsof jobs can be handled by the scheduler efficiently and fairly.
|
316 |
Um ambiente de programação e processamento de aplicações paralelas para grades computacionais. / A programming and prrocessing environment of parallel applications to grid computing.Gomes Júnior, Augusto Mendes 28 November 2011 (has links)
A execução de uma aplicação paralela, utilizando grades computacionais, necessita de um ambiente que permita a sua execução, além de realizar o seu gerenciamento, escalonamento e monitoramento. O ambiente de execução deve prover um modelo de processamento, composto pelos modelos de programação e de execução, no qual o objetivo é a exploração adequada das características das grades computacionais. Este trabalho objetiva a proposta de um modelo de processamento paralelo, baseado em variáveis compartilhadas, para grades computacionais, sendo composto por um modelo de execução apropriado para grades e pelo modelo de programação da linguagem paralela CPAR. O ambiente CPAR-Grid foi desenvolvido para executar aplicações paralelas em grades computacionais, abstraindo do usuário todas as características presentes em uma grade computacional. Os resultados obtidos mostram que este ambiente é uma solução eficiente para a execução de aplicações paralelas. / The execution of parallel applications, using grid computing, requires an environment that enables them to be executed, managed, scheduled and monitored. The execution environment must provide a processing model, consisting of programming and execution models, with the objective appropriately exploiting grid computing characteristics. This paper proposes a parallel processing model, based on shared variables for grid computing, consisting of an execution model that is appropriate for the grid and a CPAR parallel language programming model. The CPAR-Grid environment is designed to execute parallel applications in grid computing, where all the characteristics present in grid computing are transparent to users. The results show that this environment is an efficient solution for the execution of parallel applications.
|
317 |
Applications, performance analysis, and optimization of weather and air quality modelsSobhani, Negin 01 December 2017 (has links)
Atmospheric particulate matter (PM) is linked to various adverse environmental and health impacts. PM in the atmosphere reduces visibility, alters precipitation patterns by acting as cloud condensation nuclei (CCN), and changes the Earth’s radiative balance by absorbing or scattering solar radiation in the atmosphere. The long-range transport of pollutants leads to increase in PM concentrations even in remote locations such as polar regions and mountain ranges. One significant effect of PM on the earth’s climate occurs while light absorbing PM, such as Black Carbon (BC), deposits over snow. In the Arctic, BC deposition on highly reflective surfaces (e.g. glaciers and sea ices) has very intense effects, causing snow to melt more quickly. Thus, characterizing PM sources, identifying long-range transport pathways, and quantifying the climate impacts of PM are crucial in order to inform emission abatement policies for reducing both health and environmental impacts of PM.
Chemical transport models provide mathematical tools for better understanding atmospheric system including chemical and particle transport, pollution diffusion, and deposition. The technological and computational advances in the past decades allow higher resolution air quality and weather forecast simulations with more accurate representations of physical and chemical mechanisms of the atmosphere.
Due to the significant role of air pollutants on public health and environment, several countries and cities perform air quality forecasts for warning the population about the future air pollution events and taking local preventive measures such as traffic regulations to minimize the impacts of the forecasted episode. However, the costs associated with the complex air quality forecast models especially for simulations with higher resolution simulations make “forecasting” a challenge. This dissertation also focuses on applications, performance analysis, and optimization of meteorology and air quality modeling forecasting models.
This dissertation presents several modeling studies with various scales to better understand transport of aerosols from different geographical sources and economic sectors (i.e. transportation, residential, industry, biomass burning, and power) and quantify their climate impacts. The simulations are evaluated using various observations including ground site measurements, field campaigns, and satellite data.
The sector-based modeling studies elucidated the importance of various economical sector and geographical regions on global air quality and the climatic impacts associated with BC. This dissertation provides the policy makers with some implications to inform emission mitigation policies in order to target source sectors and regions with highest impacts. Furthermore, advances were made to better understand the impacts of light absorbing particles on climate and surface albedo.
Finally, for improving the modeling speed, the performances of the models are analyzed, and optimizations were proposed for improving the computational efficiencies of the models. Theses optimizations show a significant improvement in the performance of Weather Research and Forecasting (WRF) and WRF-Chem models. The modified codes were validated and incorporated back into the WRF source code to benefit all WRF users. Although weather and air quality models are shown to be an excellent means for forecasting applications both for local and hemispheric scale, further studies are needed to optimize the models and improve the performance of the simulations.
|
318 |
Coupled computational fluid dynamics/multibody dynamics method with application to wind turbine simulationsLi, Yuwei 01 May 2014 (has links)
A high fidelity approach coupling the computational fluid dynamics method (CFD) and multi-body dynamics method (MBD) is presented for aero-servo-elastic wind turbine simulations. The approach uses the incompressible CFD dynamic overset code CFDShip-Iowa v4.5 to compute the aerodynamics, coupled with the MBD code Virtual.Lab Motion to predict the motion responses to the aerodynamic loads. The IEC 61400-1 ed. 3 recommended Mann wind turbulence model was implemented in this thesis into the code CFDShip-Iowa v4.5 as boundary and initial conditions, and used as the explicit wind turbulence for CFD simulations. A drivetrain model with control systems was implemented in the CFD/MBD framework for investigation of drivetrain dynamics. The tool and methodology developed in this thesis are unique, being the first time with complete wind turbine simulations including CFD of the rotor/tower aerodynamics, elastic blades, gearbox dynamics and feedback control systems in turbulent winds.
Dynamic overset CFD simulations were performed with the benchmark experiment UAE phase VI to demonstrate capabilities of the code for wind turbine aerodynamics. The complete turbine geometry was modeled, including blades and approximate geometries for hub, nacelle and tower. Unsteady Reynolds-Averaged Navier-Stokes (URANS) and Detached Eddy Simulation (DES) turbulence models were used in the simulations. Results for both variable wind speed at constant blade pitch angle and variable blade pitch angle at fixed wind speed show that the CFD predictions match the experimental data consistently well, including the general trends for power and thrust, sectional normal force coefficients and pressure coefficients at different sections along the blade.
The implemented Mann wind turbulence model was validated both theoretically and statistically by comparing the generated stationary wind turbulent field with the theoretical one-point spectrum for the three components of the velocity fluctuations, and by comparing the expected statistics from the simulated turbulent field by CFD with the explicit wind turbulence inlet boundary from the Mann model.
The proposed coupled CFD/MBD approach was applied to the conceptual NREL 5MW offshore wind turbine. Extensive simulations were performed in an increasing level of complexity to investigate the aerodynamic predictions, turbine performance, elastic blades, wind shear and atmospheric wind turbulence. Comparisons against the publicly available OC3 simulation results show good agreements between the CFD/MBD approach and the OC3 participants in time and frequency domains. Wind turbulence/turbine interaction was examined for the wake flow to analyze the influence of turbulent wind on wake diffusion.
The Gearbox Reliability Collaborative project gearbox was up-scaled in size and added to the NREL 5MW turbine with the purpose of demonstrating drivetrain dynamics. Generator torque and blade pitch controllers were implemented to simulate realistic operational conditions of commercial wind turbines. Interactions between wind turbulence, rotor aerodynamics, elastic blades, drivetrain dynamics at the gear-level and servo-control dynamics were studied, showing the potential of the methodology to study complex aerodynamic/mechanic systems.
|
319 |
Energy Demand Response for High-Performance Computing SystemsAhmed, Kishwar 22 March 2018 (has links)
The growing computational demand of scientific applications has greatly motivated the development of large-scale high-performance computing (HPC) systems in the past decade. To accommodate the increasing demand of applications, HPC systems have been going through dramatic architectural changes (e.g., introduction of many-core and multi-core systems, rapid growth of complex interconnection network for efficient communication between thousands of nodes), as well as significant increase in size (e.g., modern supercomputers consist of hundreds of thousands of nodes). With such changes in architecture and size, the energy consumption by these systems has increased significantly. With the advent of exascale supercomputers in the next few years, power consumption of the HPC systems will surely increase; some systems may even consume hundreds of megawatts of electricity. Demand response programs are designed to help the energy service providers to stabilize the power system by reducing the energy consumption of participating systems during the time periods of high demand power usage or temporary shortage in power supply.
This dissertation focuses on developing energy-efficient demand-response models and algorithms to enable HPC system's demand response participation. In the first part, we present interconnection network models for performance prediction of large-scale HPC applications. They are based on interconnected topologies widely used in HPC systems: dragonfly, torus, and fat-tree. Our interconnect models are fully integrated with an implementation of message-passing interface (MPI) that can mimic most of its functions with packet-level accuracy. Extensive experiments show that our integrated models provide good accuracy for predicting the network behavior, while at the same time allowing for good parallel scaling performance. In the second part, we present an energy-efficient demand-response model to reduce HPC systems' energy consumption during demand response periods. We propose HPC job scheduling and resource provisioning schemes to enable HPC system's emergency demand response participation. In the final part, we propose an economic demand-response model to allow both HPC operator and HPC users to jointly reduce HPC system's energy cost. Our proposed model allows the participation of HPC systems in economic demand-response programs through a contract-based rewarding scheme that can incentivize HPC users to participate in demand response.
|
320 |
Développement d'un système in situ à base de tâches pour un code de dynamique moléculaire classique adapté aux machines exaflopiques / Integration of High-Performance Task-Based In Situ for Molecular Dynamics on Exascale ComputersDirand, Estelle 06 November 2018 (has links)
L’ère de l’exascale creusera encore plus l’écart entre la vitesse de génération des données de simulations et la vitesse d’écriture et de lecture pour analyser ces données en post-traitement. Le temps jusqu’à la découverte scientifique sera donc grandement impacté et de nouvelles techniques de traitement des données doivent être mises en place. Les méthodes in situ réduisent le besoin d’écrire des données en les analysant directement là où elles sont produites. Il existe plusieurs techniques, en exécutant les analyses sur les mêmes nœuds de calcul que la simulation (in situ), en utilisant des nœuds dédiés (in transit) ou en combinant les deux approches (hybride). La plupart des méthodes in situ traditionnelles ciblent les simulations qui ne sont pas capables de tirer profit du nombre croissant de cœurs par processeur mais elles n’ont pas été conçues pour les architectures many-cœurs qui émergent actuellement. La programmation à base de tâches est quant à elle en train de devenir un standard pour ces architectures mais peu de techniques in situ à base de tâches ont été développées.Cette thèse propose d’étudier l’intégration d’un système in situ à base de tâches pour un code de dynamique moléculaire conçu pour les supercalculateurs exaflopiques. Nous tirons profit des propriétés de composabilité de la programmation à base de tâches pour implanter l’architecture hybride TINS. Les workflows d’analyses sont représentés par des graphes de tâches qui peuvent à leur tour générer des tâches pour une exécution in situ ou in transit. L’exécution in situ est rendue possible grâce à une méthode innovante de helper core dynamique qui s’appuie sur le concept de vol de tâches pour entrelacer efficacement tâches de simulation et d’analyse avec un faible impact sur le temps de la simulation.TINS utilise l’ordonnanceur de vol de tâches d’Intel® TBB et est intégré dans ExaStamp, un code de dynamique moléculaire. De nombreuses expériences ont montrées que TINS est jusqu’à 40% plus rapide que des méthodes existantes de l’état de l’art. Des simulations de dynamique moléculaire sur des système de 2 milliards de particles sur 14,336 cœurs ont montré que TINS est capable d’exécuter des analyses complexes à haute fréquence avec un surcoût inférieur à 10%. / The exascale era will widen the gap between data generation rate and the time to manage their output and analysis in a post-processing way, dramatically increasing the end-to-end time to scientific discovery and calling for a shift toward new data processing methods. The in situ paradigm proposes to analyze data while still resident in the supercomputer memory to reduce the need for data storage. Several techniques already exist, by executing simulation and analytics on the same nodes (in situ), by using dedicated nodes (in transit) or by combining the two approaches (hybrid). Most of the in situ techniques target simulations that are not able to fully benefit from the ever growing number of cores per processor but they are not designed for the emerging manycore processors.Task-based programming models on the other side are expected to become a standard for these architectures but few task-based in situ techniques have been developed so far. This thesis proposes to study the design and integration of a novel task-based in situ framework inside a task-based molecular dynamics code designed for exascale supercomputers. We take benefit from the composability properties of the task-based programming model to implement the TINS hybrid framework. Analytics workflows are expressed as graphs of tasks that can in turn generate children tasks to be executed in transit or interleaved with simulation tasks in situ. The in situ execution is performed thanks to an innovative dynamic helper core strategy that uses the work stealing concept to finely interleave simulation and analytics tasks inside a compute node with a low overhead on the simulation execution time.TINS uses the Intel® TBB work stealing scheduler and is integrated into ExaStamp, a task-based molecular dynamics code. Various experiments have shown that TINS is up to 40% faster than state-of-the-art in situ libraries. Molecular dynamics simulations of up to 2 billions particles on up to 14,336 cores have shown that TINS is able to execute complex analytics workflows at a high frequency with an overhead smaller than 10%.
|
Page generated in 0.1153 seconds