301. A New Method for Modeling Free Surface Flows and Fluid-Structure Interaction with Ocean Applications. Lee, Curtis. Dissertation, January 2016.
The computational modeling of ocean waves and ocean-faring devices poses numerous challenges. Among these are the need to stably and accurately represent both the fluid-fluid interface between water and air and the fluid-structure interfaces arising between solid devices and one or more fluids. As techniques are developed to stably and accurately balance the interactions between fluid and structural solvers at these boundaries, a similarly pressing challenge is the development of algorithms that are massively scalable and capable of performing large-scale three-dimensional simulations on reasonable time scales. This dissertation introduces two separate methods for approaching this problem, the first focusing on the development of sophisticated fluid-fluid interface representations and the second focusing primarily on scalability and extensibility to higher-order methods.

We begin by introducing the narrow-band gradient-augmented level set method (GALSM) for incompressible multiphase Navier-Stokes flow. This is the first use of the high-order GALSM for a fluid flow application, and its reliability and accuracy in modeling ocean environments is tested extensively. The method demonstrates numerous advantages over the traditional level set method, among them improved conservation of fluid volume and the representation of subgrid structures.

Next, we present a finite-volume algorithm for solving the incompressible Euler equations in two and three dimensions in the presence of a flow-driven free surface and a dynamic rigid body. In this development, the chief concerns are efficiency, scalability, and extensibility (to higher-order and truly conservative methods). These priorities informed a number of important choices: the air phase is replaced by a pressure boundary condition in order to greatly reduce the size of the computational domain, a cut-cell finite-volume approach is chosen in order to minimize fluid volume loss and open the door to higher-order methods, and adaptive mesh refinement (AMR) is employed to focus computational effort and make large-scale 3D simulations possible. This algorithm is shown to produce robust and accurate results that are well suited for the study of ocean waves and the development of wave energy conversion (WEC) devices.
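For context, the level set representation referenced above evolves an implicit interface function under the flow velocity; the following are the standard transport equations (a textbook formulation given here for orientation, not equations quoted from the dissertation):

```latex
% Standard level set advection: the zero contour of \phi marks the
% water-air interface, u is the fluid velocity.
\frac{\partial \phi}{\partial t} + \mathbf{u} \cdot \nabla \phi = 0
% The gradient-augmented variant also transports \psi = \nabla\phi,
% obtained by differentiating the advection equation, which is what
% allows it to retain subgrid interface structures:
\frac{\partial \boldsymbol{\psi}}{\partial t}
  + (\mathbf{u} \cdot \nabla)\boldsymbol{\psi}
  = -(\nabla \mathbf{u})^{\mathsf{T}} \boldsymbol{\psi}
```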
302. Solving large systems of linear equations on multi-GPU clusters using the conjugate gradient method in OpenCL™. Bueno, Andre Luis Cavalcanti, 27 September 2013.
The process of modeling problems in the engineering fields tends to produce substantially large systems of sparse linear equations. Extensive research has been done to devise methods to solve these systems. This thesis explores the computational potential of multiple GPUs, through the use of the OpenCL technology, aiming to tackle the solution of large systems of sparse linear equations. In the proposed methodology, the conjugate gradient method is subdivided into kernels, which are delegated to multiple GPUs. To achieve an efficient method, it was necessary to understand how the GPU architecture relates to the OpenCL technology in order to obtain better performance.
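To make the kernel decomposition concrete, below is a minimal serial sketch of the conjugate gradient iteration, with comments marking the steps that such a methodology typically offloads as separate GPU kernels (SpMV, reductions, vector updates). It is an illustrative reconstruction under assumed data (the CSR matrix, sizes, and tolerance are hypothetical), not the thesis code; in the multi-GPU version each kernel would additionally operate on a partition of the rows, with inter-device reductions.

```c
/* Serial sketch of CG; each marked step maps to a candidate GPU kernel. */
#include <stdio.h>
#include <math.h>

#define N 4

/* y = A*x in CSR format -- the "SpMV kernel". */
static void spmv(const int *rp, const int *ci, const double *v,
                 const double *x, double *y) {
    for (int i = 0; i < N; i++) {
        double s = 0.0;
        for (int k = rp[i]; k < rp[i + 1]; k++) s += v[k] * x[ci[k]];
        y[i] = s;
    }
}

/* dot(x, y) -- a parallel-reduction kernel on a GPU. */
static double dot(const double *x, const double *y) {
    double s = 0.0;
    for (int i = 0; i < N; i++) s += x[i] * y[i];
    return s;
}

/* y = y + a*x -- the "axpy kernel". */
static void axpy(double a, const double *x, double *y) {
    for (int i = 0; i < N; i++) y[i] += a * x[i];
}

int main(void) {
    /* Small symmetric positive definite tridiagonal matrix (hypothetical). */
    int rp[N + 1] = {0, 2, 5, 8, 10};
    int ci[10]    = {0, 1, 0, 1, 2, 1, 2, 3, 2, 3};
    double v[10]  = {4, -1, -1, 4, -1, -1, 4, -1, -1, 4};
    double b[N] = {1, 2, 3, 4}, x[N] = {0}, r[N], p[N], Ap[N];

    spmv(rp, ci, v, x, Ap);                     /* r = b - A*x          */
    for (int i = 0; i < N; i++) { r[i] = b[i] - Ap[i]; p[i] = r[i]; }
    double rr = dot(r, r);

    for (int it = 0; it < 100 && sqrt(rr) > 1e-10; it++) {
        spmv(rp, ci, v, p, Ap);                 /* kernel 1: SpMV       */
        double alpha = rr / dot(p, Ap);         /* kernel 2: reduction  */
        axpy(alpha, p, x);                      /* kernel 3: axpy       */
        axpy(-alpha, Ap, r);                    /* kernel 4: axpy       */
        double rr_new = dot(r, r);              /* kernel 5: reduction  */
        for (int i = 0; i < N; i++) p[i] = r[i] + (rr_new / rr) * p[i];
        rr = rr_new;
    }
    for (int i = 0; i < N; i++) printf("x[%d] = %f\n", i, x[i]);
    return 0;
}
```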
303. Technoeconomic aspects of next-generation telecommunications including the Internet service. Sardenberg, Renata Cristina Tourinho. Thesis (Ph.D.), Florida Atlantic University, 2010.
This research is concerned with the technoeconomic aspects of modern and next-generation telecommunications, including the Internet service. The goals of this study are the following: (i) reviewing the technoeconomic considerations prevailing in telecommunication (telco) systems and their implications for the future; (ii) studying relevant considerations by depicting modern/next-generation telecommunications as a digital ecosystem, viewed in terms of underlying complex-system evolution (akin to biological systems); (iii) pursuant to the digital-ecosystem concept, co-evolution modeling of competitive business structures in the technoeconomics of telco services using dichotomous (flip-flop) states as seen in prey-predator evolution; (iv) specific to Internet pricing economics, deducing the profile of consumer surplus versus pricing model under the DiffServ QoS architecture, pertinent to dynamic, smart and static markets; (v) developing and exemplifying decision-making pursuits in the telco business under non-competitive and competitive markets (via a game-theoretic approach); and (vi) modeling forecasting issues in telco services in terms of a simplified ARIMA-based time-series approach (which includes seasonal and non-seasonal data plus goodness-of-fit estimations in the time and frequency domains). Commensurate with the topics indicated above, the necessary analytical derivations and models are proposed and computational exercises are performed (with MATLAB™ R2006b and other software as needed). Extensive data gathered from the open literature are used, and ad hoc model verifications are performed. Lastly, results are discussed, inferences are made and open questions for further research are identified.
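For readers unfamiliar with the forecasting model named in item (vi), the standard ARIMA(p, d, q) formulation is (a textbook definition, not a formula specific to this study):

```latex
% ARIMA(p,d,q): L is the lag operator, d the differencing order,
% \phi_i the autoregressive and \theta_j the moving-average
% coefficients, and \varepsilon_t white noise.
\Bigl(1 - \sum_{i=1}^{p} \phi_i L^{i}\Bigr)\,(1 - L)^{d} X_t
  = \Bigl(1 + \sum_{j=1}^{q} \theta_j L^{j}\Bigr)\,\varepsilon_t
```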
304. Methodology for parallel application execution based on the BSP model with heterogeneous tasks. Luz, Fernando Henrique e Paula da, 21 September 2015.
Parallel computing offers a series of advantages for the execution of large applications, and the effective use of parallel resources is an important aspect of high-performance computing. This work presents a methodology for executing, in an automated way, parallel applications based on the BSP model with heterogeneous tasks. The adopted model assumes that the computation time of each secondary task does not vary much from one iteration to the next. The methodology, named ASE, is composed of three stages: Acquisition, Scheduling and Execution. In the Acquisition stage, the processing times of the tasks are measured; in the Scheduling stage, an algorithm developed in this work finds the task distribution that maximizes the execution speed of the parallel application while minimizing the use of resources; and finally the Execution stage runs the parallel application with the distribution defined in the previous stage. The tools applied in the methodology were implemented, and a set of tests applying the methodology shows that the objectives of the proposal were reached.
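To illustrate the kind of decision the Scheduling stage makes, the sketch below applies the classic longest-processing-time (LPT) heuristic to measured task times. This is only an illustration of the problem being solved; the algorithm actually developed in the dissertation is not reproduced here, and the task times, task count, and worker count are hypothetical.

```c
/* Assign tasks with measured processing times to workers so that the
 * slowest worker finishes as early as possible (LPT heuristic). */
#include <stdio.h>
#include <stdlib.h>

#define NTASKS 6
#define NWORKERS 3

static int cmp_desc(const void *a, const void *b) {
    double d = *(const double *)b - *(const double *)a;
    return (d > 0) - (d < 0);
}

int main(void) {
    /* Task times as measured in an Acquisition-like stage (hypothetical). */
    double t[NTASKS] = {7.0, 5.0, 4.0, 3.0, 2.0, 2.0};
    double load[NWORKERS] = {0};

    qsort(t, NTASKS, sizeof t[0], cmp_desc);     /* longest tasks first */
    for (int i = 0; i < NTASKS; i++) {
        int w = 0;                               /* pick least-loaded worker */
        for (int j = 1; j < NWORKERS; j++)
            if (load[j] < load[w]) w = j;
        load[w] += t[i];
        printf("task %d (%.1f) -> worker %d\n", i, t[i], w);
    }
    for (int j = 0; j < NWORKERS; j++)
        printf("worker %d load: %.1f\n", j, load[j]);
    return 0;
}
```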
305. Design and Evaluation of a Public Resource Computing Framework. Baldassari, James D., 20 April 2006.
Public resource computing (PRC) is an innovative approach to high performance computing that relies on volunteers who donate their personal computers' unused resources to a computationally intensive research project. Prominent PRC projects include SETI@home, Folding@Home, and distributed.net. Many PRC projects are built upon a PRC framework that abstracts functionality that is common to all PRC projects, such as network communications, database access, and project management. These PRC frameworks tend to be complex, limiting, and difficult to use. We have designed and implemented a new PRC framework called the Simple Light-weight Infrastructure for Network Computing (SLINC) that addresses the disadvantages we identified with existing frameworks. SLINC is a flexible and extensible PRC framework that will enable researchers to more easily build PRC projects.
306. VertElastic: a decision module for exploiting vertical elasticity in AutoElastic. Moreira, Gabriel Araujo Siccardi, 25 September 2018.
The concept of elasticity is closely linked to cloud computing, since it consists of the ability to expand and contract computational resources dynamically and in real time. Usually, in high-performance computing (HPC), applications are modeled to be used with load-balancing techniques, making use of virtual machine technology. Parallel computing has long been used to solve computational problems that involve executing many processes simultaneously and demand large amounts of calculation; its premise is that a large piece of code to be processed can be broken into smaller ones, so that the problem as a whole is divided and solved more quickly. HPC is a typical use case of computational parallelism, and its most common communication protocol is the Message Passing Interface (MPI). However, when dealing with MPI applications, taking full advantage of elasticity is laborious: it requires rewriting code and a deep knowledge of the application's behavior, and some interruptions of the application to recompile and redeploy it are inevitable. In order to avoid code rewriting and to take full advantage of increasingly robust hardware, this dissertation investigates the implementation of vertical elasticity for high-performance applications. A decision module called VertElastic is incorporated into the AutoElastic framework, extending it to both forms of elasticity, vertical and horizontal, either in a fixed way with the indication of thresholds or with prediction, in which case the values are calculated automatically. Previous works address vertical elasticity with thresholds, while others use horizontal elasticity proactively and/or reactively, but no research was found that offers the flexibility to use vertical or horizontal elasticity, proactively or reactively, as needed. To this end, VertElastic uses asynchronous elasticity, so that the application is not blocked while an elasticity action takes place, whether it increases or decreases the computational resource. VertElastic demonstrates its feasibility in a test routine executed on the open-source OpenNebula tool. The execution of a CPU-bound application showed that VertElastic was between 13% and 38% more efficient than using no elasticity technique at all. The tests also showed that the higher the threshold used, the lower the gain in the consumption of computational resources and the longer the execution time of the application.
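To make the fixed-threshold mode concrete, here is a generic sketch of the kind of decision loop such a module runs. Everything in it (the thresholds and the cpu_usage()/resize_vm() hooks) is a hypothetical placeholder; it is not VertElastic's code nor OpenNebula's API.

```c
/* Generic threshold-based vertical-elasticity decision loop (illustrative). */
#include <stdio.h>

#define UPPER 0.85   /* add vCPUs above this sustained load    */
#define LOWER 0.30   /* remove vCPUs below this sustained load */

static double cpu_usage(void) { return 0.9; }    /* stub monitor reading */
static void resize_vm(int vcpus) { printf("resize to %d vCPUs\n", vcpus); }

int main(void) {
    int vcpus = 2;
    for (int tick = 0; tick < 3; tick++) {       /* monitoring loop */
        double u = cpu_usage();
        /* Asynchronous elasticity: the application keeps running while
         * the resize happens, for both scale-up and scale-down. */
        if (u > UPPER) resize_vm(++vcpus);
        else if (u < LOWER && vcpus > 1) resize_vm(--vcpus);
    }
    return 0;
}
```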
307. Domain decomposition methods: application to high-performance computing. Jolivet, Pierre, 2 October 2014.
This thesis introduces a unified framework for various domain decomposition methods: those with overlap, so-called Schwarz methods, and those based on Schur complements, so-called substructuring methods. It is then possible to switch between methods with a high level of abstraction and to build different preconditioners to accelerate the iterative solution of large sparse linear systems. Such systems are frequently encountered in industrial or scientific problems after discretization of continuous models. Even though these preconditioners naturally exhibit good parallelism properties on distributed architectures, they can prove numerically inadequate for complex decompositions or multiscale physics. This lack of robustness may be alleviated by concurrently solving sparse or dense local generalized eigenvalue problems, thus identifying a priori the modes that hinder the convergence of the underlying iterative methods. Using these modes, it is then possible to define projection operators based on what is usually referred to as a coarse solver. These auxiliary tools tend to solve the aforementioned issues, but typically decrease the parallel efficiency of the preconditioners. In this dissertation, it is shown in three points that the newly developed construction is efficient: 1) by performing large-scale numerical experiments on Curie, a European supercomputer, and by comparing it with state-of-the-art 2) multigrid and 3) direct solvers.
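As a concrete instance of the preconditioners discussed, the classical one-level additive Schwarz preconditioner and its two-level variant with a coarse-space correction read as follows (standard textbook forms, given for orientation; not necessarily the exact operators built in the thesis):

```latex
% One-level additive Schwarz: R_i restricts a global vector to
% subdomain i, and A_i = R_i A R_i^T is the local problem.
M^{-1}_{\mathrm{AS}} = \sum_{i=1}^{N} R_i^{\mathsf{T}} A_i^{-1} R_i
% Two-level variant: the columns of Z span the coarse space (e.g. the
% local generalized eigenvectors mentioned above), and E = Z^T A Z is
% the coarse solver that restores robustness.
M^{-1}_{\mathrm{AS},2} = Z E^{-1} Z^{\mathsf{T}}
  + \sum_{i=1}^{N} R_i^{\mathsf{T}} A_i^{-1} R_i
```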
308. Numerical modeling of fluid-structure interaction in bio-inspired propulsion. Engels, Thomas, 10 December 2015.
Flying and swimming animals have developed efficient ways to produce the fluid flow that generates the desired forces for their locomotion. These bio-inspired problems couple fluid dynamics and solid mechanics with complex geometries and kinematics. The present thesis is placed in this interdisciplinary context and uses numerical simulations to study these fluid-structure interaction problems, with applications in insect flight and swimming fish. Based on existing work on rigid moving obstacles, using an efficient Fourier discretization, a numerical method has been developed that also allows the simulation of flexible, deforming obstacles, and provides enhanced versatility and accuracy in the case of rigid obstacles. The method relies on the volume penalization method, and the fluid discretization remains based on a Fourier discretization. We first apply this method to insects with rigid wings, where the body and other details, such as the legs and antennae, can be included. After presenting detailed validation tests, we proceed to studying a bumblebee model in fully developed turbulent flow. Our simulations show that turbulent perturbations affect flapping insects in a different way than human-designed fixed-wing aircraft. While in the latter, upstream perturbations can cause transitions in the boundary layer, the former do not present systematic changes in aerodynamic forces. We conclude that insects rather face control problems in a turbulent environment than a deterioration in force production. In the next step, we design a solid model, based on a one-dimensional beam equation, and simulate coupled fluid-solid systems.
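The volume penalization method named above replaces the no-slip condition at the obstacle by a forcing term in the momentum equation; its standard form is (a textbook statement of the method, not the thesis' exact equations):

```latex
% Penalized incompressible Navier--Stokes: \chi is the obstacle mask
% (1 inside the solid, 0 in the fluid), u_s the solid velocity, and
% \eta \ll 1 the penalization parameter; as \eta \to 0 the velocity
% inside the solid is driven toward u_s.
\frac{\partial \mathbf{u}}{\partial t}
  + (\mathbf{u} \cdot \nabla)\mathbf{u}
  = -\nabla p + \nu \nabla^{2}\mathbf{u}
  - \frac{\chi}{\eta}\,(\mathbf{u} - \mathbf{u}_s),
\qquad \nabla \cdot \mathbf{u} = 0
```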
309. Towards highly scalable parallel simulations for turbulent plasma physics. Rozar, Fabien, 5 November 2015.
Energy needs around the world keep increasing, while the resources needed to produce fossil energy drain off year after year. An alternative way to produce energy is nuclear fusion through magnetic confinement. Mastering this reaction is a challenge and represents an active field of current research. In order to improve our understanding of the phenomena which occur during a fusion reaction, both experiment and simulation are put to use. The experiments performed in tokamaks allow experimental readings to be taken; this measurement process is highly complex and requires the most advanced available technologies, and currently these measurements do not give access to all time and space scales of the physical phenomena. Numerical simulation permits the exploration of the scales which are still unreachable through experiment, but extreme computing power is mandatory to perform realistic simulations. This requires high-performance computing (HPC) on large machines, also known as supercomputers.

The work realized in this thesis focuses on the optimization of the Gysela code, which simulates plasma turbulence. Optimization of a scientific application classically targets one of the three following points: (i) the simulation of larger meshes, (ii) the reduction of computing time and (iii) the enhancement of computation accuracy. The first part of this manuscript presents the contributions relative to the simulation of larger meshes. As in many simulation codes, making simulations more realistic often amounts to refining the mesh, and the finer the mesh, the larger the memory consumption. Moreover, during the last few years, supercomputers have tended to provide less and less memory per compute core. For these reasons, we developed a library, libMTM (Modeling and Tracing Memory), dedicated to studying precisely the memory consumption of parallel software. The libMTM tools allowed us to reduce the memory consumption of Gysela and to study its scalability; as far as we know, no other tool provides equivalent features allowing this kind of memory-scalability study.

The second part of the manuscript presents the work relative to the optimization of computation time and the improvement of the accuracy of the gyroaverage operator. This operator is a cornerstone of the gyrokinetic model used by the Gysela application. The improvement in accuracy comes from a change in the computing method: a scheme based on 2D Hermite interpolation replaces the Padé approximation. Although the new version of the gyroaverage operator is more accurate, it is also more expensive in computation time than the former one; in order to keep simulation times reasonable, different optimizations were performed on the new computing method to make it competitive. Finally, we developed an MPI-parallelized version of the new gyroaverage operator. The good scalability of this new gyroaverage operator will eventually allow a reduction of the MPI communication costs, which are penalizing in a parallel application like Gysela.
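For orientation, the operator in question and the two approximations contrasted above can be written as follows. These are the standard textbook forms of the gyroaverage, its Padé approximation, and a quadrature over the gyro-ring (whose sample points the new scheme would interpolate with 2D Hermite polynomials); they are not formulas copied from the thesis.

```latex
% Gyroaverage of a field f over the gyro-ring of Larmor radius \rho:
\mathcal{J}_{\rho} f(\mathbf{x})
  = \frac{1}{2\pi} \int_{0}^{2\pi}
      f\bigl(\mathbf{x} + \boldsymbol{\rho}(\theta)\bigr)\, d\theta
% Pad\'e approximation traditionally used (in Fourier space, where the
% exact operator multiplies by the Bessel function J_0):
J_0(k_{\perp}\rho) \approx \frac{1}{1 + k_{\perp}^{2}\rho^{2}/4}
% Quadrature form, interpolating f at N points on the gyro-ring:
\mathcal{J}_{\rho} f(\mathbf{x})
  \approx \frac{1}{N} \sum_{k=1}^{N}
      f\bigl(\mathbf{x} + \boldsymbol{\rho}(\theta_k)\bigr)
```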
310. Unified system of code transformation and execution for heterogeneous multi-core architectures. Li, Pei, 17 December 2015.
Heterogeneous architectures have been widely used in the domain of high-performance computing. However, developing applications for heterogeneous architectures is time-consuming and error-prone, because going from a single accelerator to multiple ones requires dealing with potentially non-uniform domain decomposition, inter-accelerator data movements, and dynamic load balancing. The aim of this thesis is to propose a parallel-programming solution for novice developers that eases the complex coding process and guarantees the quality of code. We highlighted and analysed the shortcomings of existing solutions and proposed a new programming tool called STEPOCL, along with a new domain-specific language designed to simplify the development of applications for heterogeneous architectures. We evaluated both the performance and the usefulness of STEPOCL. The results show that: (i) the performance of an application written with STEPOCL scales linearly with the number of accelerators, (ii) the performance of an application written using STEPOCL competes with a handwritten version, (iii) workloads that do not fit in the memory of a single device can run on multiple devices, and (iv) thanks to STEPOCL, the number of lines of code required to write an application for multiple accelerators is roughly divided by ten.
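To illustrate the bookkeeping that the abstract says such a tool automates (non-uniform domain decomposition and inter-accelerator data movement), here is a small sketch that partitions a 1-D domain proportionally to assumed device speeds and computes the halo regions each partition must exchange. The device count, speeds, and halo width are hypothetical; this is not STEPOCL-generated code.

```c
/* Partition a 1-D domain across devices of unequal speed and report the
 * halo (ghost) region each partition must exchange with its neighbors. */
#include <stdio.h>

#define NDEV 3
#define NCELLS 1000
#define HALO 1           /* ghost cells needed by e.g. a stencil kernel */

int main(void) {
    double speed[NDEV] = {1.0, 2.0, 1.5};   /* relative device throughputs */
    double total = 0.0;
    for (int d = 0; d < NDEV; d++) total += speed[d];

    int start = 0;
    for (int d = 0; d < NDEV; d++) {
        /* last device absorbs rounding so the whole domain is covered */
        int size = (d == NDEV - 1) ? NCELLS - start
                                   : (int)(NCELLS * speed[d] / total);
        int lo = (start > 0) ? start - HALO : start;           /* left ghost  */
        int hi = (start + size < NCELLS) ? start + size + HALO
                                         : start + size;       /* right ghost */
        printf("device %d: cells [%d, %d), transfers [%d, %d)\n",
               d, start, start + size, lo, hi);
        start += size;
    }
    return 0;
}
```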