Spelling suggestions: "subject:"codegeneration"" "subject:"vasodegeneration""
171 |
Rigorous Design Flow for Programming Manycore Platforms / Flot de conception rigoureux pour la programmation de plates-formes manycore.Bourgos, Paraskevas 09 April 2013 (has links)
L'objectif du travail présenté dans cette thèse est de répondre à un verrou fondamental, qui est «comment programmer d'une manière rigoureuse et efficace des applications embarquées sur des plateformes multi-coeurs?». Cette problématique pose plusieurs défis: 1) le développement d'une approche rigoureuse basée sur les modèles pour pouvoir garantir la correction; 2) le « mariage » entre modèle physique et modèle de calcul, c'est-à-dire, l'intégration du fonctionnel et non-fonctionnel; 3) l'adaptabilité. Pour s'attaquer à ces défis, nous avons développé un flot de conception rigoureux autour du langage BIP. Ce flot de conception permet l'exploration de l'espace de conception, le traitement à diffèrent niveaux d'abstraction à la fois pour la plate-forme et l'application, la génération du code et le déploiement sur des plates-formes multi-cœurs. La méthode utilisée s'appuie sur des transformations source-vers-source des modèles BIP. Ces transformations sont correctes-par-construction. Nous illustrons ce flot de conception avec la modélisation et le déploiement de plusieurs applications sur deux plates-formes différentes. La première plate-forme considérée est MPARM, une plate-forme virtuelle, basée sur des processeurs ARM et structurée avec des clusters, où chacun contient plusieurs cœurs. Pour cette plate-forme, nous avons considérée les applications suivantes: la factorisation de Cholesky, le décodage MPEG-2, le décodage MJPEG, la Transformée de Fourier Rapide et un algorithme de demosaicing. La seconde plate-forme est P2012/STHORM, une plate-forme multi-cœur, basée sur plusieurs clusters capable d'une gestion énergétique efficace. L'application considérée sur P2012/STHORM est l'algorithme HMAX. Les résultats expérimentaux montrent l'intérêt du flot de conception, notamment l'analyse rapide des performances ainsi que la modélisation au niveau du système, la génération de code et le déploiement. / The advent of many-core platforms is nowadays challenging our capabilities for efficient and predictable design. To meet this challenge, designers need methods and tools for guaranteeing essential properties and determining tradeoffs between performance and efficient resource management. In the process of designing a mixed software/hardware system, functional constraints and also extra-functional specifications should be taken into account as an essential part for the design of embedded systems. The impact of design choices on the overall behavior of the system should also be analyzed. This implies a deep understanding of the interaction between application software and the underlying execution platform. We present a rigorous model-based design flow for building parallel applications running on top of many-core platforms. The flow is based on the BIP - Behavior, Interaction, Priority - component framework and its associated toolbox. The method allows generation of a correct-by-construction mixed hardware/software system model for manycore platforms from an application software and a mapping. It is based on source-to-source correct-by-construction transformations of BIP models. It provides full support for modeling application software and validation of its functional correctness, modeling and performance analysis of system-level models, code generation and deployment on target many-core platforms. Our design flow is illustrated through the modeling and deployment of various software applications on two different hardware platforms; MPARM and platform P2012/STHORM. MPARM is a virtual ARM-based multi-cluster manycore platform, configured by the number of clusters, the number of ARM cores per cluster, and their interconnections. On MPARM, the software applications considered are the Cholesky factorization, the MPEG-2 decoding, the MJPEG decoding, the Fast Fourier Transform and the Demosaicing algorithm. Platform 2012 (P2012/STHORM) is a power efficient manycore computing fabric, which is highly modular and based on multiple clusters capable of aggressive fine-grained power management. As a case study on P2012/STHORM, we used the HMAX algorithm. Experimental results show the merits of the design flow, notably performance analysis as well as correct-by-construction system level modeling, code generation and efficient deployment.
|
172 |
Desenvolvimento e reúso de frameworks com base nas características do domínioViana, Matheus Carvalho 08 May 2014 (has links)
Made available in DSpace on 2016-06-02T19:03:58Z (GMT). No. of bitstreams: 1
6131.pdf: 4820064 bytes, checksum: 9eff44ec6ccb5da42e47aea930745c02 (MD5)
Previous issue date: 2014-05-08 / Universidade Federal de Sao Carlos / Frameworks are software artifacts that implement the basic functionality of a domain. Its reuse can improve the efficiency of development process and the quality of application code. However, frameworks are difficult to develop and reuse, since they require a complex structure to implement domain variability and be adaptable enough to be reused by different applications. Due to these difficulties, this research presents two approaches: 1) the From Features to Frameworks (F3) approach, in which the developer models the features of a domain and a pattern language helps in implementing a framework based on this model; and 2) the approach that uses a Domain-Specific Language (DSL) built from the identification and analysis of the domain features of a framework to facilitate the reuse of this framework. A tool, called From Features to Frameworks Tool (F3T), was also developed to support the use of these two approaches, providing editors for modeling domains and applications and automating the implementation of code of frameworks, DSLs and applications. In addition to facilitate the development and reuse of frameworks, experiments conducted during this project showed that these two approaches make these processes more efficient and allow the construction of frameworks and applications with less difficulty. / Frameworks são artefatos de software que implementam a funcionalidade básica de um domínio. Seu reúso pode aumentar a eficiência do processo de desenvolvimento e a qualidade do código de aplicações. Contudo, frameworks são difíceis de construir e reutilizar, pois necessitam de uma estrutura complexa para implementar as variabilidades do seu domínio e serem adaptáveis o suficiente para poderem ser reutilizados por diversas aplicações. Em vista dessas dificuldades este projeto apresenta duas abordagens: 1) a abordagem From Features to Frameworks (F3), na qual o desenvolvedor modela as características de um domínio e uma linguagem de padrões auxilia na implementação de um framework com base nesse modelo; e 2) a abordagem que utiliza uma Domain-Specific Language (DSL) construída a partir da identificação e análise das características do domínio do framework para facilitar o reúso desse framework. Uma ferramenta, denominada From Features to Frameworks Tool (F3T), também foi desenvolvida para apoiar o uso dessas duas abordagens, fornecendo editores para a modelagem dos domínios e das aplicações e automatizando a implementação de código dos frameworks, das DSLs e das aplicações. Além de facilitar o desenvolvimento e o reúso de framework, experimentos realizados ao longo deste projeto mostraram que essas duas abordagens tornam esses processos mais eficientes e permitem a construção de frameworks e aplicações com menor dificuldade.
|
173 |
Uma abordagem para migração automática de código no contexto do desenvolvimento orientado a modelosPossatto, Marcos Antonio 22 October 2013 (has links)
Made available in DSpace on 2016-06-02T19:06:10Z (GMT). No. of bitstreams: 1
5666.pdf: 3321258 bytes, checksum: c402c2fb8a619d07842991622736ea5a (MD5)
Previous issue date: 2013-10-22 / Universidade Federal de Minas Gerais / Code generators play a key role in model-driven software development. They are responsible for transforming high-level assets (models) into implementation assets (code). Most generators are based on templates, which are pieces of text instrumented with code expansion elements. They receive an input and produce an output according to the template's programming. To build such template-based generators, the code of an existing implementation, already tested and validated, can be used as a reference, in a process known as code migration. With software evolution and the need for changes in the code generator, the templates start to differ from this reference implementation. In order to restablish the synchronization, additional effort is required. Tackling the challenge of keeping these assets synchronized (reference implementation and templates) is this dissertation's subject. The goal is to provide some automation to the code migration process, even if partial, in order to increase productivity in the maintenance of code generators. A mechanism was developed to make it possible to automatically reproduce changes that are performed in the reference implementation into one or more code generation templates. This mechanism was evaluated through an empirical study, yielding good performance in a controlled environment. This indicates that automation can help to reduce the effort in the maintenance of code generators in a model-driven development context. / Os geradores de código desempenham um papel fundamental no desenvolvimento de software orientado a modelos. São responsáveis pela transformação dos artefatos de alto nível de abstração (modelo) em elementos de implementação (código). Os tipos mais comuns de geradores são os baseados em template. São compostos fundamentalmente por elementos de expansão de código, que recebem uma entrada e a convertem em código, conforme a programação inserida nesses templates. O código de uma implementação já testado e validado pode servir de referência para a criação de templates, por meio de um processo conhecido como migração de código. Com a dinâmica da evolução do software e a necessidade de efetuar mudanças no gerador de código ocorre a perda de sincronismo entre os templates e esse código de referência, sendo necessário um esforço adicional para mantê-los sincronizados. O desafio de manter esses artefatos sincronizados constituiu o objetivo desta dissertação de mestrado, que proporcionou ganhos de produtividade, por meio de uma automação, ainda que parcial, desse processo. Nesse sentido, foi desenvolvido um mecanismo para propagar automaticamente as alterações introduzidas no código de referência para os templates, que reduziu o tempo e facilitou a manutenção de geradores de código que sofrem com o problema da perda de sincronismo nesses artefatos. O protótipo para a migração automática de código desenvolvido nesta dissertação foi submetido a um estudo empírico, atingindo um bom desempenho com a sua utilização na maioria das tarefas de migração de código avaliadas, o que indica que a automação pode ajudar a resolver o problema e reduzir o esforço de manutenção no desenvolvimento de software orientado a modelos.
|
174 |
Efficient computation with structured matrices and arithmetic expressions / Calcul efficace avec des matrices structurées et des expressions arithmétiquesMouilleron, Christophe 04 November 2011 (has links)
Le développement de code efficace en pratique pour effectuer un calcul donné est un problème difficile. Cette thèse présente deux situations où nous avons été confronté à ce problème. La première partie de la thèse propose des améliorations au niveau algorithmique dans le cadre de l'algèbre linéaire structurée. Nous montrons d'abord comment étendre un algorithme de Cardinal pour l'inversion de matrices de type Cauchy afin de traiter les autres structures classiques. Cette approche, qui repose essentiellement sur des produits de type « matrice structurée × matrice », conduit à une accélération d'un facteur allant jusqu'à 7 en théorie et constaté en pratique. Ensuite, nous généralisons des travaux sur les matrices de type Toeplitz afin de montrer comment, pour les structures classiques, calculer le produit d'une matrice structurée n×n et de rang de déplacement α par une matrice n×α en Õ(α^(ω-1)n). Cela conduit à des algorithmes en Õ(α^(ω-1)n) pour l'inversion de matrices structurées, sans avoir à passer par des matrices de type Toeplitz. La deuxième partie de la thèse traite de l'implantation d'expressions arithmétiques. Ce sujet soulève de nombreuses questions comme le nombre d'opérations minimum, la vitesse, ou encore la précision des calculs en arithmétique approchée. En exploitant la nature inductive des expressions arithmétiques, il est possible de développer des algorithmes aidant à répondre à ces questions. Nous présentons ainsi plusieurs algorithmes de génération de schémas d'évaluation, de comptage et d'optimisation selon un ou plusieurs critères. Ces algorithmes ont été implanté dans une librairie qui a en autre été utilisée pour accélérer un logiciel de génération de code pour une librairie mathématique, et pour étudier des questions d'optimalité pour le problème de l'évaluation d'un polynôme à coefficients scalaires de petit degré en une matrice. / Designing efficient code in practice for a given computation is a hard task. In this thesis, we tackle this issue in two different situations. The first part of the thesis introduces some algorithmic improvements in structured linear algebra. We first show how to extend an algorithm by Cardinal for inverting Cauchy-like matrices to the other common structures. This approach, which mainly relies on products of the type "structured matrix × matrix", leads to a theoretical speed-up of a factor up to 7 that we also observe in practice. Then, we extend some works on Toeplitz-like matrices and prove that, for any of the common structures, the product of an n×n structured matrix of displacement rank α by an n×α matrix can be computed in Õ(α^(ω-1)n). This leads to direct inversion algorithms in Õ(α^(ω-1)n) , that do not rely on a reduction to the Toeplitz-like case. The second part of the thesis deals with the implementation of arithmetic expressions. This topic raises several issues like finding the minimum number of operations, and maximizing the speed or the accuracy when using some finite-precision arithmetic. Making use of the inductive nature of arithmetic expressions enables the design of algorithms that help to answer such questions. We thus present a set of algorithms for generating evaluation schemes, counting them, and optimizing them according to one or several criteria. These algorithms are part of a library that we have developed and used, among other things, in order to decrease the running time of a code generator for a mathematical library, and to study optimality issues about the evaluation of a small degree polynomial with scalar coefficients at a matrix point.
|
175 |
Effective Automatic Computation Placement and Data Allocation for Parallelization of Regular ProgramsChandan, G January 2014 (has links) (PDF)
Scientific applications that operate on large data sets require huge amount of computation power and memory. These applications are typically run on High Performance Computing (HPC) systems that consist of multiple compute nodes, connected over an network interconnect such as InfiniBand. Each compute node has its own memory and does not share the address space with other nodes. A significant amount of work has been done in past two decades on parallelizing for distributed-memory architectures. A majority of this work was done in developing compiler technologies such as high performance Fortran (HPF) and partitioned global address space (PGAS). However, several steps involved in achieving good performance remained manual. Hence, the approach currently used to obtain the best performance is to rely on highly tuned libraries such as ScaLAPACK. The objective of this work is to improve automatic compiler and runtime support for distributed-memory clusters for regular programs. Regular programs typically use arrays as their main data structure and array accesses are affine functions of outer loop indices and program parameters. A lot of scientific applications such as linear-algebra kernels, stencils, partial differential equation solvers, data-mining applications and dynamic programming codes fall in this category.
In this work, we propose techniques for finding computation mapping and data allocation when compiling regular programs for distributed-memory clusters. Techniques for transformation and detection of parallelism, relying on the polyhedral framework already exist. We propose automatic techniques to determine computation placements for identified parallelism and allocation of data. We model the problem of finding good computation placement as a graph partitioning problem with the constraints to minimize both communication volume and load imbalance for entire program. We show that our approach for computation mapping is more effective than those that can be developed using vendor-supplied libraries. Our approach for data allocation is driven by tiling of data spaces along with a compiler assisted runtime scheme to allocate and deallocate tiles on-demand and reuse them. Experimental results on some sequences of BLAS calls demonstrate a mean speedup of 1.82× over versions written with ScaLAPACK. Besides enabling weak scaling for distributed memory, data tiling also improves locality for shared-memory parallelization. Experimental results on a 32-core shared-memory SMP system shows a mean speedup of 2.67× over code that is not data tiled.
|
176 |
XFOR (Multifor) : A new programming structure to ease the formulation of efficient loop optimizations / XFOR (Multifor) : nouvelle structure de programmation pour faciliter la formulation des optimisations efficaces de bouclesFassi, Imen 27 November 2015 (has links)
Nous proposons une nouvelle structure de programmation appelée XFOR (Multifor), dédiée à la programmation orientée réutilisation de données. XFOR permet de gérer simultanément plusieurs boucles "for" ainsi que d’appliquer/composer des transformations de boucles d’une façon intuitive. Les expérimentations ont montré des accélérations significatives des codes XFOR par rapport aux codes originaux, mais aussi par rapport au codes générés automatiquement par l’optimiseur polyédrique de boucles Pluto. Nous avons mis en œuvre la structure XFOR par le développement de trois outils logiciels: (1) un compilateur source-à-source nommé IBB, qui traduit les codes XFOR en un code équivalent où les boucles XFOR ont été remplacées par des boucles for sémantiquement équivalentes. L’outil IBB bénéficie également des optimisations implémentées dans le générateur de code polyédrique CLooG qui est invoqué par IBB pour générer des boucles for à partir d’une description OpenScop; (2) un environnement de programmation XFOR nommé XFOR-WIZARD qui aide le programmeur dans la ré-écriture d’un programme utilisant des boucles for classiques en un programme équivalent, mais plus efficace, utilisant des boucles XFOR; (3) un outil appelé XFORGEN, qui génère automatiquement des boucles XFOR à partir de toute représentation OpenScop de nids de boucles transformées générées automatiquement par un optimiseur automatique. / We propose a new programming structure named XFOR (Multifor), dedicated to data-reuse aware programming. It allows to handle several for-loops simultaneously and map their respective iteration domains onto each other. Additionally, XFOR eases loop transformations application and composition. Experiments show that XFOR codes provides significant speed-ups when compared to the original code versions, but also to the Pluto optimized versions. We implemented the XFOR structure through the development of three software tools: (1) a source-to-source compiler named IBB for Iterate-But-Better!, which automatically translates any C/C++ code containing XFOR-loops into an equivalent code where XFOR-loops have been translated into for-loops. IBB takes also benefit of optimizations implemented in the polyhedral code generator CLooG which is invoked by IBB to generate for-loops from an OpenScop specification; (2) an XFOR programming environment named XFOR-WIZARD that assists the programmer in re-writing a program with classical for-loops into an equivalent but more efficient program using XFOR-loops; (3) a tool named XFORGEN, which automatically generates XFOR-loops from any OpenScop representation of transformed loop nests automatically generated by an automatic optimizer.
|
177 |
Méthodologie pour les études d’automatisation et la génération automatique de programmes Automates Programmables Industrielssûrs de fonctionnement. Application aux Equipements d’Alimentation des Lignes Électrifiées / Methodology for automation studies and for automatic generation of safety Programmable Logic Controller codeCoupat, Raphaël 27 November 2014 (has links)
Le projet de recherche présenté dans cette thèse a été réalisé avec la collaboration de la Direction de l'Ingénierie SNCF et le CReSTIC de l'Université de Reims Champagne-Ardenne (URCA). L'objectif de ce projet est de contribuer à l'amélioration des études de conception du contrôle/commande des projets d'électrification menées par les chargés d'études. Ce projet doit répondre à des objectifs humains, économiques et techniques exprimés par la SNCF, notamment appliqué au domaine des Equipements d'Alimentation des Lignes Electrifiées (EALE). Pour répondre à ces problématiques, une méthodologie pour les études d'automatisation est proposée. Elle intègre deux axes de recherche. Le premier axe est la génération automatique de livrables (codes, documents, schémas…). Celle-ci repose nécessairement sur une standardisation et une modélisation du « métier ». L'approche MDD (Model Driven Development) du génie logiciel et l'approche DSM (Domain Specific Modeling), apporte des éléments de solution reposant sur l'utilisation de « templates métiers ». Toutefois, il est fondamental de générer des livrables de qualité et du code API (Automates Programmables Industriels) sûr de fonctionnement. Le second axe de recherche s'intéresse à la commande sûre de fonctionnement. Trois approches de synthèse de la commande (la Supervisory Control Theory (SCT), la synthèse algébrique, la commande par contraintes logiques) permettant a priori de répondre à ces objectifs de sûreté sont présentées et discutées. La commande par contraintes logiques présente l'avantage majeur de séparer la sécurité (qui est vérifiée formellement hors ligne par model-checking) et le fonctionnel, et de pouvoir être utilisée avec des programmes API existants, ne remettant pas ainsi en cause la méthodologie de travail des chargés d'études. / The research project presented in this thesis has been realized with the collaboration of the Engineer Department of the SNCF and the CReSTIC of the University of Reims Champagne-Ardenne. The goal of this project is to contribute to the improvement of the control studies of the electrification projects realized by the design engineers. This project must meet human, economic and technical aims expressed by the SNCF applied to the field of the Power Supply Equipments of the Electrified Lines (EALE in french). To answer these problems, a methodology for the automation studies is proposed. It integrates two research orientations were studied. The first axis is the automatic generation the deliverables (codes, documents, diagrams…). This axis is based on standardization and modeling of the “work”. MDD (Model Driven Development) and DSM (Domain Specific Modeling) approaches, brings suggestions for solution based on the use of “work templates”. However, it is fundamental to generate quality deliverables and safe PLC (Programmable Logic Controller) code. The second research orientation is interested in safe control. Three approaches of control synthesis (Supervisory Control Theory (SCT), the algebraic synthesis, the control by logical constraints) permitting a priori to reach these aims of safety are presented and discussed. The major advantage of the control by logical constraints is to separate the safety (which is checked formally off line by model-checking) and the functional parts. It can be used with existing PLC programs, which doesn't change thus the working methodology of the design engineers.
|
178 |
Programmation des architectures hiérarchiques et hétérogènes / Programming hierarxchical and heterogenous machinesHamidouche, Khaled 10 November 2011 (has links)
Les architectures de calcul haute performance de nos jours sont des architectures hiérarchiques et hétérogènes: hiérarchiques car elles sont composées d’une hiérarchie de mémoire, une mémoire distribuée entre les noeuds et une mémoire partagée entre les coeurs d’un même noeud. Hétérogènes due à l’utilisation des processeurs spécifiques appelés Accélérateurs tel que le processeur CellBE d’IBM et les CPUs de NVIDIA. La complexité de maîtrise de ces architectures est double. D’une part, le problème de programmabilité: la programmation doit rester simple, la plus proche possible de la programmation séquentielle classique et indépendante de l’architecture cible. D’autre part, le problème d’efficacité: les performances doivent êtres proches de celles qu’obtiendrait un expert en écrivant le code à la main en utilisant des outils de bas niveau. Dans cette thèse, nous avons proposé une plateforme de développement pour répondre à ces problèmes. Pour cela, nous proposons deux outils : BSP++ est une bibliothèque générique utilisant des templates C++ et BSPGen est un framework permettant la génération automatique de code hybride à plusieurs niveaux de la hiérarchie (MPI+OpenMP ou MPI + Cell BE). Basée sur un modèle hiérarchique, la bibliothèque BSP++ prend les architectures hybrides comme cibles natives. Utilisant un ensemble réduit de primitives et de concepts intuitifs, BSP++ offre une simplicité d'utilisation et un haut niveau d' abstraction de la machine cible. Utilisant le modèle de coût de BSP++, BSPGen estime et génère le code hybride hiérarchique adéquat pour une application donnée sur une architecture cible. BSPGen génère un code hybride à partir d'une liste de fonctions séquentielles et d'une description de l'algorithme parallèle. Nos outils ont été validés sur différentes applications de différents domaines allant de la vérification et du calcul scientifique au traitement d'images en passant par la bioinformatique. En utilisant une large sélection d’architecture cible allant de simple machines à mémoire partagée au machines Petascale en passant par les architectures hétérogènes équipées d’accélérateurs de type Cell BE. / Today’s high-performance computing architectures are hierarchical and heterogeneous. With a hierarchy of memory, they are composed of distributed memory between nodes and shared memory between cores of the same node. heterogeneous due to the use of specific processors called accelerators such as the CellBE IBM processor and/or NVIDIA GPUs. The programming complexity of these architectures is twofold. On the one hand, the problem of programmability: the programming should be simple, as close as possible to the conventional sequential programming and independent of the target architecture. On the other hand, the problem of efficiency: performance should be similar to those obtained by a expert in writing code by hand using low-level tools. In this thesis, we proposed a development platform to address these problems. For this, we propose two tools: BSP++ is a generic library using C++ templates and BSPGen is a framework for the automatic hybrid multi-level hierarchy (MPI + OpenMP or MPI + Cell BE) code generation.Based on a hierarchical model, the BSP++ library takes the hybrid architectures as native targets. Using a small set of primitives and intuitive concepts, BSP++ provides a simple way to use and a high level of abstraction of the target machine. Using the cost model of BSP++, BSPGen predicts and generates the appropriate hierarchical hybrid code for a given application on target architecture. BSPGen generates hybrid code from a sequential list of functions and a description of the parallel algorithm.Our tools have been validated with various applications in different fields ranging from verification to scientific computing and image processing through bioinformatics. Using a wide selection of target architecture ranging from simple shared memory machines to Petascale machines through the heterogeneous architectures equipped with Cell BE accelerators.
|
179 |
Comprehensive Backend Support for Local Memory Fault ToleranceRink, Norman Alexander, Castrillon, Jeronimo 19 December 2016 (has links) (PDF)
Technological advances drive hardware to ever smaller feature sizes, causing devices to become more vulnerable to transient faults. Applications can be protected against faults by adding error detection and recovery measures in software. This is popularly achieved by applying automatic program transformations. However, transformations applied to program representations at abstraction levels higher than machine instructions are fundamentally incapable of protecting against vulnerabilities that are introduced during compilation. In particular, a large proportion of a program’s memory accesses are introduced by the compiler backend. This report presents a backend that protects these accesses against faults in the memory system. It is demonstrated that the presented backend can detect all single bit flips in memory that would be missed by an error detection scheme that operates on the LLVM intermediate representation of programs. The presented compiler backend is obtained by modifying the LLVM backend for the x86 architecture. On a subset of SPEC CINT2006 the runtime overhead incurred by the backend modifications amounts to 1.50x for the 32-bit processor architecture i386, and 1.13x for the 64-bit architecture x86_64. To achieve comprehensive detection of memory faults, the modified backend implements an adjusted calling convention that leaves library function calls transparent and intact.
|
180 |
Compilation pour machines à mémoire répartie : une approche multipasse / Compilation for distributed memory machines : a multipass approachLossing, Nelson 03 April 2017 (has links)
Les grilles de calculs sont des architectures distribuées couramment utilisées pour l'exécution de programmes scientifiques ou de simulation. Les programmeurs doivent ainsi acquérir de nouvelles compétences pour pouvoir tirer partie au mieux de toutes les ressources offertes. Ils doivent apprendre à écrire un code parallèle, et, éventuellement, à gérer une mémoire distribuée.L'ambition de cette thèse est de proposer une chaîne de compilation permettant de générer automatiquement un code parallèle distribué en tâches à partir d'un code séquentiel. Pour cela, le compilateur source-à-source PIPS est utilisé. Notre approche a deux atouts majeurs : 1) une succession de transformations simples et modulaires est appliquée, permettant à l'utilisateur de comprendre les différentes transformations appliquées, de les modifier, de les réutiliser dans d'autres contextes, et d'en ajouter de nouvelles; 2) une preuve de correction de chacune des transformations est donnée, permettant de garantir que le code généré est équivalent au code initial.Cette génération automatique de code parallèle distribué de tâches offre également une interface de programmation simple pour les utilisateurs. Une version parallèle du code est automatiquement générée à partir d'un code séquentiel annoté.Les expériences effectuées sur deux machines parallèles, sur des noyaux de Polybench, montrent une accélération moyenne linéaire voire super-linéaire sur des exemples de petites tailles et une accélération moyenne égale à la moitié du nombre de processus sur des exemples de grandes tailles. / Scientific and simulation programs often use clusters for their execution. Programmers need new programming skills to fully take advantage of all the available resources. They have to learn how to write parallel codes, and how to manage the potentially distributed memory.This thesis aims at generating automatically a distributed parallel code for task parallelisation from a sequential code. A source-to-source compiler, PIPS, is used to achieve this goal. Our approach has two main advantages: 1) a chain of simple and modular transformations to apply, thus visible and intelligible by the users, editable and reusable, and that make new optimisations possible; 2) a proof of correctness of the parallelisation process is made, allowing to insure that the generated code is correct and has the same result as the sequential one.This automatic generation of distributed-task program for distributed-memory machines provide a simple programming interface for the users to write a task oriented code. A parallel code can thus automatically be generated with our compilation process.The experimental results obtained on two parallel machines, using Polybench kernels, show a linear to super-linear average speedup on small data sizes. For large ones, average speedup is equal to half the number of processes.
|
Page generated in 0.1145 seconds