Global ETD Search

1	Squelettes algorithmiques méta-programmés : implantations, performances et sémantique / Metaprogrammed algorithmic skeletons : implementations, performances and semantics Javed, Noman 21 October 2011 (has links) Les approches de parallélisme structuré sont un compromis entre la parallélisation automatique et la programmation concurrentes et réparties telle qu'offerte par MPI ou les Pthreads. Le parallélisme à squelettes est l'une de ces approches. Un squelette algorithmique peut être vu comme une fonction d'ordre supérieur qui capture un algorithme parallèle classique tel qu'un pipeline ou une réduction parallèle. Souvent la sémantique des squelettes est simple et correspondant à celle de fonctions d'ordre supérieur similaire dans les langages de programmation fonctionnels. L'utilisation combine les squelettes disponibles pour construire son application parallèle. Lorsqu'un programme parallèle est conçu, les performances sont bien sûr importantes. Il est ainsi très intéressant pour le programmeur de disposer d'un modèle de performance, simple mais réaliste. Le parallélisme quasi-synchrone (BSP) offre un tel modèle. Le parallélisme étant présent maintenant dans toutes les machines, du téléphone au super-calculateur, il est important que les modèles de programmation s'appuient sur des sémantiques formelles pour permettre la vérification de programmes. Les travaux menés on conduit à la conception et au développement de la bibliothèque Orléans Skeleton Library ou OSL. OSL fournit un ensemble de squelettes algorithmiques data-parallèles quasi-synchrones. OSL est une bibliothèque pour le langage C++ et utilise des techniques de programmation avancées pour atteindre une bonne efficacité. Les communications se basent sur la bibliothèque MPI. OSL étant basée sur le modèle BSP, il est possible non seulement de prévoir les performances des programmes OSL mais également de fournir une portabilité des performances. Le modèle de programmation d'OSL a été formalisé dans l'assistant de preuve Coq. L'utilisation de cette sémantique pour la preuve de programmes est illustrée par un exemple. / Structured parallelism approaches are a trade-off between automatic parallelisation and concurrent and distributed programming such as Pthreads and MPI. Skeletal parallelism is one of the structured approaches. An algorithmic skeleton can be seen as higher-order function that captures a pattern of a parallel algorithm such as a pipeline, a parallel reduction, etc. Often the sequential semantics of the skeleton is quite simple and corresponds to the usual semantics of similar higher-order functions in functional programming languages. The user constructs a parallel program by combined calls to the available skeletons. When one is designing a parallel program, the parallel performance is of course important. It is thus very interesting for the programmer to rely on a simple yet realistic parallel performance model. Bulk Synchronous Parallelism (BSP) offers such a model. As the parallelism can now be found everywhere from smart-phones to the super computers, it becomes critical for the parallel programming models to support the proof of correctness of the programs developed with them. . The outcome of this work is the Orléans Skeleton Library or OSL. OSL provides a set of data parallel skeletons which follow the BSP model of parallel computation. OSL is a library for C++ currently implemented on top of MPI and using advanced C++ techniques to offer good efficiency. With OSL being based over the BSP performance model, it is possible not only to predict the performances of the application but also provides the portability of performance. The programming model of OSL is formalized using the big-step semantics in the Coq proof assistant. Based on this formal model the correctness of an OSL example is proved. Squelettes algorithmiques Parallélisme quasi-synchrone Algorithmic skeletons Bulk synchronous parallelism
2	Environnement pour le développement et la preuve de correction systèmatiques de programmes parallèles fonctionnels / Environment for the systematic development and proof of correction of functional parallel programs Tesson, Julien 08 November 2011 (has links) Concevoir et implanter des programmes parallèles est une tâche complexe, sujette aux erreurs. La vérification des programmes parallèles est également plus difficile que celle des programmes séquentiels. Pour permettre le développement et la preuve de correction de programmes parallèles, nous proposons de combiner le langage parallèle fonctionnel quasi-synchrone BSML, les squelettes algorithmiques - qui sont des fonctions d’ordre supérieur sur des structures de données réparties offrant une abstraction du parallélisme – et l’assistant de preuve Coq, dont le langage de spécification est suffisamment riche pour écrire des programmes fonctionnels purs et leurs propriétés. Nous proposons un plongement des primitives BSML dans la logique de Coq sous une forme modulaire adaptée à l’extraction de programmes. Ainsi, nous pouvons écrire dans Coq des programmes BSML, raisonner dessus, puis les extraire et les exécuter en parallèle. Pour faciliter le raisonnement sur ceux-ci, nous formalisons le lien entre programmes parallèles, manipulant des structures de données distribuées, et les spécifications, manipulant des structures séquentielles. Nous prouvons ainsi la correction d’une implantation du squelette algorithmique BH, un squelette adapté au traitement de listes réparties dans le modèle de parallélisme quasi synchrone. Pour un ensemble d’applications partant d’une spécification d’un problème sous forme d’un programme séquentiel simple, nous dérivons une instance de nos squelettes, puis nous extrayons un programme BSML avant de l’exécuter sur des machines parallèles. / Parallel program design and implementation is a complex, error prone task. Verifying parallel programs is also harder than verifying sequential ones. To ease the development and the proof of correction of parallel programs, we propose to combine the functional bulk synchronous parallel language BSML; the algorithmic skeleton, that are higher order function on distributed data structures which offer an abstraction of the parallelism ; and the Coq proof assistant, who’s specification language is rich enough to write purely functional programs together with their properties. We propose an embedding of BSML primitives in the Coq logic in a modular form, adapted to program extraction. So we can write BSML programs in Coq, reason on them, extract them and then execute them in parallel. To ease the specification of these programs, we formalise the relation between parallel programs using distributed data structures and specification using sequential data structure. We prove the correctness of an implementation of the BH skeleton. This skeleton is devoted to the treatment of distributed lists in the BSP model. For a set of application, starting from a sequential specification of a problem, we derive an instance of our skeletons, then extract a BSML program which is executed on parallel machines. Vérification formelle Squelettes algorithmiques Assistant de preuve Verification Algorithmic skeletons Proof assistant
3	Designing a Modern Skeleton Programming Framework for Parallel and Heterogeneous Systems Ernstsson, August January 2020 (has links) Today's society is increasingly software-driven and dependent on powerful computer technology. Therefore it is important that advancements in the low-level processor hardware are made available for exploitation by a growing number of programmers of differing skill level. However, as we are approaching the end of Moore's law, hardware designers are finding new and increasingly complex ways to increase the accessible processor performance. It is getting more and more difficult to effectively target these processing resources without expert knowledge in parallelization, heterogeneous computation, communication, synchronization, and so on. To ensure that the software side can keep up, advanced programming environments and frameworks are needed to bridge the widening gap between hardware and software. One such example is the pattern-centric skeleton programming model and in particular the SkePU project. The work presented in this thesis first redesigns the SkePU framework based on modern C++ variadic template metaprogramming and state-of-the-art compiler technology. It then explores new ways to improve performance: by providing new patterns, improving the data access locality of existing ones, and using both static and dynamic knowledge about program flow. The work combines novel ideas with practical evaluation of the approach on several applications. The advancements also include the first skeleton API that allows variadic skeletons, new data containers, and finally an approach to make skeleton programming more customizable without compromising universal portability. / <p>Ytterligare forskningsfinansiärer: EU H2020 project EXA2PRO (801015); SeRC.</p> High‐level parallel programming Algorithmic skeletons Heterogeneous systems High‐performance computing Computer Sciences Datavetenskap (datalogi)
4	Design and evaluation of a plain MPI-based cluster execution backend for the SkePU 3 skeleton programming framework Zeijlon, Alexander January 2023 (has links) SkePU 3 is a framework for parallel program execution that uses higher order functions called skeletons, which provide a layer of abstraction between user code and the parallel implementation it provides through its backends. The backend that enables SkePU to run on an HPC cluster has a slowdown of a factor two. This reduces the viability of SkePU as an alternative for HPC, and as such, warrants an investigation. Programs written in SkePU are sequential-looking, single-source C++ programs where skeleton calls can transparently execute on multiple different types of processing units, such as CPU cores, GPUs and clusters, using different backends. In this thesis, a strategy for improving the performance of SkePU on clusters is presented, and with it, the design and implementation of a new cluster backend that is simpler and more closely integrated with the non-cluster SkePU code base. Runtime measurements are made, which show that the new cluster backend sees a relative speedup of about a factor of two, which effectively eliminates the slowdown. SkePU skeleton programming algorithmic skeletons HPC cluster parallel programming MPI OpenMP CUDA Hybrid NUMA Computer Sciences Datavetenskap (datalogi)
5	Structured arrows : a type-based framework for structured parallelism Castro, David January 2018 (has links) This thesis deals with the important problem of parallelising sequential code. Despite the importance of parallelism in modern computing, writing parallel software still relies on many low-level and often error-prone approaches. These low-level approaches can lead to serious execution problems such as deadlocks and race conditions. Due to the non-deterministic behaviour of most parallel programs, testing parallel software can be both tedious and time-consuming. A way of providing guarantees of correctness for parallel programs would therefore provide significant benefit. Moreover, even if we ignore the problem of correctness, achieving good speedups is not straightforward, since this generally involves rewriting a program to consider a (possibly large) number of alternative parallelisations. This thesis argues that new languages and frameworks are needed. These language and frameworks must not only support high-level parallel programming constructs, but must also provide predictable cost models for these parallel constructs. Moreover, they need to be built around solid, well-understood theories that ensure that: (a) changes to the source code will not change the functional behaviour of a program, and (b) the speedup obtained by doing the necessary changes is predictable. Algorithmic skeletons are parametric implementations of common patterns of parallelism that provide good abstractions for creating new high-level languages, and also support frameworks for parallel computing that satisfy the correctness and predictability requirements that we require. This thesis presents a new type-based framework, based on the connection between structured parallelism and structured patterns of recursion, that provides parallel structures as type abstractions that can be used to statically parallelise a program. Specifically, this thesis exploits hylomorphisms as a single, unifying construct to represent the functional behaviour of parallel programs, and to perform correct code rewritings between alternative parallel implementations, represented as algorithmic skeletons. This thesis also defines a mechanism for deriving cost models for parallel constructs from a queue-based operational semantics. In this way, we can provide strong static guarantees about the correctness of a parallel program, while simultaneously achieving predictable speedups.
6	Integrating SkePU's algorithmic skeletons with GPI on a cluster / Integrering av SkePUs algoritmiska skelett med GPI på ett cluster Almqvist, Joel January 2022 (has links) As processors' clock-speed flattened out in the early 2000s, multi-core processors became more prevalent and so did parallel programming. However this programming paradigm introduces additional complexities, and to combat this, the SkePU framework was created. SkePU does this by offering a single-threaded interface which executes the user's code in parallel in accordance to a chosen computational pattern. Furthermore it allows the user themselves to decide which parallel backend should perform the execution, be it OpenMP, CUDA or OpenCL. This modular approach of SkePU thus allows for different hardware to be used without changing the code, and it currently supports CPUs, GPUs and clusters. This thesis presents a new so-called SkePU-backend made for clusters, using the communication library GPI. It demonstrates that the new backend is able to scale better and handle workload imbalances better than the existing SkePU-cluster-backend. This is achieved despite it performing worse at low node amounts, indicating that it requires less scaling overhead. Its weaknesses are also analyzed, partially from a design point of view, and clear solutions are presented, combined with a discussion as to why they arose in the first place. SlePU cluster GPI MPI OpenMP parallel programming HPC algorithmic skeletons distributed computing SkePU kluster GPI MPI OpenMP parallellprogrammering HPC algoritmiska skelett Computer Sciences Datavetenskap (datalogi)
7	Optimisation multi-niveau d’une application de traitement d’images sur machines parallèles / Multi-level optimisation of an image processing application on parallel machines Saidani, Tarik 06 November 2012 (has links) Cette thèse vise à définir une méthodologie de mise en œuvre d’applications performantes sur les processeurs embarqués du futur. Ces architectures nécessitent notamment d’exploiter au mieux les différents niveaux de parallélisme (grain fin, gros grain) et de gérer les communications et les accès à la mémoire. Pour étudier cette méthodologie, nous avons utilisé un processeur cible représentatif de ces architectures émergentes, le processeur CELL. Le détecteurde points d’intérêt de Harris est un exemple de traitement régulier nécessitant des unités de calcul intensif. En étudiant plusieurs schémas de mise en oeuvre sur le processeur CELL, nous avons ainsi pu mettre en évidence des méthodes d’optimisation des calculs en adaptant les programmes aux unités spécifiques de traitement SIMD du processeur CELL. L’utilisation efficace de la mémoire nécessite par ailleurs, à la fois une bonne exploitation des transferts et un arrangement optimal des données en mémoire. Nous avons développé un outil d’abstraction permettant de simplifier et d’automatiser les transferts et la synchronisation, CELL MPI. Cette expertise nous a permis de développer une méthodologie permettant de simplifier la mise en oeuvre parallèle optimisée de ces algorithmes. Nous avons ainsi conçu un outil de programmation parallèle à base de squelettes algorithmiques : SKELL BE. Ce modèle de programmation propose une solution originale de génération d’applications à base de métaprogrammation. Il permet, de manière automatisée, d’obtenir de très bonnes performances et de permettre une utilisation efficace de l’architecture, comme le montre la comparaison pour un ensemble de programmes test avec plusieurs autres outils dédiés à ce processeur. / This thesis aims to define a design methodology for high performance applications on future embedded processors. These architectures require an efficient usage of their different level of parallelism (fine-grain, coarse-grain), and a good handling of the inter-processor communications and memory accesses. In order to study this methodology, we have used a target processor which represents this type of emerging architectures, the Cell BE processor.We have also chosen a low level image processing application, the Harris points of interest detector, which is representative of a typical low level image processing application that is highly parallel. We have studied several parallelisation schemes of this application and we could establish different optimisation techniques by adapting the software to the specific SIMD units of the Cell processor. We have also developped a library named CELL MPI that allows efficient communication and synchronisation over the processing elements, using a simplified and implicit programming interface. This work allowed us to develop a methodology that simplifies the design of a parallel algorithm on the Cell processor.We have designed a parallel programming tool named SKELL BE which is based on algorithmic skeletons. This programming model providesan original solution of a meta-programming based code generator. Using SKELL BE, we can obtain very high performances applications that uses the Cell architecture efficiently when compared to other tools that exist on the market. Programmation parallèle Processeur CELL Traitement d’images Squelettes algorithmiques Calcul hautes performances Méta-programmation Processeur embarqué Parallel programming CELL processor Image processing Algorithmic skeletons High performance computing Meta-programming Embedded processors
8	A rapid design methodology for generating of parallel image processing applications and parallel architectures for smart camera / Méthodologie de prototypage rapide pour générer des applications de traitement d'images parallèles et architectures parallèles dédié caméra intelligente Chenini, Hanen 27 May 2014 (has links) Dû à la complexité des algorithmes de traitement d’images récents et dans le but d'accélérer la procédure de la conception des MPSoCs, méthodologies de prototypage rapide sont nécessaires pour fournir différents choix pour le programmeur de générer des programmes parallèles efficaces. Ce manuscrit présente les travaux menés pour proposer une méthodologie de prototypage rapide permettant la conception des architectures MPSOC ainsi que la génération automatique de système matériel / logiciel dédié un circuit reprogrammable (FPGA). Pour faciliter la programmation parallèle, l'approche MPSoC proposée est basée sur l’utilisation de Framework « CubeGen » qui permet la génération des différentes solutions envisageables pour réaliser des prototypes dans le domaine du traitement d’image. Ce document décrit une méthode basée sur le concept des squelettes générés en fonction des caractéristiques d'application afin d'exploiter tous les types de parallélisme des algorithmes réels. Un ensemble d’expérimentations utilisant des algorithmes courants permet d’évaluer les performances du flot de conception proposé équivalente à une architecture basé des processeurs hardcore et les solutions traditionnels basé sur cibles ASIC. / Due to the complexity of image processing algorithms and the restrictions imposed by MPSoC designs to reach their full potentials, automatic design methodologies are needed to provide guidance for the programmer to generate efficient parallel programs. In this dissertation, we present a MPSoC-based design methodology solution supporting automatic design space exploration, automatic performance evaluation, as well as automatic hardware/software system generation. To facilitate the parallel programming, the presented MPSoC approach is based on a CubeGen framework that permits the expression of different scenarios for architecture and algorithmic design exploring to reach the desired level of performance, resulting in short time development. The generated design could be implemented in a FPGA technology with an expected improvement in application performance and power consumption. Starting from the application, we have evolved our effective methodology to provide several parameterizable algorithmic skeletons in the face of varying application characteristics to exploit all types of parallelism of the real algorithms. Implementing such applications on our parallel embedded system shows that our advanced methods achieve increased efficiency with respect to the computational and communication requirements. The experimental results demonstrate that the designed multiprocessing architecture can be programmed efficiently and also can have an equivalent performance to a more powerful designs based hard-core processors and better than traditional ASIC solutions which are too slow and too expensive. Algorithmes de traitement d’images Conception des MPSoCs Prototypage rapide Programmes parallèles efficaces La génération automatique FPGA CubeGen Squelettes Types de parallélisme ASIC Image processing algorithms MPSoC designs Automatic design methodologies Efficient parallel programs CubeGen framework Parameterizable algorithmic skeletons All types of parallelism Traditional ASIC solutions
9	Squelettes algorithmiques pour la programmation et l'exécution efficaces de codes parallèles / Algorithmic skeletons for efficient programming and execution of parallel codes Legaux, Joeffrey 13 December 2013 (has links) Les architectures parallèles sont désormais présentes dans tous les matériels informatiques, mais les programmeurs ne sont généralement pas formés à leur programmation dans les modèles explicites tels que MPI ou les Pthreads. Il y a un besoin important de modèles plus abstraits tels que les squelettes algorithmiques qui sont une approche structurée. Ceux-ci peuvent être vus comme des fonctions d’ordre supérieur synthétisant le comportement d’algorithmes parallèles récurrents que le développeur peut ensuite combiner pour créer ses programmes. Les développeurs souhaitent obtenir de meilleures performances grâce aux programmes parallèles, mais le temps de développement est également un facteur très important. Les approches par squelettes algorithmiques fournissent des résultats intéressants dans ces deux aspects. La bibliothèque Orléans Skeleton Library ou OSL fournit un ensemble de squelettes algorithmiques de parallélisme de données quasi-synchrones dans le langage C++ et utilise des techniques de programmation avancées pour atteindre une bonne efficacité. Nous avons amélioré OSL afin de lui apporter de meilleures performances et une plus grande expressivité. Nous avons voulu analyser le rapport entre les performances des programmes et l’effort de programmation nécessaire sur OSL et d’autres modèles de programmation parallèle. La comparaison rigoureuse entre des programmes parallèles dans OSL et leurs équivalents de bas niveau montre une bien meilleure productivité pour les modèles de haut niveau qui offrent une grande facilité d’utilisation tout en produisant des performances acceptables. / Parallel architectures have now reached every computing device, but software developers generally lackthe skills to program them through explicit models such as MPI or the Pthreads. There is a need for moreabstract models such as the algorithmic skeletons which are a structured approach. They can be viewed ashigher order functions that represent the behaviour of common parallel algorithms, and those are combinedby the programmer to generate parallel programs. Programmers want to obtain better performances through the usage of parallelism, but the development time implied is also an important factor. Algorithmic skeletons provide interesting results in both those fields. The Orléans Skeleton Library or OSL provides a set of algorithmic skeletons for data parallelism within the bulk synchronous parallel model for the C++ language. It uses advanced metaprogramming techniques to obtain good performances. We improved OSL in order to obtain better performances from its generated programs, and extended its expressivity. We wanted to analyze the ratio between the performance of programs and the development effort needed within OSL and other parallel programming models. The comparison between parallel programs written within OSL and their equivalents in low level parallel models shows a better productivity for high level models : they are easy to use for the programmers while providing decent performances. Modèles de haut niveau, Squelettes algorithmiques Parallélisme quasi-synchrone Effort de programmation Expressivité Exceptions Distribution des données Homomorphismes High level programming models Algorithmic skeletons Bulk synchronous parallelism Development effort Expressivity Exceptions Data distribution Homomorphisms

Search results