1.
CellPilot: An extension of the Pilot library for Cell Broadband Engine processors and heterogeneous clusters. Girard, Natalie. 13 January 2012.
The CellPilot library provides a uniform communication programming model, based on Pilot's process/channel approach, for clusters of Cell Broadband Engine processors. Pilot, a thin layer on top of the Message Passing Interface (MPI) library, allows processes to read/write messages on channels defined between pairs of processes on the cluster, but Pilot alone does not help a Cell programmer cope with the considerable complexities of intra-Cell communication.
With CellPilot, programmers still design software in terms of processes, but those processes can now be located on a Cell node's Power Processor Elements (PPEs) or Synergistic Processing Elements (SPEs), or on non-Cell nodes within a heterogeneous Cell cluster, and communication is accomplished via channels between process pairs. Programs are coded in terms of reading and writing on those channels, whereupon CellPilot transparently applies whichever communication mechanisms are required to transport the message, regardless of where its endpoints reside. This gives the programmer a uniform way to handle inter-process communication while avoiding low-level I/O operations and the use of multiple libraries.
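To make the process/channel style concrete, the following is a minimal C sketch of a producer/consumer pair joined by one channel. The PI_* names follow Pilot's published conventions (PI_Configure, PI_CreateProcess, PI_CreateChannel, PI_Write, PI_Read, PI_StartAll, PI_StopMain), but they are reproduced from memory and the header name is an assumption, so treat the exact signatures as illustrative rather than definitive.

```c
/* Minimal sketch of a Pilot-style producer/consumer pair joined by one channel.
 * PI_* names follow Pilot's published conventions but are reproduced from
 * memory; the header name and exact signatures are assumptions. */
#include <stdio.h>
#include "pilot.h"              /* assumed Pilot/CellPilot header */

static PI_CHANNEL *chan;        /* producer -> consumer channel */

static int producer(int index, void *arg) {
    int value = 42;
    PI_Write(chan, "%d", value);        /* send one integer on the channel */
    return 0;
}

static int consumer(int index, void *arg) {
    int value;
    PI_Read(chan, "%d", &value);        /* block until the integer arrives */
    printf("consumer received %d\n", value);
    return 0;
}

int main(int argc, char *argv[]) {
    PI_Configure(&argc, &argv);                        /* wraps MPI start-up */
    PI_PROCESS *p = PI_CreateProcess(producer, 0, NULL);
    PI_PROCESS *c = PI_CreateProcess(consumer, 0, NULL);
    chan = PI_CreateChannel(p, c);                     /* point-to-point channel */
    PI_StartAll();                                     /* launch the process graph */
    PI_StopMain(0);                                    /* shut down cleanly */
    return 0;
}
```

Under CellPilot, the same channel reads and writes would apply whether the producer and consumer run on a PPE, an SPE, or a non-Cell node; the library selects the required transport.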
2.
Dynamic Configuration of a Relocatable Driver and Code Generator for Continuous Deep Analytics / Dynamisk Konfigurering av en Omlokaliseringsbar Driver och Kod Genererare för Continuous Deep Analytics. Bjuhr, Oscar. January 2018.
Modern stream processing engines usually use the Java virtual machine (JVM) as their execution platform. The JVM increases the portability and safety of applications at the cost of not fully exploiting the performance of the physical machine, and it also restricts the use of hardware accelerators such as GPUs for computationally heavy analysis of data streams. The Continuous Deep Analytics (CDA) project explores the possibility of a stream processor that executes native code directly on the underlying hardware using Rust. Rust is a young programming language that can statically guarantee the absence of memory errors and data races without incurring runtime performance penalties. Rust is built on top of LLVM, which in principle allows it to compile to a large set of target platforms; each target platform, however, requires a specifically configured environment for Rust's compiler to work properly. The CDA compiler will run in a distributed setting where it must be able to relocate to different nodes to handle node failures. Setting up a relocatable Rust compiler in such a setting can be error prone, and Docker is explored as a solution to this problem. A concurrent, thread-based system is implemented in Scala for building Docker images and compiling Rust in containers. Docker shows potential for enabling easy relocation of the driver without manual configuration, and it has no major effect on Rust's compile time. The large Docker images required to compile Rust are a drawback of the solution: relocating the driver demands substantial network traffic, so reducing the size of the images would make the solution more responsive.
3.
Modèles de programmation des applications de traitement du signal et de l'image sur cluster parallèle et hétérogène / Programming models for signal and image processing on parallel and heterogeneous architectures. Mansouri, Farouk. 14 October 2015.
Over the last decade, computing systems have evolved toward parallel and heterogeneous architectures. Composed of several nodes connected via a network, each containing heterogeneous processing units, such clusters achieve high performance. To program these architectures, the user must rely on programming models such as MPI, OpenMP, or CUDA. However, it remains difficult to reconcile programmer productivity, which calls for abstracting away the specifics of the architecture, with performance. In this thesis, we exploit the idea that a programming model specific to a particular application domain can reconcile these two antagonistic goals. By characterizing a family of applications, it is possible to identify high-level abstractions that model them effectively. We propose two models specific to the implementation of signal and image processing applications on heterogeneous clusters. The first model is static, and we extend it with a task-migration feature. The second model is dynamic, based on the StarPU runtime system. Both models offer a high level of abstraction by modeling signal and image processing applications as data-flow graphs, and both efficiently exploit task, data, and graph parallelism. These models are validated through several implementations and comparisons, including two real-world image processing applications on a CPU-GPU cluster.
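For the dynamic model, the abstract names StarPU as the underlying runtime. The C sketch below shows the style of task submission such a model builds on, where registered data handles and their access modes let the runtime derive the data-flow graph and schedule tasks on CPUs or GPUs. The calls follow StarPU's documented vector-scaling example as recalled from memory, so exact names and signatures should be checked against the installed version; it is a sketch, not the thesis's implementation.

```c
/* Sketch of StarPU-style task submission: the runtime infers the data-flow
 * graph from data handles and their access modes. Names follow StarPU's
 * documented examples from memory and may differ between versions. */
#include <stdint.h>
#include <starpu.h>

#define NX 1024

/* CPU implementation of the codelet; a cuda_funcs entry could be added
 * alongside it for execution on the GPUs of a CPU-GPU cluster. */
static void scale_cpu(void *buffers[], void *cl_arg)
{
    float factor;
    starpu_codelet_unpack_args(cl_arg, &factor);
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    float   *x = (float *) STARPU_VECTOR_GET_PTR(buffers[0]);
    for (unsigned i = 0; i < n; i++)
        x[i] *= factor;
}

static struct starpu_codelet scale_cl = {
    .cpu_funcs = { scale_cpu },
    .nbuffers  = 1,
    .modes     = { STARPU_RW },   /* access mode drives dependency tracking */
};

int main(void)
{
    float data[NX];
    float factor = 2.0f;
    starpu_data_handle_t handle;

    for (unsigned i = 0; i < NX; i++)
        data[i] = (float) i;

    if (starpu_init(NULL) != 0)
        return 1;

    /* Register the buffer so the runtime can manage transfers and coherence. */
    starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
                                (uintptr_t) data, NX, sizeof(float));

    /* Submitting tasks on the same handle builds the data-flow graph;
     * StarPU schedules them on the available processing units. */
    starpu_task_insert(&scale_cl,
                       STARPU_RW, handle,
                       STARPU_VALUE, &factor, sizeof(factor),
                       0);

    starpu_data_unregister(handle);   /* waits for pending tasks, syncs data */
    starpu_shutdown();
    return 0;
}
```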