21.
Adaptive work placement for query processing on heterogeneous computing resources
Karnagel, Thomas; Habich, Dirk; Lehner, Wolfgang. 10 November 2022
The hardware landscape is currently changing from homogeneous multi-core systems towards heterogeneous systems with many different computing units, each with its own characteristics. This trend is a great opportunity for database systems to increase overall performance, provided the heterogeneous resources can be utilized efficiently. To achieve this, the main challenge is to place the right work on the right computing unit. Current approaches tackling this placement for query processing assume that the data cardinalities of intermediate results can be correctly estimated. However, this assumption does not hold for complex queries. To overcome this problem, we propose an adaptive placement approach that is independent of cardinality estimates for intermediate results. Our approach is incorporated in a novel adaptive placement sequence. Additionally, we implement our approach as an extensible virtualization layer to demonstrate its broad applicability across multiple database systems. In our evaluation, we clearly show that our approach significantly improves OLAP query processing on heterogeneous hardware, while being adaptive enough to react to changing cardinalities of intermediate query results.
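The abstract does not detail the placement algorithm itself. Purely as an illustration of the core idea (deciding placement per operator from observed intermediate-result sizes rather than from optimizer estimates), a minimal sketch might look as follows; the cost model, speedup factor, and operator names are assumptions for the example, not the thesis's actual design.

```python
# Hypothetical sketch: place each query operator on CPU or GPU only once the
# cardinality of its input is actually known, instead of trusting optimizer estimates.

def transfer_cost(rows, bytes_per_row, bandwidth_gb_per_s=10.0):
    """Rough cost (seconds) of moving an intermediate result to the GPU."""
    return (rows * bytes_per_row) / (bandwidth_gb_per_s * 1e9)

def place_operator(op, observed_rows, gpu_speedup=5.0, bytes_per_row=16):
    """Pick a computing unit for one operator using the *observed* input size."""
    cpu_time = op["cpu_cost_per_row"] * observed_rows
    gpu_time = cpu_time / gpu_speedup + transfer_cost(observed_rows, bytes_per_row)
    return "GPU" if gpu_time < cpu_time else "CPU"

def run_pipeline(operators, base_rows):
    rows = base_rows
    for op in operators:
        unit = place_operator(op, rows)       # decision made once the input size is known
        print(f"{op['name']:>10s} -> {unit} (input rows: {rows})")
        rows = int(rows * op["selectivity"])  # observed output feeds the next decision

if __name__ == "__main__":
    plan = [
        {"name": "scan",      "cpu_cost_per_row": 2e-8, "selectivity": 0.4},
        {"name": "filter",    "cpu_cost_per_row": 1e-8, "selectivity": 0.1},
        {"name": "aggregate", "cpu_cost_per_row": 5e-8, "selectivity": 0.001},
    ]
    run_pipeline(plan, base_rows=10_000_000)
```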
22.
Scheduling Tasks on Heterogeneous Chip Multiprocessors with Reconfigurable Hardware
Teller, Justin Stevenson. 31 July 2008
No description available.
23.
HDArray: Parallel Array Interface for Distributed Heterogeneous Devices
Hyun Dok Cho. 30 May 2024
Heterogeneous clusters with nodes containing one or more accelerators, such as GPUs, have become common. While MPI provides inter-address-space communication and OpenCL gives a process access to heterogeneous computational resources, programmers are forced to write hybrid programs that manage the interaction of both systems. This work describes an array programming interface that provides users with automatic and manual distributions of data and work. Using the work distribution and kernel def-use information, communication among processes, and among devices within a process, is performed automatically. By providing a unified programming model to the user, program development is simplified.
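HDArray's actual interface is not shown in the abstract, so the following is only a generic, self-contained illustration of the underlying idea: distribute an array block-wise over several "devices" and derive the required data exchange from the regions each kernel defines (writes) and uses (reads). None of the names below belong to HDArray.

```python
# Generic illustration (not HDArray's API): block-distribute a 1-D array and derive
# the halo exchange each device needs from the def (written) and use (read) regions.
import numpy as np

def block_ranges(n, parts):
    """Owned index range per device for a block distribution."""
    step = (n + parts - 1) // parts
    return [(i * step, min((i + 1) * step, n)) for i in range(parts)]

def stencil_use_region(owned, radius, n):
    """A 3-point stencil of the given radius reads this index range."""
    lo, hi = owned
    return (max(0, lo - radius), min(n, hi + radius))

def run_stencil(x, parts=4, radius=1):
    n = len(x)
    out = np.empty_like(x)
    for owned in block_ranges(n, parts):
        use_lo, use_hi = stencil_use_region(owned, radius, n)  # what must be communicated
        local = x[use_lo:use_hi].copy()                        # "receive" halo + owned data
        lo, hi = owned
        for i in range(lo, hi):                                # def region: owned indices only
            j = i - use_lo
            left = local[j - 1] if i > 0 else local[j]
            right = local[j + 1] if i < n - 1 else local[j]
            out[i] = (left + local[j] + right) / 3.0
    return out

if __name__ == "__main__":
    data = np.arange(16, dtype=float)
    print(run_stencil(data))
```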
24.
Improving Performance and Energy Efficiency of Heterogeneous Systems with rCUDA
Prades Gasulla, Javier. 14 June 2021
Thesis by compendium of publications. In the last decade, the use of GPGPU (General-Purpose computing on Graphics Processing Units) has become extremely popular in data centers around the world. GPUs (Graphics Processing Units) have been established as computational accelerators that are used alongside CPUs to form heterogeneous systems. The massively parallel nature of GPUs, traditionally intended for graphics computing, allows numerical operations on data arrays to be performed at high speed, thanks to the large number of cores GPUs integrate and their large memory access bandwidth. Consequently, applications from all kinds of fields, such as chemistry, physics, engineering, artificial intelligence, and materials science, that exhibit this type of computational pattern benefit from drastically reduced execution times.
In general, GPU-accelerated computing has been a step forward and a revolution, but it is not without problems, such as poor energy efficiency, low GPU utilization, and high acquisition and maintenance costs.
In this PhD thesis we aim to analyze the main shortcomings of these heterogeneous systems and propose solutions based on the use of remote GPU virtualization. To that end, we have used the rCUDA middleware, developed at Universitat Politècnica de València, which numerous publications endorse as the most advanced remote GPU virtualization framework available today.
The results obtained in this PhD thesis show that the use of rCUDA in Cloud Computing environments increases the degree of freedom of the system, as it allows virtual instances of the physical GPUs to be created that are fully tailored to the needs of each virtual machine. In HPC (High Performance Computing) environments, rCUDA also provides a greater degree of flexibility in the use of GPUs throughout the computing cluster, as it allows the CPU part of an application to be completely decoupled from its GPU part. In addition, GPUs can be located on any node in the cluster, regardless of the node on which the CPU part of the application is running. In general, both for Cloud Computing and for HPC, this greater flexibility translates into an up to 2x increase in system-wide throughput while reducing energy consumption by approximately 15%.
Finally, we have also developed a job migration mechanism for the GPU part of applications, which has been integrated within the rCUDA middleware. This migration mechanism has been evaluated, and the results clearly show that, in exchange for a small overhead of about 400 milliseconds in application execution time, it is a powerful tool with which, again, to increase productivity and reduce the energy footprint of the computing system.
In summary, this PhD thesis analyzes the main problems arising from the use of GPUs as computing accelerators, both in HPC and Cloud Computing environments, and demonstrates how these problems can be addressed through the use of the rCUDA middleware. In addition, a powerful GPU job migration mechanism has been developed which, integrated within the rCUDA framework, becomes a key tool for future job schedulers in heterogeneous clusters.
This work was jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 20524/PDC/18, 20813/PI/18 and 20988/PI/18, and by the Spanish MEC and European Commission FEDER under grants TIN2015-66972-C5-3-R, TIN2016-78799-P and CTQ2017-87974-R (AEI/FEDER, UE). We also thank NVIDIA for hardware donations under the GPU Educational Center 2014-2016 and Research Center 2015-2016 programs. The authors thankfully acknowledge the computer resources at CTE-POWER and the technical support provided by the Barcelona Supercomputing Center - Centro Nacional de Supercomputación (RES-BCV-2018-3-0008). Furthermore, researchers from Universitat Politècnica de València are supported by the Generalitat Valenciana under grant PROMETEO/2017/077. The authors are also grateful for the generous support provided by Mellanox Technologies Inc. Prof. Pradipta Purkayastha, from the Department of Chemical Sciences, Indian Institute of Science Education and Research (IISER) Kolkata, is acknowledged for kindly providing the initial ligand and DNA structures.
Prades Gasulla, J. (2021). Improving Performance and Energy Efficiency of Heterogeneous Systems with rCUDA [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/168081
25.
Caractérisation expérimentale des processus d’hydratation et de carbonatation des roches basiques et ultra-basiques / Experimental characterization of hydration and carbonation processes in mafic and ultramafic systems
Peuble, Steve. 27 June 2014
Since the mid-1990s, in situ mineralization of CO2 has been considered a safe and efficient way to mitigate anthropogenic CO2 emissions to the atmosphere. The idea is to recover the CO2 emitted by certain industries and trap it at depth in mineral form (carbonates) in natural mafic and ultramafic aquifers (e.g. basalts and peridotites). The carbonation of CO2 has been widely described in natural systems, where it occurs through a series of complex chemical reactions coupled to the transport of reactive species in the fluid.
Numerous experiments have been conducted in batch reactors over the past fifteen years to better understand the physico-chemical parameters controlling the carbonation of (ultra-)mafic rocks, but few studies have characterized the coupled reactive-transport processes at play during the injection and in situ mineralization of CO2 in these rocks. This work aims to meet three main objectives: (i) characterize changes in reaction paths during the injection of CO2 into (ultra-)mafic systems, (ii) measure the feedback effects of chemical reactions on the hydrodynamic properties of the rock, and (iii) quantify the efficiency and sustainability of such processes over long time periods. It is based on the development of experimental protocols to (i) reproduce the injection of CO2 into (ultra-)mafic rocks and (ii) characterize the reactions using a series of geochemical and analytical tools from the atomic to the centimetric scale. Three series of reactive percolation experiments were performed on (ultra-)mafic aggregates ranging from relatively simple (olivines from San Carlos and Hawaii) to more complex samples (basalts from Stapafell) under in situ pressure, temperature, and confinement conditions (Ptot = 10-25 MPa; T = 180-185 °C; Pconf = 15-28 MPa). The results allowed us to differentiate several reaction paths in these systems depending on fluid transport, rock porosity, local hydrodynamic properties, mineralogy, and/or local changes in fluid composition. Mass-balance calculations revealed an efficient mineralization of CO2 in the samples, controlled by the chemical and hydrodynamic properties of the rock at the pore scale. However, some reactions associated with the alteration of (ultra-)mafic rocks (e.g. hydration) have negative feedback effects on the reservoir rock properties (porosity and permeability) that may compromise the sustainability of CO2 storage in natural aquifers in the long term. These new supporting data will allow numerical models to better simulate the carbonation of (ultra-)mafic rocks when the hydrodynamic properties and structural heterogeneities of the reservoir are known. They also suggest that better control of some injection parameters, such as the flow rate and the injected fluid composition (e.g. pCO2), would improve the rate and yield of CO2 mineralization in these systems.
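For orientation only, the two competing processes referred to above can be summarized by the classic end-member reactions for forsterite olivine; the experiments use natural, chemically more complex samples, so these equations are illustrative rather than exact descriptions of the studied systems.

```latex
% Representative end-member reactions (forsterite); illustrative only.
\begin{align}
  \text{Carbonation:} \quad
    & \mathrm{Mg_2SiO_4 + 2\,CO_2 \longrightarrow 2\,MgCO_3 + SiO_2} \\
  \text{Hydration (serpentinization):} \quad
    & \mathrm{2\,Mg_2SiO_4 + 3\,H_2O \longrightarrow Mg_3Si_2O_5(OH)_4 + Mg(OH)_2}
\end{align}
```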
26.
Génération de modèles de haut niveau enrichis pour les systèmes hétérogènes et multiphysiques / Generating high-level enriched models for heterogeneous and multiphysics systems
Bousquet, Laurent. 29 January 2014
Systems on chip are increasingly complex, as they now embed not only digital and analog parts but also sensors and actuators. SystemC and its extension SystemC AMS allow such systems to be modeled at a high level of abstraction. These tools are effective for feasibility studies, architectural exploration, and global verification of heterogeneous and multiphysics systems. At low levels of abstraction, simulation times become too long, and synchronization problems appear when co-simulations are performed. It is possible to abstract the low-level models developed by the specialists of the different domains to create high-level models that can be simulated faster using SystemC/SystemC AMS. The models of computation and the modeling styles have been studied, and a relation is shown between the modeling style, the model size, and the simulation speed. A method that automatically generates the high-level model of an analog linear circuit from its low-level representation is proposed. It is then shown how to include in the high-level model information that allows power consumption to be estimated. After that, the modeling of multiphysics systems is studied. Two methods are discussed: first, the one that uses the electrical equivalent circuit, and then the one based on the bond graph approach. In particular, it is shown how to generate a bond graph equivalent model from a low-level representation.
Finally, the modeling of a wind turbine system is discussed in order to illustrate the different concepts presented in this thesis.
27.
Optimization of Product Placement and Pickup in Automated Warehouses
Abeer Abdelhadi. 24 July 2020
Smart warehouses have become increasingly popular, with Automated Guided Vehicles (AGVs) being used for order pickup. They also allow efficient cost management through optimized storage and retrieval, and optimizing resources in these warehouses is essential to ensure maximum efficiency. In this thesis, we consider a three-dimensional smart warehouse system equipped with heterogeneous AGVs (i.e., AGVs with different speeds). We propose scheduling and placement policies that jointly consider all the design parameters, including the scheduling decision probabilities and storage assignment locations. In order to provide differentiated service levels, we propose a prioritized probabilistic scheduling and placement policy that minimizes a weighted sum of mean latency and latency tail probability (LTP). Towards this goal, we first derive closed-form expressions for the mean latency and LTP. Then, we formulate an optimization problem to jointly optimize a weighted sum of both the mean latency and the LTP. The optimization problem is solved efficiently over the scheduling decision variables. For a given placement of the products, the scheduling decisions for customers' orders are solved optimally and derived in closed form. Evaluation results demonstrate a significant improvement of our policy (up to 32%) compared to state-of-the-art algorithms such as the Least-Work-Left and Join-the-Shortest-Queue policies, and other competitive baselines.
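The closed-form expressions derived in the thesis are not reproduced in the abstract. As a toy illustration of the weighted objective only, the sketch below splits an order stream between a fast and a slow AGV class, models each class as an M/M/1 queue, and searches for the split probability that minimizes a weighted sum of mean latency and latency tail probability; the queueing model, rates, and deadline are invented for the example and are not the thesis's system model.

```python
# Toy illustration of a weighted mean-latency / latency-tail objective (not the
# thesis's actual model): choose the probability p of routing an order to the
# fast AGV class so that  w * mean_latency + (1 - w) * P(latency > deadline)  is minimized.
import math

def mm1_mean_latency(lam, mu):
    return float("inf") if lam >= mu else 1.0 / (mu - lam)

def mm1_tail_prob(lam, mu, deadline):
    return 1.0 if lam >= mu else math.exp(-(mu - lam) * deadline)

def objective(p, total_rate, mu_fast, mu_slow, deadline, w=0.5):
    lam_fast, lam_slow = p * total_rate, (1.0 - p) * total_rate
    mean = (p * mm1_mean_latency(lam_fast, mu_fast)
            + (1 - p) * mm1_mean_latency(lam_slow, mu_slow))
    tail = (p * mm1_tail_prob(lam_fast, mu_fast, deadline)
            + (1 - p) * mm1_tail_prob(lam_slow, mu_slow, deadline))
    return w * mean + (1 - w) * tail

if __name__ == "__main__":
    candidates = [i / 100 for i in range(101)]
    best = min(candidates,
               key=lambda p: objective(p, total_rate=8.0, mu_fast=10.0,
                                       mu_slow=5.0, deadline=1.0))
    print(f"best split to fast AGVs: p = {best:.2f}, "
          f"objective = {objective(best, 8.0, 10.0, 5.0, 1.0):.3f}")
```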
28.
Lastgetriebene Validierung Dienstbereitstellender Systeme / Load-Driven Validation of Service-Providing Systems
Caspar, Mirko. 07 January 2014
As the complexity of heterogeneous, distributed systems grows, so do the requirements for their validation.
This thesis presents a concept for validating a particular class of complex systems, so-called service-providing systems, by means of automated testing. The system functionality is exercised with the help of heterogeneous clients, for example embedded systems. To this end, the system under test is reduced to the services it exposes externally, and the use of these services by clients is quantified as a load. A validation is defined by prescribing time-varying loads for each service. These loads are purposefully distributed among the available clients, which then generate them against the system under test.
A practical application of this concept requires automating the validation process. The thesis presents the architecture of a test bench that, on the one hand, accounts for the heterogeneity of the clients and, on the other hand, compensates for effects caused by client dynamics while the validation is running. The underlying algorithmic problem of dynamic test partitioning is defined, together with a model describing all required parameters. The test partitioning can be solved in polynomial time by a purpose-built heuristic.
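The thesis's own partitioning heuristic is not reproduced here. The sketch below is only a generic greedy illustration of the underlying assignment problem (clients with different capabilities must jointly generate a prescribed load per service); all names and rates are invented, and time-varying loads are ignored for brevity.

```python
# Generic greedy sketch of load-driven test partitioning (invented numbers, not
# the thesis's heuristic): distribute a target load per service across
# heterogeneous clients without exceeding any client's capacity.

def partition(target_load, clients):
    """target_load: {service: requests/s}; clients: {name: {service: max requests/s}}."""
    remaining = {c: sum(caps.values()) for c, caps in clients.items()}  # rough capacity budget
    assignment = {c: {} for c in clients}
    for service, load in target_load.items():
        todo = load
        # Prefer clients that support the service and still have the most headroom.
        for c in sorted(clients, key=lambda name: remaining[name], reverse=True):
            can_do = min(clients[c].get(service, 0.0), remaining[c], todo)
            if can_do > 0:
                assignment[c][service] = can_do
                remaining[c] -= can_do
                todo -= can_do
            if todo <= 0:
                break
        if todo > 0:
            raise RuntimeError(f"load for {service} cannot be generated: {todo} req/s unassigned")
    return assignment

if __name__ == "__main__":
    services = {"sms": 120.0, "voice": 40.0}
    clients = {"phone_a": {"sms": 80.0, "voice": 30.0},
               "phone_b": {"sms": 60.0},
               "modem_c": {"sms": 50.0, "voice": 20.0}}
    for client, loads in partition(services, clients).items():
        print(client, loads)
```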
To assess the performance of the developed approach, the heuristic is subjected to extensive evaluation. Using a mobile radio network under test as an example, the described test bench is implemented and key parameters are determined by simulation.
The result of this work is a concept for system validation that can be applied generically to any kind of service-providing system and thus contributes to improving the development process of complex distributed systems.
29.
An I/O-aware scheduler for containerized data-intensive HPC tasks in Kubernetes-based heterogeneous clusters / En I/O-medveten schemaläggare för containeriserade dataintensiva HPC-uppgifter i Kubernetes-baserade heterogena kluster
Wu, Zheyun. January 2022
Cloud-native is a new computing paradigm that takes advantage of key characteristics of cloud computing, with applications packaged as containers. The lifecycle of containerized applications is typically managed by container orchestration tools such as Kubernetes, the most popular orchestration system, which automates the deployment, maintenance, and scaling of containers and has become the de facto standard for container orchestration in the cloud-native era. Meanwhile, with the increasing demand for High-Performance Computing (HPC) over the past years, containerization is being adopted by the HPC community, and various processors and special-purpose hardware are used to accelerate HPC applications. The architecture of cloud systems has been gradually shifting from homogeneous to heterogeneous, with different processors and hardware accelerators, which raises a new challenge: how can different computing resources be exploited efficiently? Much effort has been devoted to improving the utilization of computing resources in heterogeneous systems from the perspective of task scheduling, which aims to match each type of task to the best computing device for its execution. However, existing proposals do not take the variation in I/O performance between heterogeneous nodes into account when scheduling tasks, even though I/O performance is an important but often overlooked factor that can become a bottleneck for HPC tasks. This thesis proposes an I/O-aware scheduler named cmio-scheduler for containerized data-intensive HPC tasks in Kubernetes-based heterogeneous clusters, which takes the I/O throughput of compute nodes into account when making task placement decisions. In principle, cmio-scheduler assigns a data-intensive HPC task to the node that fulfills the task's CPU, memory, and GPU requirements and has the highest I/O throughput. The experimental results demonstrate that cmio-scheduler reduces execution time by 19.32% for the overall workflow and by 15.125% for parallelizable tasks on average.
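A rough sketch of the placement rule described in the abstract (filter nodes on the task's CPU, memory, and GPU requests, then pick the feasible node with the highest I/O throughput); the field names and figures below are illustrative and do not come from the actual cmio-scheduler implementation.

```python
# Illustrative sketch of the described placement rule (not the actual cmio-scheduler
# code): filter nodes that satisfy the task's CPU, memory, and GPU requests, then
# choose the one with the highest measured I/O throughput.

def pick_node(task, nodes):
    feasible = [
        n for n in nodes
        if n["cpu_free"] >= task["cpu"]
        and n["mem_free_gb"] >= task["mem_gb"]
        and n["gpu_free"] >= task["gpu"]
    ]
    if not feasible:
        return None  # leave the task pending, as a real scheduler would
    return max(feasible, key=lambda n: n["io_throughput_mbps"])

if __name__ == "__main__":
    nodes = [
        {"name": "node-a", "cpu_free": 16, "mem_free_gb": 64,  "gpu_free": 1, "io_throughput_mbps": 450},
        {"name": "node-b", "cpu_free": 8,  "mem_free_gb": 32,  "gpu_free": 0, "io_throughput_mbps": 1200},
        {"name": "node-c", "cpu_free": 32, "mem_free_gb": 128, "gpu_free": 2, "io_throughput_mbps": 900},
    ]
    task = {"cpu": 8, "mem_gb": 32, "gpu": 1}
    chosen = pick_node(task, nodes)
    print(chosen["name"] if chosen else "unschedulable")  # -> node-c
```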
30.
A Hardware/Software Stack for Heterogeneous Systems
Lehner, Wolfgang; Castrillon, Jeronimo; Lieber, Matthias; Klüppelholz, Sascha; Völp, Marcus; Asmussen, Nils; Aßmann, Uwe; Baader, Franz; Baier, Christel; Fettweis, Gerhard; Fröhlich, Jochen; Goens, Andrés; Haas, Sebastian; Habich, Dirk; Härtig, Hermann; Hasler, Mattis; Huismann, Immo; Karnagel, Tomas; Karol, Sven; Kumar, Akash; Leuschner, Linda; Ling, Siqi; Märcker, Steffen; Menard, Christian; Mey, Johannes; Nagel, Wolfgang; Nöthen, Benedikt; Peñaloza, Rafael; Raitza, Michael; Stiller, Jörg; Ungethüm, Annett; Voigt, Axel; Wunderlich, Sascha. 17 July 2023
Plenty of novel emerging technologies are being proposed and evaluated today, mostly at the device and circuit levels. It is unclear what the impact of different new technologies at the system level will be. What is clear, however, is that new technologies will make their way into systems and will increase the already high complexity of heterogeneous parallel computing platforms, making them ever more difficult to program. This paper discusses a programming stack for heterogeneous systems that combines and adapts well-understood principles from different areas, including capability-based operating systems, adaptive application runtimes, dataflow programming models, and model checking. We argue why we believe that these principles, built into the stack and the interfaces among its layers, will also be applicable to future systems that integrate heterogeneous technologies. The programming stack is evaluated on a tiled heterogeneous multicore.