1 |
Branch and Bound Algorithm for Multiprocessor SchedulingRahman, Mostafizur January 2009 (has links)
The multiprocessor task graph scheduling problem has been extensively studied asacademic optimization problem which occurs in optimizing the execution time of parallelalgorithm with parallel computer. The problem is already being known as one of the NPhardproblems. There are many good approaches made with many optimizing algorithmto find out the optimum solution for this problem with less computational time. One ofthem is branch and bound algorithm.In this paper, we propose a branch and bound algorithm for the multiprocessor schedulingproblem. We investigate the algorithm by comparing two different lower bounds withtheir computational costs and the size of the pruned tree.Several experiments are made with small set of problems and results are compared indifferent sections.
|
2 |
Predictive Resource Management for Scientific WorkflowsWitt, Carl Philipp 21 July 2020 (has links)
Um Erkenntnisse aus großen Mengen wissenschaftlicher Rohdaten zu gewinnen, sind komplexe Datenanalysen erforderlich. Scientific Workflows sind ein Ansatz zur Umsetzung solcher Datenanalysen. Um Skalierbarkeit zu erreichen, setzen die meisten Workflow-Management-Systeme auf bereits existierende Lösungen zur Verwaltung verteilter Ressourcen, etwa Batch-Scheduling-Systeme. Die Abschätzung der Ressourcen, die zur Ausführung einzelner Arbeitsschritte benötigt werden, wird dabei immer noch an die Nutzer:innen delegiert. Dies schränkt die Leistung und Benutzerfreundlichkeit von Workflow-Management-Systemen ein, da den Nutzer:innen oft die Zeit, das Fachwissen oder die Anreize fehlen, den Ressourcenverbrauch genau abzuschätzen.
Diese Arbeit untersucht, wie die Ressourcennutzung während der Ausführung von Workflows automatisch erlernt werden kann. Im Gegensatz zu früheren Arbeiten werden Scheduling und Vorhersage von Ressourcenverbrauch in einem engeren Zusammenhang betrachtet. Dies bringt verschiedene Herausforderungen mit sich, wie die Quantifizierung der Auswirkungen von Vorhersagefehlern auf die Systemleistung.
Die wichtigsten Beiträge dieser Arbeit sind:
1. Eine Literaturübersicht aktueller Ansätze zur Vorhersage von Spitzenspeicherverbrauch mittels maschinellen Lernens im Kontext von Batch-Scheduling-Systemen.
2. Ein Scheduling-Verfahren, das statistische Methoden verwendet, um vorherzusagen, welche Scheduling-Entscheidungen verbessert werden können.
3. Ein Ansatz zur Nutzung von zur Laufzeit gemessenem Spitzenspeicherverbrauch in Vorhersagemodellen, die die fortwährende Optimierung der Ressourcenallokation erlauben. Umfangreiche Simulationsexperimente geben Einblicke in Schlüsseleigenschaften von Scheduling-Heuristiken und Vorhersagemodellen.
4. Ein Vorhersagemodell, das die asymmetrischen Kosten überschätzten und unterschätzten Speicherverbrauchs berücksichtigt, sowie die Folgekosten von Vorhersagefehlern einbezieht. / Scientific experiments produce data at unprecedented volumes and resolutions. For the extraction of insights from large sets of raw data, complex analysis workflows are necessary. Scientific workflows enable such data analyses at scale. To achieve scalability, most workflow management systems are designed as an additional layer on top of distributed resource managers, such as batch schedulers or distributed data processing frameworks. However, like distributed resource managers, they do not automatically determine the amount of resources required for executing individual tasks in a workflow. The status quo is that workflow management systems delegate the challenge of estimating resource usage to the user. This limits the performance and ease-of-use of scientific workflow management systems, as users often lack the time, expertise, or incentives to estimate resource usage accurately.
This thesis is an investigation of how to learn and predict resource usage during workflow execution. In contrast to prior work, an integrated perspective on prediction and scheduling is taken, which introduces various challenges, such as quantifying the effects of prediction errors on system performance.
The main contributions are:
1. A survey of peak memory usage prediction in batch processing environments. It provides an overview of prior machine learning approaches, commonly used features, evaluation metrics, and data sets.
2. A static workflow scheduling method that uses statistical methods to predict which scheduling decisions can be improved.
3. A feedback-based approach to scheduling and predictive resource allocation, which is extensively evaluated using simulation. The results provide insights into the desirable characteristics of scheduling heuristics and prediction models.
4. A prediction model that reduces memory wastage. The design takes into account the asymmetric costs of overestimation and underestimation, as well as follow up costs of prediction errors.
|
3 |
Collecting and representing parallel programs with high performance instrumentationRailing, Brian Paul 07 January 2016 (has links)
Computer architecture has looming challenges with finding program parallelism, process technology limits, and limited power budget. To navigate these challenges, a deeper understanding of parallel programs is required. I will discuss the task graph representation and how it enables programmers and compiler optimizations to understand and exploit dynamic aspects of the program.
I will present Contech, which is a high performance framework for generating dynamic task graphs from arbitrary parallel programs. The Contech framework supports a variety of languages and parallelization libraries, and has been tested on both x86 and ARM. I will demonstrate how this framework encompasses a diversity of program analyses, particularly by modeling a dynamically reconfigurable, heterogeneous multi-core processor.
|
4 |
Data aggregation in sensor networksKallumadi, Surya Teja January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / Gurdip Singh / Severe energy constraints and limited computing abilities of the nodes in a network present a major challenge in the design and deployment of a wireless sensor network. This thesis aims to present energy efficient algorithms for data fusion and information aggregation in a sensor network. The various methodologies of data fusion presented in this thesis intend to reduce the data traffic within a network by mapping the sensor network application task graph onto a sensor network topology. Partitioning of an application into sub-tasks that can be mapped onto the nodes of a sensor network offers opportunities to reduce the overall energy consumption of a sensor network. The first approach proposes a grid based coordinated incremental data fusion and routing with heterogeneous nodes of varied computational abilities. In this approach high performance nodes arranged in a mesh like structure spanning the network topology, are present amongst the resource constrained nodes. The sensor network protocol performance, measured in terms of hop-count is analysed for various grid sizes of the high performance nodes. To reduce network traffic and increase the energy efficiency in a randomly deployed sensor network, distributed clustering strategies which consider network density and structure similarity are applied on the network topology. The clustering methods aim to improve the energy efficiency of the sensor network by dividing the network into logical clusters and mapping the fusion points onto the clusters. Routing of network information is performed by inter-cluster and intra-cluster routing.
|
5 |
Ordonnancement de graphes de tâches sur des plates-formes de calcul modernes / Scheduling task graphs on modern computing platformsSimon, Bertrand 04 July 2018 (has links)
Cette thèse porte sur trois thématiques principales liées à l'ordonnancement de graphes de tâches sur des plates-formes de calcul modernes. Un graphe de tâches est une modélisation classique d'un programme à exécuter, par exemple une application de calcul scientifique. La décomposition d'une application en différentes tâches permet d'exploiter le parallélisme potentiel de cette application sans adapter le programme à la plate-forme de calcul visée. Le graphe décrit ces tâches ainsi que leurs dépendances, certaines tâches ne pouvant être exécutées avant que d'autres ne soient terminées. L'exécution d'une application est alors déterminée par un ordonnancement du graphe, calculé par un logiciel dédié, qui décrit entre autres quelles ressources sont allouées à chaque tâche et à quel moment. Les trois thèmes étudiés sont les suivants: exploiter le parallélisme intrinsèque des tâches, utiliser des accélérateurs tels que des GPU, et prendre en compte une mémoire limitée.Certaines applications présentent deux types de parallélisme que l'on peut exploiter: plusieurs tâches peuvent être exécutées simultanément, et chaque tâche peut être exécutée sur plusieurs processeurs afin de réduire son temps de calcul. Nous proposons et étudions deux modèles permettant de régir ce temps de calcul, afin d'exploiter ces deux types de parallélisme.Nous étudions ensuite comment utiliser efficacement des accélérateurs de calcul tels que des GPU, dans un contexte dynamique où les futures tâches à ordonnancer ne sont pas connues. La difficulté principale consiste à décider si une tâche doit être exécutée sur l'un des rares accélérateurs disponibles ou sur l'un des nombreux processeurs classiques. La dernière thématique abordée concerne le problème d'une mémoire principale limitée, et le recours à des transferts de données coûteux. Nous avons traité ce problème via deux scénarios. S'il est possible d'éviter de tels transferts, nous avons proposé de modifier le graphe afin de garantir que toute exécution ne dépasse pas la mémoire disponible, ce qui permet d'ordonnancemer les tâches dynamiquement au moment de l'exécution. Si tout ordonnancement nécessite des transferts, nous avons étudié le problème consistant à minimiser leur quantité.L'étude de ces trois thèmes a permis de mieux comprendre la complexité de ces problèmes. Les solutions proposées dans le cadre d'étude théorique pourront influencer de futures implémentations logicielles. / This thesis deals with three main themes linked to task graph scheduling on modern computing platforms. A graph of tasks is a classical model of a program to be executed, for instance a scientific application. The decomposition of an application into several tasks allows to exploit the potential parallelism of this application without adaptating the program to the computing platform. The graph describes the tasks as well as their dependences, some tasks cannot be initiated before others are completed. The execution of an application is then determined by a schedule of the graph, computed by a dedicated software, which in particular describes which resources should be allocated to each task at which time. The three studied themes are the following: exploit inner task parallelism, use accelerators such as GPUs, and cope with a limited memory.For some applications, two types of parallelism can be exploited: several tasks can be executed concurrently, and each task may be executed on several processors, which reduces its processing time. We propose and study two models allowing to describe this processing time acceleration, in order to efficiently exploit both types of parallelism.We then study how to efficiently use accelerators such as GPUs, in a dynamic context in which the future tasks to schedule are unknown. The main difficulty consists in deciding whether a task should be executed on one of the rare available accelerators or on one of the many classical processors. The last theme covered in this thesis deals with a available main memory of limited size, and the resort to expensive data transfers. We focused on two scenarios. If it is possible to avoid such transfers, we propose to modify the graph in order to guarantee that any execution fits in memory, which allows to dynamically schedule the graph at runtime. If every schedule needs transfers, we studied how to minimize their quantity.The work on these three themes has led to a better understanding of the underlying complexities. The proposed theoretical solutions will influence future software implementations.
|
6 |
Parallélisation automatique et statique de tâches sous contraintes de ressources : une approche générique / Automatic Resource-Constrained Static Task Parallelization : A Generic ApproachKhaldi, Dounia 27 November 2013 (has links)
Le but de cette thèse est d'exploiter efficacement le parallélisme présent dans les applications informatiques séquentielles afin de bénéficier des performances fournies par les multiprocesseurs, en utilisant une nouvelle méthodologie pour la parallélisation automatique des tâches au sein des compilateurs. Les caractéristiques clés de notre approche sont la prise en compte des contraintes de ressources et le caractère statique de l'ordonnancement des tâches. Notre méthodologie contient les techniques nécessaires pour la décomposition des applications en tâches et la génération de code parallèle équivalent, en utilisant une approche générique qui vise différents langages et architectures parallèles. Nous implémentons cette méthodologie dans le compilateur source-à-source PIPS. Cette thèse répond principalement à trois questions. Primo, comme l'extraction du parallélisme de tâches des codes séquentiels est un problème d'ordonnancement, nous concevons et implémentons un algorithme d'ordonnancement efficace, que nous nommons BDSC, pour la détection du parallélisme ; le résultat est un SDG ordonnancé, qui est une nouvelle structure de données de graphe de tâches. Secondo, nous proposons une nouvelle extension générique des représentations intermédiaires séquentielles en des représentations intermédiaires parallèles que nous nommons SPIRE, pour la représentation des codes parallèles. Enfin, nous développons, en utilisant BDSC et SPIRE, un générateur de code que nous intégrons dans PIPS. Ce générateur de code cible les systèmes à mémoire partagée et à mémoire distribuée via des codes OpenMP et MPI générés automatiquement. / This thesis intends to show how to efficiently exploit the parallelism present in applications in order to enjoy the performance benefits that multiprocessors can provide, using a new automatic task parallelization methodology for compilers. The key characteristics we focus on are resource constraints and static scheduling. This methodology includes the techniques required to decompose applications into tasks and generate equivalent parallel code, using a generic approach that targets both different parallel languages and architectures. We apply this methodology in the existing tool PIPS, a comprehensive source-to-source compilation platform. This thesis mainly focuses on three issues. First, since extracting task parallelism from sequential codes is a scheduling problem, we design and implement an efficient, automatic scheduling algorithm called BDSC for parallelism detection; the result is a scheduled SDG, a new task graph data structure. In a second step, we design a new generic parallel intermediate representation extension called SPIRE, in which parallelized code may be expressed. Finally, we wrap up our goal of automatic parallelization in a new BDSC- and SPIRE-based parallel code generator, which is integrated within the PIPS compiler framework. It targets both shared and distributed memory systems using automatically generated OpenMP and MPI code.
|
7 |
An Internal Representation for Adaptive Online ParallelizationRehme, Koy D. 29 May 2009 (has links) (PDF)
Future computer processors may have tens or hundreds of cores, increasing the need for efficient parallel programming models. The nature of multicore processors will present applications with the challenge of diversity: a variety of operating environments, architectures, and data will be available and the compiler will have no foreknowledge of the environment until run time. Adaptive Online Parallelization (ADOPAR) is a unifying framework that attempts to overcome diver sity by separating discovery and packaging of parallelism. Scheduling for execution may then occur at run time when diversity may best be resolved. This work presents a compact representation of parallelism based on the task graph programming model, tailored especially for ADOPAR and for regular and irregular parallel computations. Task graphs can be unmanageably large for fine-grained parallelism. Rather than representing each task individually, similar tasks are grouped into task descriptors. From these, a task descriptor graph, with relationship descriptors forming the edges of the graph, may be represented. While even highly irregular computations often have structure, previous representations have chosen to restrict what can be easily represented, thus limiting full exploitation by the back end. Therefore, in this work, task and relationship descriptors have been endowed with instantiation functions (methods of descriptors that act as factories) so the front end may have a full range of expression when describing the task graph. The representation uses descriptors to express a full range of regular and irregular computations in a very flexible and compact manner. The representation also allows for dynamic optimization and transformation, which assists ADOPAR in its goal of overcoming various forms of diversity. We have successfully implemented this representation using new compiler intrinsics, allow ADOPAR schedulers to operate on the described task graph for parallel execution, and demonstrate the low code size overhead and the necessity for native schedulers.
|
8 |
Distributed OpenCL : a platform for distributed, heterogeneous computing for domain scientistsDillon, William H. (William Hall) 29 May 2012 (has links)
It is possible to purchase, for as little as $10,000, a cluster of computers with the capability to rival the supercomputers of only a few years ago. Now, users that have little to no experience developing distributed applications or managing a cluster are in a position to do so. To allow domain scientists to effectively utilize these resources, Distributed OpenCL (DOCL) was developed. DOCL is an easy-to-use foundation for peer-to-peer distributed computation on small to medium clusters. It is assumed that the end-user is a domain scientist, familiar with model development in environments such as Matlab, though inexperienced with distributed computation or parallel programming. The scope of this work includes the definition of a peer-to-peer protocol for discovering and establishing relationships with every node within a multicast domain, using the concepts of Zero-Configuration Networking, multicast DNS, and DNS Service Discovery. A problematic edge case of multicast DNS is detailed along with a mitigation technique. An XML schema is also described for basic peer communication and cluster management and inventory. A system for scheduling algorithm tasks on the cluster of heterogeneous compute devices was developed, including an automatic computation and communication cost measurement system. Finally, a graphical programming language was designed and implemented that allows non-expert programmers and modelers to develop new applications in a straightforward, accessible way. / Graduation date: 2012
|
9 |
Δυναμική μετάφραση για τη γλώσσα προγραμματισμού JavaΠρούντζος, Δημήτριος 27 February 2009 (has links)
Η γλώσσα Java έχει πλέον εδραιωθεί σαν μια από τις πιο συχνά χρησιμοποιούμενες γλώσσες όχι
μόνο λόγω της εξαιρετικής υποστήριξης σύγχρονων παραδειγμάτων προγραμματισμού, όπως ο
αντικειμενοστραφής και ο γενικευμένος προγραμματισμός, αλλά κυρίως λόγω της εύκολης
μεταφερσιμότητας του κώδικα και της ανεξαρτησίας που παρέχει στα προγράμματά της από
κάποια συγκεκριμένη πλατφόρμα υλικού-λειτουργικού συστήματος. Η δυνατότητα αυτή
συνοψίζεται στο σύνθημα “Write once, run anywhere” που καθιέρωσε η Sun, η εταιρία η οποία
σχεδίασε αρχικά την γλώσσα. Κάτι τέτοιο, επιτυγχάνεται με την μετάφραση ενός προγράμματος
από πηγαίο κώδικα Java σε μια ενδιάμεση αναπαράσταση object κώδικα (bytecode), η οποία στη
συνέχεια εκτελείται στα πλαίσια μιας εικονικής μηχανής. Η πατροπαράδοτη μέθοδος εκτέλεσης
των προγραμμάτων από την εικονική μηχανή ακολουθεί το μοντέλο της διερμήνευσης
(interpretation), το οποίο στην πράξη δεν είναι καθόλου αποδοτικό, σε ότι αφορά το χρόνο
εκτέλεσης. Μια διαφορετική προσέγγιση στην εκτέλεση Java bytecode είναι αυτή της δυναμικής
μετάφρασης (Just In Time compilation – JIT compilation). Εδώ, την πρώτη φορά που εμφανίζεται η
ανάγκη να εκτελεστεί ένα συγκεκριμένο τμήμα κώδικα, η εικονική μηχανή το επεξεργάζεται,
εφαρμόζοντας προαιρετικά μετασχηματισμούς βελτιστοποίησης και παράγει τον αντίστοιχο
κώδικα για το συγκεκριμένο σύστημα-οικοδεσπότη στο οποίο εκτελείται και η ίδια. Ο κώδικας
αυτός στη συνέχεια μπορεί να επαναχρησιμοποιηθεί, εξαλείφοντας το κόστος της επαναληπτικής
μετάφρασης του ίδιου τμήματος bytecode και μειώνοντας το συνολικό χρόνο εκτέλεσης.
Στο πλαίσιο της συγκεκριμένης μεταπτυχιακής εργασίας κατασκευάζουμε ένα JIT μεταφραστή για
μια εικονική μηχανή ειδικού σκοπού, το DSJOS (Distributed Scalable Java Operating System).
Όπως φανερώνει και το όνομα του, το DSJOS είναι ουσιαστικά ένα κατανεμημένο σύστημα που
προσφέρει στα προγράμματα που εκτελούνται εντός αυτού την αφαίρεση μιας Java εικονικής
μηχανής. Ο JIT που δημιουργούμε χρησιμοποιεί ως εσωτερική αναπαράσταση το Ιεραρχικό
Γράφημα Εργασιών (Hierarchical Task Graph – HTG) και στηρίζεται στη βιβλιοθήκη
μετασχηματισμών και βελτιστοποιήσεων (compilation framework) PROMIS. Η υλοποίηση μας
διαρθρώνεται σε τρία κυρίως στάδια: το frontend το οποίο είναι υπεύθυνο για την μετατροπή Java
bytecode στην ενδιάμεση αναπαράσταση, το backend που μετατρέπει την ενδιάμεση
αναπαράσταση σε κώδικα μηχανής για συστήματα x86 και, τέλος, το επίπεδο χρόνου εκτέλεσης
που παρέχει στα εκτελούμενα προγράμματα διάφορες υπηρεσίες απαραίτητες για την εκτέλεση του (π.χ. διαχείριση εξαιρέσεων). Παράλληλα με το σχεδιασμό του βασικού μεταφραστή και την
ενσωμάτωση του στο DSJOS, σχεδιάζουμε και υλοποιούμε και ένα σύνολο μετασχηματισμών, τόσο στο frontend όσο και στο backend, οι οποίοι έχουν ως σκοπό να βελτιώσουν την ποιότητα του παραγόμενου κώδικα και να μειώσουν το χρόνο εκτέλεσης των προγραμμάτων. / -
|
10 |
Contech: a shared memory parallel program analysis frameworkVassenkov, Phillip 13 January 2014 (has links)
We are in the era of multicore machines, where we must exploit thread level parallelism for programs to run better, smarter, faster, and more efficiently. In order to increase instruction level parallelism, processors and compilers perform heavy dataflow analyses between instructions. However, there isn’t much work done in the area of inter-thread dataflow analysis. In order to pave the way and find new ways to conserve resources across a variety of domains (i.e., execution speed, chip die area, power efficiency, and computational throughput), we propose a novel framework, termed Contech, to facilitate the analysis of multithreaded program in terms of its communication and execution patterns. We focus the scope on shared memory programs rather than message passing programs, since it is more difficult to analyze the communication and execution patterns for these programs. Discovering patterns of shared memory programs has the potential to allow general purpose computing machines to turn on or off architectural tricks according to application-specific features. Our design of Contech is modular in nature, so we can glean a large variety of information from an architecturally independent representation of the program under examination.
|
Page generated in 0.0664 seconds