  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Explorations in grid workflow scheduling

Zheng, Wei January 2010
By aggregating numerous distributed resources to provide immense computing power, Grid computing has emerged as a promising paradigm for running complex composite applications such as workflows. However, the inherent uncertainties of grid systems, as well as the structural complexity of workflow applications, make it extremely challenging to schedule workflows efficiently, regardless of whether the objective is to minimize execution time or to meet specific user and/or system Quality of Service (QoS) requirements. For both cases, this thesis considers scheduling problems motivated by grid uncertainties and advances the state of the art by developing new techniques to address them. First, based on existing scheduling heuristics, a Monte-Carlo approach is developed to minimize the average makespan (i.e., the overall execution time) when task estimates exhibit limited uncertainty in the form of (controlled) random behaviour. Next, a scenario is considered in which performance predictions are difficult to obtain and resource availability may vary over time. A low-cost, efficient just-in-time heuristic is proposed to cope with these grid uncertainties. After addressing these performance-driven scheduling problems, the thesis examines a QoS-driven problem, which considers not only the aforementioned uncertainties but also the uncertainty caused by queue-based scheduling. To tackle all these uncertainties, an integrated scheduling model consisting of three supportive techniques is developed. Extensive evaluation using simulation shows that the proposed techniques can achieve substantial improvements towards the ultimate goal of providing a good solution for QoS-driven workflow scheduling on the Grid.
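The Monte-Carlo idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the workflow, the estimates, the uniform +/-20% noise model, and every function name below are assumptions chosen for illustration. The sketch samples task run times around their estimates, evaluates each candidate schedule on every sample, and keeps the candidate with the lowest average makespan.

```python
import random
from statistics import mean

# Hypothetical workflow: task -> list of predecessor tasks (a small DAG).
PREDS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
# Hypothetical run-time estimates; resources are assumed identical.
ESTIMATE = {"A": 4.0, "B": 6.0, "C": 5.0, "D": 3.0}
UNCERTAINTY = 0.2  # assumed: actual time is uniform within +/-20% of the estimate


def makespan(assignment, order, durations):
    """Finish time of the last task when tasks start in `order` on their assigned resource."""
    resource_free = {}  # resource -> time at which it becomes idle
    finish = {}         # task -> finish time
    for task in order:
        res = assignment[task]
        ready = max((finish[p] for p in PREDS[task]), default=0.0)
        start = max(ready, resource_free.get(res, 0.0))
        finish[task] = start + durations[task]
        resource_free[res] = finish[task]
    return max(finish.values())


def sample_durations(rng):
    """Draw one random realisation of all task run times."""
    return {t: e * rng.uniform(1 - UNCERTAINTY, 1 + UNCERTAINTY)
            for t, e in ESTIMATE.items()}


def average_makespan(assignment, order, samples=1000, seed=0):
    """Monte-Carlo estimate of the expected makespan of one schedule."""
    rng = random.Random(seed)
    return mean(makespan(assignment, order, sample_durations(rng))
                for _ in range(samples))


def pick_schedule(candidates, order, samples=1000):
    """Return the candidate assignment with the lowest average makespan."""
    return min(candidates, key=lambda a: average_makespan(a, order, samples))


ORDER = ["A", "B", "C", "D"]  # a valid topological order of the DAG
SERIAL = {"A": "r1", "B": "r1", "C": "r1", "D": "r1"}
PARALLEL = {"A": "r1", "B": "r1", "C": "r2", "D": "r1"}
BEST = pick_schedule([SERIAL, PARALLEL], ORDER)
```

In this toy setting the candidates would normally come from existing scheduling heuristics; here they are written out by hand to keep the example short.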
2

Estratégias de escalonamento de workflows com tarefas paralelas e sequenciais em grades computacionais. / Strategies for scheduling workflows composed of sequential and parallel tasks on grid environments.

Stanzani, Silvio Luiz 18 October 2013
The demand for high performance is a common challenge in many scientific applications. Distributed processing environments such as clusters, computational grids and multi-cluster environments have therefore been developed to let a single application use several resources simultaneously. Computationally intensive applications are structured as workflows and executed with the support of middleware that abstracts the complexity of using such environments. In grid computing environments, executing workflows that contain both sequential and parallel tasks with good performance is a challenge because of the heterogeneity and dynamic behavior of the environment, which makes workflow scheduling essential. The task scheduling problem in its general form is NP-complete, so the study of workflow scheduling in grid computing environments is fundamental to improving the performance of computationally intensive applications. The aim of this thesis is to propose workflow scheduling strategies that explore the following aspects: evaluating the possibility of executing each task with internal parallelism using resources from multiple clusters; and adapting scheduling plans when new workflows are submitted. Two strategies were developed. The first is a strategy for static scheduling of workflows, which assumes an environment dedicated to the execution of a single workflow. The second was developed to be used in conjunction with the first, in order to improve the response time of multiple workflows that can be submitted at different times. The proposed strategies were evaluated in a simulation environment.
4

Scheduling and deployment of large-scale applications on Cloud platforms

Muresan, Adrian 10 December 2012
Infrastructure as a service (IaaS) Cloud platforms are increasingly used in the IT industry. IaaS platforms provide virtual resources from a catalogue of predefined types. Improvements in virtualization technology make it possible to create and destroy virtual machines on the fly, with low overhead. As a result, the great benefit of IaaS platforms is the ability to scale a virtual platform on the fly while paying only for the resources used. From a research point of view, IaaS platforms raise new questions about making efficient virtual platform scaling decisions and then efficiently scheduling applications on dynamic platforms. This thesis is a step towards exploring and answering these questions. The first contribution focuses on resource management. We have worked on automatically scaling cloud client applications to meet changing platform usage. Various studies have shown self-similarities in web platform traffic, which implies the existence of usage patterns that may or may not be periodic. We have developed an automatic platform scaling strategy that predicts platform usage by identifying non-periodic usage patterns and extrapolating future platform usage from them. Next, we have focused on extending an existing grid platform with on-demand resources from an IaaS platform. We have developed an extension to the DIET (Distributed Interactive Engineering Toolkit) middleware that uses a virtual-market-based approach to perform resource allocation. Each user is given a sum of virtual currency to spend on running tasks. This mechanism helps ensure fair platform sharing between users. The third and final contribution targets application management for IaaS platforms. We have studied and developed an allocation strategy for budget-constrained workflow applications that target IaaS Cloud platforms. The workflow abstraction is very common among scientific applications, and examples are easy to find in any field from bioinformatics to geology. In this work we have considered a general model of workflow applications that comprise parallel tasks and permit non-deterministic transitions. We have elaborated two budget-constrained allocation strategies for this type of workflow. The problem is a bi-criteria optimization problem, as we optimize both budget and workflow makespan. This work has been validated in practice by implementing it on top of the Nimbus open source cloud platform and the DIET MADAG workflow engine. It is being tested with a cosmological simulation workflow application called RAMSES, a parallel MPI application that, as part of this work, has been ported for execution on dynamic virtual platforms. Both theoretical simulations and practical experiments have shown encouraging results and improvements.
5

Specification And Scheduling Of Workflows Under Resource Allocation Constraints

Senkul Karagoz, Pinar 01 January 2003
A workflow is a collection of tasks organized to accomplish some business process. It also defines the order of task invocation or the conditions under which tasks must be invoked, task synchronization, and information flow. Before a workflow is executed, a correct execution schema, in other words the schedule of the workflow, must be determined. Workflow scheduling means finding an execution sequence of tasks that obeys the business logic of the workflow. Research on the specification and scheduling of workflows has concentrated on temporal and causality constraints, which specify existence and order dependencies among tasks. However, another set of constraints, those that specify resource allocation, is equally important. The resources in a workflow environment are agents such as persons, machines, and software that execute the tasks. Executing a task has a cost, which may vary depending on the resources allocated to that task. Resource allocation constraints define restrictions on how resources may be allocated, and scheduling under resource allocation constraints provides a proper allocation of resources to tasks. In this thesis, we present two approaches to specifying and scheduling workflows under resource allocation constraints as well as temporal and causality constraints. In the first approach, we present an architecture whose core and novel parts are a specification language able to express resources and resource allocation constraints, and a scheduler module that contains a constraint solver for finding correct resource assignments. In the second approach, we develop a new logical formalism, called Concurrent Constraint Transaction Logic (CCTR), which integrates constraint logic programming (CLP) and Concurrent Transaction Logic, together with a logic-based workflow scheduler built on this new formalism. CCTR has constructs to specify resource allocation constraints as well as workflows, and it provides semantics for these specifications so that the validity of a schedule can be checked.
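As a rough illustration of scheduling under both ordering and resource allocation constraints, the sketch below orders tasks by their dependencies and then searches for the cheapest agent assignment that satisfies every allocation constraint. It uses plain brute-force enumeration rather than the constraint solver or CCTR formalism of the thesis, and the tasks, agents, costs, and constraint helper are all hypothetical.

```python
from graphlib import TopologicalSorter
from itertools import product

# Hypothetical workflow: task -> set of tasks that must finish first.
DEPENDS_ON = {"draft": set(), "review": {"draft"}, "approve": {"review"}}
# Hypothetical cost of executing each task with each eligible agent (resource).
COST = {
    "draft": {"alice": 3, "bob": 2},
    "review": {"alice": 1, "bob": 2},
    "approve": {"alice": 2, "carol": 3},
}


def different_agents(task_a, task_b):
    """Resource allocation constraint: the two tasks must not share an agent."""
    return lambda assignment: assignment[task_a] != assignment[task_b]


CONSTRAINTS = [
    different_agents("draft", "review"),
    different_agents("review", "approve"),
]


def schedule():
    """Return (execution order, (cheapest valid assignment, total cost))."""
    # Ordering (causality) constraints: predecessors come first.
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    tasks = list(COST)
    best = None
    # Enumerate every combination of eligible agents, one per task.
    for agents in product(*(COST[t] for t in tasks)):
        assignment = dict(zip(tasks, agents))
        if all(check(assignment) for check in CONSTRAINTS):
            total = sum(COST[t][a] for t, a in assignment.items())
            if best is None or total < best[1]:
                best = (assignment, total)
    return order, best


ORDER, BEST = schedule()
```

Enumeration is only practical for a handful of tasks; a real scheduler would hand the same constraints to a solver instead.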
6

Cost-efficient resource management for scientific workflows on the cloud

Pietri, Ilia January 2016
Scientific workflows are used in many scientific fields to abstract complex computations (tasks) and the data or flow dependencies between them. High performance computing (HPC) systems have been widely used to execute scientific workflows. Cloud computing has gained popularity by offering users on-demand provisioning of resources and the ability to choose from a wide range of possible configurations. To do so, resources are made available in the form of virtual machines (VMs), each described by a set of resource characteristics, e.g. amount of CPU and memory. The notion of VMs enables the use of different resource combinations, which facilitates the deployment of applications and the management of resources. A problem that arises is determining the configuration, such as the number and type of resources, that leads to efficient resource provisioning. For example, allocating a large amount of resources may reduce application execution time, but at the expense of increased costs. This thesis investigates the challenges that arise in resource provisioning and task scheduling of scientific workflows and explores ways to address them, developing approaches that improve energy efficiency for scientific workflows and meet the user's objectives, e.g. makespan and monetary cost. The motivation stems from the wide range of options that make it possible to select cost-efficient configurations and improve resource utilisation. The contributions of this thesis are the following. (i) A survey of the issues arising in resource management in cloud computing. The survey focuses on VM management, cost efficiency and the deployment of scientific workflows. (ii) A performance model that estimates the workflow execution time for different numbers of resources based on the workflow structure. The model can be used to estimate the respective user and energy costs in order to determine configurations that lead to efficient resource provisioning and balance various conflicting goals. (iii) Two energy-aware scheduling algorithms that maximise the number of completed workflows from an ensemble under energy and budget or deadline constraints. The algorithms address the problem of energy-aware resource provisioning and scheduling for scientific workflow ensembles. (iv) An energy-aware algorithm that selects the frequency to be used for each workflow task in order to achieve energy savings without exceeding the workflow deadline. The algorithm takes into account the different requirements and constraints that arise depending on the workflow and system characteristics. (v) Two cost-based frequency selection algorithms that choose the CPU frequency for each provisioned resource in order to achieve cost-efficient resource configurations for the user and complete the workflow within the deadline. Decision making is based on both the workflow characteristics and the pricing model of the provider.
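The trade-off behind per-task frequency selection under a deadline, as in contribution (iv), can be sketched with a toy model. This is not the thesis's algorithm. It assumes a chain of tasks run one after another, run time inversely proportional to frequency, and power growing with the cube of frequency (so energy grows as work times frequency squared); the frequency set, workloads, and function names are hypothetical.

```python
from itertools import product

FREQS_GHZ = (1.0, 1.5, 2.0)  # hypothetical available CPU frequencies
# Hypothetical work per task in giga-cycles; tasks run sequentially.
WORK_GCYCLES = {"t1": 6.0, "t2": 3.0, "t3": 3.0}


def runtime(work, freq):
    """Seconds to finish, assuming run time scales as 1 / frequency."""
    return work / freq


def energy(work, freq):
    """Energy in arbitrary units, assuming power ~ f^3, hence energy ~ work * f^2."""
    return work * freq ** 2


def select_frequencies(deadline):
    """Cheapest per-task frequencies that meet the deadline, or None if infeasible."""
    tasks = list(WORK_GCYCLES)
    best = None
    # Exhaustive search over all per-task frequency choices (fine for a toy example).
    for freqs in product(FREQS_GHZ, repeat=len(tasks)):
        total_time = sum(runtime(WORK_GCYCLES[t], f) for t, f in zip(tasks, freqs))
        if total_time > deadline:
            continue
        total_energy = sum(energy(WORK_GCYCLES[t], f) for t, f in zip(tasks, freqs))
        if best is None or total_energy < best[1]:
            best = (dict(zip(tasks, freqs)), total_energy)
    return best


RELAXED = select_frequencies(12.0)  # loose deadline: lowest frequencies suffice
TIGHT = select_frequencies(6.0)     # tight deadline: highest frequencies required
```

Under these assumptions a loose deadline lets every task run at the lowest frequency, while a tight one forces higher frequencies and a disproportionately larger energy bill.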
7

Efficient Scientific Workflow Scheduling in Cloud Environment

Cao, Fei 01 May 2014
Cloud computing enables the delivery of remote computing, software and storage services through web browsers following a pay-as-you-go model. In addition to successful commercial applications, many research efforts, including the DOE Magellan Cloud project, focus on discovering the opportunities and challenges arising from compute- and data-intensive scientific applications that are not well addressed by current supercomputers, Linux clusters and Grid technologies. The elastic resource provisioning, non-interfering resource sharing and flexible customized configuration provided by Cloud infrastructure have shed light on the efficient execution of many scientific applications modeled as Directed Acyclic Graph (DAG) structured workflows, which enforce the intricate dependencies among a large number of different processing tasks. Meanwhile, the Cloud environment poses various challenges. Cloud providers and Cloud users pursue different goals: providers aim to maximize profit by achieving higher resource utilization, while users want to minimize expenses while meeting their performance requirements. Moreover, because of expanding Cloud services and emerging technologies, the ever-increasing heterogeneity of the Cloud environment complicates the challenges for both parties. In this thesis, we address the workflow scheduling problem for different applications and with various objectives. For batch applications, with the increasing deployment of data centers and servers around the globe, compounded by higher electricity prices, the energy cost of computing, communication and cooling, together with the amount of CO2 emissions, has skyrocketed. To keep Cloud computing sustainable in the face of ever-increasing problem complexity and data sizes in the coming decades, we design and develop an energy-aware scientific workflow scheduling algorithm that minimizes energy consumption and CO2 emissions while still satisfying a given Quality of Service (QoS), such as the response time specified in a Service Level Agreement (SLA). Furthermore, the availability of the underlying Cloud hardware and Virtual Machine (VM) resources is time-dependent because of the dual operation modes, namely on-demand and reservation instances, at various Cloud data centers. We also apply techniques such as Dynamic Voltage and Frequency Scaling (DVFS) and DNS scheme to further reduce energy consumption within acceptable performance bounds. Our multiple-step resource provisioning and allocation algorithm meets the response time requirement in the forward task scheduling step, and minimizes VM overhead for reduced energy consumption and a higher resource utilization rate in the backward task scheduling step. We also evaluate the candidacy of multiple data centers from the energy and performance efficiency perspectives, as different data centers have different energy- and cost-related parameters. For streaming applications, we formulate scheduling problems with two different objectives: one maximizes throughput under a budget constraint, and the other minimizes execution cost under a minimum throughput constraint. Two algorithms, Budget constrained RATE (B-RATE) and Budget constrained SWAP (B-SWAP), are designed for the first objective; another two, Throughput constrained RATE (TP-RATE) and Throughput constrained SWAP (TP-SWAP), are developed for the second.
8

Scheduling and deployment of large-scale applications on Cloud platforms / Ordonnancement et déploiement d'applications de gestion de données à grande échelle sur des plates-formes de type Clouds

Muresan, Adrian 10 December 2012
Infrastructure as a service (IaaS) Cloud platforms are increasingly used in the IT industry. IaaS platforms provide virtual resources from a catalogue of predefined types. Improvements in virtualization technology make it possible to create and destroy virtual machines on the fly, with low overhead. As a result, the great benefit of IaaS platforms is the ability to scale a virtual platform on the fly while paying only for the resources used. From a research point of view, IaaS platforms raise new questions about making efficient virtual platform scaling decisions and then efficiently scheduling applications on dynamic platforms. This thesis is a step towards exploring and answering these questions. The first contribution focuses on resource management. We have worked on automatically scaling cloud client applications to meet changing platform usage. Various studies have shown self-similarities in web platform traffic, which implies the existence of usage patterns that may or may not be periodic. We have developed an automatic platform scaling strategy that predicts platform usage by identifying non-periodic usage patterns and extrapolating future platform usage from them. Next, we have focused on extending an existing grid platform with on-demand resources from an IaaS platform. We have developed an extension to the DIET (Distributed Interactive Engineering Toolkit) middleware that uses a virtual-market-based approach to perform resource allocation. Each user is given a sum of virtual currency to spend on running tasks. This mechanism helps ensure fair platform sharing between users. The third and final contribution targets application management for IaaS platforms. We have studied and developed an allocation strategy for budget-constrained workflow applications that target IaaS Cloud platforms. The workflow abstraction is very common among scientific applications, and examples are easy to find in any field from bioinformatics to geology. In this work we have considered a general model of workflow applications that comprise parallel tasks and permit non-deterministic transitions. We have elaborated two budget-constrained allocation strategies for this type of workflow. The problem is a bi-criteria optimization problem, as we optimize both budget and workflow makespan. This work has been validated in practice by implementing it on top of the Nimbus open source cloud platform and the DIET MADAG workflow engine. It is being tested with a cosmological simulation workflow application called RAMSES, a parallel MPI application that, as part of this work, has been ported for execution on dynamic virtual platforms. Both theoretical simulations and practical experiments have shown encouraging results and improvements.
9

Implementation Of Concurrent Constraint Transaction Logic And Its User Interface

Altunyuva, Fethi 01 September 2006
This thesis implements a logical formalism framework called Concurrent Constraint Transaction Logic (CCTR), which was defined for modeling and scheduling workflows under resource allocation and cost constraints, and develops an extensible and flexible graphical user interface for the framework. CCTR extends Concurrent Transaction Logic and integrates it with Constraint Logic Programming to find correct schedules for tasks that involve resource and cost constraints. The developed system, which integrates the Prolog and Java platforms, is designed to serve as the basic environment for enterprise applications that involve CCTR-based workflows and schedulers. The full implementation described in this thesis illustrates that CCTR can be used as a workflow scheduler that handles not only temporal and causal constraints but also resource and cost constraints.
10

Scientific Workflows for Hadoop

Bux, Marc Nicolas 07 August 2018
Scientific workflows provide a means to model, execute, and exchange the increasingly complex analysis pipelines necessary for today's data-driven science. Over the last decades, scientific workflow management systems have emerged to facilitate the design, execution, and monitoring of such workflows. At the same time, the amounts of data generated in various areas of science have outpaced hardware advancements. Parallelization and distributed execution are generally proposed to deal with increasing amounts of data. However, the resources provided by distributed infrastructures are subject to heterogeneity, dynamic performance changes at runtime, and occasional failures. To leverage the scalability provided by these infrastructures despite the observed performance variability, workflow management systems have to progress in four respects. Parallelization potential in scientific workflows has to be detected and exploited. Simulation frameworks, which are commonly employed to evaluate scheduling mechanisms, have to consider the instability encountered on the infrastructures they emulate. Adaptive scheduling mechanisms have to be employed to optimize resource utilization in the face of instability. State-of-the-art systems for scalable distributed resource management and storage, such as Apache Hadoop, have to be supported. This dissertation presents novel solutions for these requirements. First, we introduce DynamicCloudSim, a cloud computing simulation framework that is able to adequately model the various aspects of variability encountered in computational clouds. Secondly, we outline ERA, an adaptive scheduling policy that optimizes workflow makespan by exploiting heterogeneity, replicating bottlenecks in workflow execution, and adapting to changes in the underlying infrastructure. Finally, we present Hi-WAY, an execution engine that integrates ERA and enables the highly scalable execution of scientific workflows written in a number of languages on Hadoop.
