Return to search

Risk assessment models for resource failure in grid computing

Service Level Agreements (SLAs) are introduced in order to overcome the limitations associated with the best-effort approach in Grid computing, and to accordingly make Grid computing more attractive for commercial uses. However, commercial Grid providers are not keen to adopt SLAs since there is a risk of SLA violation as a result of resource failure, which will result in a penalty fee; therefore, the need to model the resources risk of failure is critical to Grid resource providers. Essentially, moving from the best-effort approach for accepting SLAs to a risk-aware approach assists the Grid resource provider to provide a high-level Quality of Service (QoS). Moreover, risk is an important factor in establishing the resource price and penalty fee in the case of resource failure. In light of this, we propose a mathematical model to predict the risk of failure of a Grid resource using a discrete-time analytical model driven by reliability functions fitted to observed data. The model relies on the resource historical information so as to predict the probability of the resource failure (risk of failure) for a given time interval. The model was evaluated by comparing the predicted risk of failure with the observed risk of failure using availability data gathered from Grids resources. The risk of failure is an important property of a Grid resource, especially when scheduling jobs optimally in relation to resources so as to achieve a business objective. However, in Grid computing, user-centric scheduling algorithms ignore the risk factor and mostly address the minimisation of the cost of the resource allocation, or the overall deadline by which the job must be executed completely. Therefore, we propose a novel user-centric scheduling algorithm for scheduling Bag of Tasks (BoT) applications. The algorithm, which aims to meet user requirements, takes into account the risk of failure, the cost of resources and the job deadline. With this in mind, through simulation, we demonstrate that the algorithm provides a near-optimal solution for minimizing the cost of executing BoT jobs. Also, we show that the execution time of the proposed algorithm is very low, and is therefore suitable for solving scheduling problems in real-time. Risk assessment benefits the resource provider by providing methods to either support accepting or rejecting an SLA. Moreover, it will enable the resource provider to understand the capacity of the infrastructure and to thereby plan future investment. Scheduling algorithms will benefit the resource provider by providing methods to meet user requirements and the better utilisation of resources. The ability to adopt a risk assessment method and user-centric algorithms makes the exploitation of Grid systems more realistic.

Identiferoai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:541398
Date January 2011
CreatorsAlsoghayer, Raid Abdullah
ContributorsDjemam, K.
PublisherUniversity of Leeds
Source SetsEthos UK
Detected LanguageEnglish
TypeElectronic Thesis or Dissertation
Sourcehttp://etheses.whiterose.ac.uk/1909/

Page generated in 0.0083 seconds