A Framework for Efficient Management of Fault Tolerance in Cloud Data Centres and High-Performance Computing Systems: An Investigation and Performance analysis of a Cloud Based Virtual Machine Success and Failure Rate in a typical Cloud Computing Environment and Prediction Methods

Cloud computing is attracting growing attention in both academic research and industry initiatives and has been widely used to solve advanced computational problems. As cloud datacentres continue to grow in scale and complexity, the risk of failure of Virtual Machines (VMs) and hosts running several jobs and processing large numbers of user requests increases, and it consequently becomes even more difficult to predict potential failures within a datacentre. Although fault tolerance continues to be an issue of growing concern in cloud and HPC systems, mitigating the impact of failure and providing accurate predictions with enough lead time remains a difficult research problem. Traditional fault-tolerance strategies such as regular checkpoint/restart and replication are not adequate because of emerging complexities in these systems, and they do not scale well in the cloud because of resource sharing and distributed system networks.
In this thesis, a new reliable fault tolerance scheme using an intelligent optimal strategy is presented to ensure high system availability, reduced task completion time and an efficient VM allocation process.
Specifically: (i) A generic fault tolerance algorithm for cloud data centres and HPC systems in the cloud was developed. (ii) A verification process was developed for a fully dimensional VM specification during allocation in the presence of faults. In comparison with existing approaches, the results obtained show an increase in the success rate of the VMs, a reduction in the response time of VM allocation and improved overall performance. (iii) A failure prediction model was further developed, and the predictive capabilities of machine learning were explored by applying several algorithms to improve the accuracy of prediction. Experimental results indicate that the average accuracy of the proposed model when predicting failure is about 90% compared to existing algorithms, which implies that the approach can effectively predict potential system and application failures within the system.

http://hdl.handle.net/10454/17400

High-performance computing

Identifier	oai:union.ndltd.org:BRADFORD/oai:bradscholars.brad.ac.uk:10454/17400
Date	January 2019
Creators	Mohammed, Bashir
Contributors	Awan, Irfan U., Ugail, Hassan, Kiran, Mariam
Publisher	University of Bradford, Faculty of Engineering and Informatics
Source Sets	Bradford Scholars
Language	English
Detected Language	English
Type	Thesis, doctoral, PhD
Rights	The University of Bradford theses are licensed under a Creative Commons Licence (http://creativecommons.org/licenses/by-nc-nd/3.0/).
