
Reducing deadline miss rate for grid workloads running in virtual machines : a deadline-aware and adaptive approach

This thesis explores three major areas of research: integration of virtualization into scientific grid infrastructures, evaluation of the virtualization overhead on the performance of HPC grid jobs, and optimization of job execution times to increase throughput by reducing the job deadline miss rate. Integrating virtualization into the grid to deploy on-demand virtual machines for jobs, in a way that is transparent to end users and has minimal impact on the existing system, poses a significant challenge. This involves creating virtual machines, decompressing the operating system image, adapting the virtual environment to satisfy the software requirements of the job, constantly updating the job state once it is running without modifying the batch system or existing grid middleware, and finally bringing the host machine back to a consistent state. To facilitate this research, an existing pilot job framework already in production has been modified to deploy virtual machines on demand on the grid, using the virtualization administrative domain to handle all I/O to increase network throughput. This approach limits the change impact on the existing grid infrastructure while leveraging the execution and performance isolation capabilities of virtualization for job execution. This work led to an evaluation of various scheduling strategies used by the Xen hypervisor to measure the sensitivity of job performance to the amount of CPU and memory allocated under various configurations. However, virtualization overhead is also a critical factor in determining job execution times. Grid jobs have diverse requirements for machine resources such as CPU, memory, and network, and depend on other jobs to meet their deadlines, since the input of one job can be the output of a previous job. A novel resource provisioning model was devised to decrease the impact of virtualization overhead on job execution.
Finally, dynamic deadline-aware optimization algorithms were introduced, using exponential smoothing and rate limiting to predict job failure rates based on static and dynamic virtualization overhead. Statistical techniques were also integrated into the optimization algorithm to flag jobs that are at risk of missing their deadlines and to take preventive action to increase overall job throughput.
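
As a rough illustration of the deadline-aware idea described in the abstract, the sketch below smooths observed virtualization overhead with simple exponential smoothing and flags a job whose projected finish time exceeds its deadline. It is not the thesis's algorithm: the function names, the smoothing factor `alpha`, and the representation of overhead as a fractional slowdown are all assumptions made for this example, and rate limiting is left out.

```python
def smooth(previous, observation, alpha=0.3):
    """One step of simple exponential smoothing.

    alpha (assumed value) weights the newest observation against
    the running estimate.
    """
    return alpha * observation + (1.0 - alpha) * previous


def estimate_overhead(samples, initial=0.0, alpha=0.3):
    """Fold a sequence of overhead observations into one estimate.

    Each sample is a hypothetical fractional slowdown, e.g. 0.10
    means the virtualized run took 10% longer than a native run.
    """
    estimate = initial
    for sample in samples:
        estimate = smooth(estimate, sample, alpha)
    return estimate


def at_risk(remaining_work_s, overhead_estimate, time_to_deadline_s):
    """Return True if the projected run time exceeds the time left."""
    projected = remaining_work_s * (1.0 + overhead_estimate)
    return projected > time_to_deadline_s
```

A scheduler built along these lines might call `at_risk` periodically for each running job and act on flagged jobs early, for example by terminating or reprioritizing them, which is one possible reading of the preventive action the abstract mentions.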

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:548848
Date: January 2011
Creators: Khalid, Omer
Contributors: Petridis, Miltiadis ; Anthony, Richard
Publisher: University of Greenwich
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://gala.gre.ac.uk/8010/
