Global ETD Search

1	Linear Programming based Resource Management for Heterogeneous Computing Systems Al-Azzoni, Issam 05 1900 An emerging trend in computing is to use distributed heterogeneous computing (HC) systems to execute a set of tasks. Cluster computer systems, grids, and Desktop Grids are three popular kinds of HC systems. An important component of an HC system is its resource management system (RMS). The main responsibility of an RMS is assigning resources to tasks in order to satisfy certain performance requirements.
For cluster computer systems, we propose a new mapping heuristic which requires less state information than current heuristics. For Desktop Grids, we propose a new scheduling policy that exploits knowledge of the effective computing power delivered by the machines and the distribution of their fault times in order to improve performance. Finally, for grids, we propose a new decentralized load balancing policy which dramatically cuts down the communication overhead incurred in state information update.
The proposed resource management policies utilize the solution to a linear programming problem (LP) which maximizes the system capacity. Our simulation experiments show that these policies perform very competitively, especially in highly heterogeneous systems. Thesis / Doctor of Philosophy (PhD) Linear Programming Resource Management Heterogeneous Computing Systems heterogeneous computing
2	Analyzing and Evaluating the Resilience of Scheduling Scientific Applications on High Performance Computing Systems using a Simulation-based Methodology Sukhija, Nitin 09 May 2015 Large scale systems provide a powerful computing platform for solving large and complex scientific applications. However, the inherent complexity, heterogeneity, wide distribution, and dynamism of the computing environments can degrade the performance of the scientific applications executing on these systems. Load imbalance arising from a variety of sources, such as application, algorithmic, and systemic variations, is one of the major contributors to this performance degradation. In general, load balancing is achieved via scheduling. Moreover, frequently occurring resource failures drastically affect the execution of applications running on high performance computing systems. Therefore, integrated scheduling and fault-tolerance mechanisms that keep deployed applications resilient to failures are of paramount importance. Recently, several research initiatives have started to address the issue of resilience. However, these efforts have focused mainly on system level resilience, with less emphasis on resilience at the application level. It is therefore increasingly important to extend the concept of resilience to scheduling techniques at the application level, establishing a holistic approach that addresses the performability of these applications on high performance computing systems. This can be achieved by developing a comprehensive modeling framework for evaluating the resiliency of such techniques on heterogeneous computing systems, assessing the impact of failures and workloads in an integrated way.
This dissertation presents an experimental methodology based on discrete event simulation for analyzing and evaluating the resilience of scheduling scientific applications on high performance computing systems. With the aid of the methodology, a wide class of dependencies between the application and the computing system is captured within a deterministic model that quantifies the performance impact expected from changes in application and system characteristics. The results obtained with the proposed simulation-based performance prediction framework enabled an introspective design and investigation of scheduling heuristics, supporting reasoning about how best to optimize various, often antagonistic, objectives such as minimizing application makespan and maximizing reliability. framework discrete event simulation performance modeling fault-tolerance reliability makespan heterogeneous computing systems scientific applications resilience
