Global ETD Search

1	Performance Analysis and Evaluation of Dynamic Loop Scheduling Techniques in a Competitive Runtime Environment for Distributed Memory Architectures Balasubramaniam, Mahadevan 10 May 2003 (has links) Parallel computing offers immense potential to solve large, complex scientific problems. Load imbalance is a major impediment in obtaining high performance by a parallel system. One principal form of parallelism found in scientific applications is data parallelism. Loops without dependencies are data parallel. During the execution of large parallel loops, computational requirements vary due to problem, algorithmic and systemic characteristics. These factors lead to load imbalance which in turn degrades the performance of an application. Over the years, a number of dynamic loop scheduling techniques have been proposed to address one or more of these factors. However, there is no single strategy that works well for different problem domains and system characteristics. Moreover, load balancing during runtime is complicated because of its need for dynamic data redistribution. Therefore, there is a distinct need to integrate the dynamic loop scheduling techniques into a single package and provide them as an application programming interface (API) to the application developer. In recent years, along this direction, a number of dynamic loop scheduling techniques have been integrated into the compiler technologies for shared memory environments. On the other hand, there is no such integrated approach for distributed memory applications. The purpose of this thesis is to present the design, implementation and effectiveness of an integrated approach:the dynamic loop scheduling techniques are integrated into a runtime system for distributed memory architectures. For this purpose, we choose the newly developed parallel runtime environment for multicomputer architecture (PREMA) with its main components: the data movement and control substrate (DMCS) and mobile object layer (MOL). This runtime system has recently been developed and has demonstrated to be one of the most competitive runtime systems for distributed memory architectures. The significance of this work is that the proposed API will enhance the performance of parallel applications by reducing the load imbalance among processors caused by a wide range of factors and will reduce the software developmental cost required for load balancing. With the integration of the scheduling capabilities into the runtime system, its applicability has been expanded. The performance of the API has been evaluated qualitatively and quantitatively. The overhead of the API has been studied analytically and measured experimentally. Three parallel benchmarks including scientific applications of general interest (N-body simulations, automatic quadrature routine and unstructured grid heat solver) were considered for experimentation purpose. Based on the experiments conducted, a cost improvement of up to 76% over the straight forward parallel benchmark has been obtained. For certain application characteristics, the overhead of the runtime system was found to be within 10% of the underlying messaging layer. These results demonstrate that, in large scientific applications it is possible and desirable to combine the rich functionality of a runtime system with the advantages of scheduling techniques to achieve high performance. dynamic loop scheduling one sided communication
2	Performance Analysis and Evaluation of Divisible Load Theory and Dynamic Loop Scheduling Algorithms in Parallel and Distributed Environments Balasubramaniam, Mahadevan 14 August 2015 (has links) High performance parallel and distributed computing systems are used to solve large, complex, and data parallel scientific applications that require enormous computational power. Data parallel workloads which require performing similar operations on different data objects, are present in a large number of scientific applications, such as N-body simulations and Monte Carlo simulations, and are expressed in the form of loops. Data parallel workloads that lack precedence constraints are called arbitrarily divisible workloads, and are amenable to easy parallelization. Load imbalance that arise from various sources such as application, algorithmic, and systemic characteristics during the execution of scientific applications degrades performance. Scheduling of arbitrarily divisible workloads to address load imbalance in order to obtain better utilization of computing resources is a major area of research. Divisible load theory (DLT) and dynamic loop scheduling (DLS) algorithms are two algorithmic approaches employed in the scheduling of arbitrarily divisible workloads. Despite sharing the same goal of achieving load balancing, the two approaches are fundamentally different. Divisible load theory algorithms are linear, deterministic and platform dependent, whereas dynamic loop scheduling algorithms are probabilistic and platform agnostic. Divisible load theory algorithms have been traditionally used for performance prediction in environments characterized by known or expected variation in the system characteristics at runtime. Dynamic loop scheduling algorithms are designed to simultaneously address all the sources of load imbalance that stochastically arise at runtime from application, algorithmic, and systemic characteristics. In this dissertation, an analysis and performance evaluation of DLT and DLS algorithms are presented in the form of a scalability study and a robustness investigation. The effect of network topology on their performance is studied. A hybrid scheduling approach is also proposed that integrates DLT and DLS algorithms. The hybrid approach combines the strength of DLT and DLS algorithms and improves the performance of the scientific applications running in large scale parallel and distributed computing environments, and delivers performance superior to that which can be obtained by applying DLT algorithms in isolation. The range of conditions for which the hybrid approach is useful is also identified and discussed. 3D torus parallel and distributed computing robustness scalability dynamic loop scheduling divisible load theory arbitrarily divisible workloads data parallel workloads cluster

Search results

Performance Analysis and Evaluation of Dynamic Loop Scheduling Techniques in a Competitive Runtime Environment for Distributed Memory Architectures

Performance Analysis and Evaluation of Divisible Load Theory and Dynamic Loop Scheduling Algorithms in Parallel and Distributed Environments