Parallel loops are one of the main sources of parallelism in scientific applications,
and many parallel loops do not have a uniform iteration execution time. To
achieve good performance for such applications on a parallel computer, iterations
of a parallel loop have to be assigned to processors in such a way that each processor
has roughly the same amount of work in terms of execution time. A parallel
computer with a large number of processors tends to have distributed memory. To
run a parallel loop on a distributed-memory machine, data distribution also needs
to be considered. This research investigates the scheduling of non-uniform parallel
loops on both shared-memory and distributed-memory parallel computers.
We present Safe Self-Scheduling (SSS), a new scheduling scheme that combines
the advantages of both static and dynamic scheduling schemes. SSS has two
phases: a static scheduling phase and a dynamic self-scheduling phase that together
reduce the scheduling overhead while achieving a well-balanced workload. The techniques
introduced in SSS can be used by other self-scheduling schemes. The static
scheduling phase further improves performance by maintaining a high cache-hit
ratio resulting from increased affinity of iterations to processors. SSS is also very
well suited for distributed-memory machines.
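As a concrete illustration of the two-phase idea, the sketch below shows a static block followed by dynamic self-scheduling on a shared-memory machine using POSIX threads. It is a minimal sketch under stated assumptions: the fraction ALPHA assigned statically, the fixed dynamic chunk size CHUNK, and the stand-in loop body are illustrative choices, not the allocation formulas derived for SSS in this work.

```c
/*
 * Minimal two-phase loop-scheduling sketch in the spirit of SSS.
 * ALPHA, CHUNK, and body() are illustrative assumptions only; they are
 * not the exact allocation rules from the thesis.  The constants are
 * chosen so the static blocks tile [0, ALPHA*N) exactly.
 */
#include <pthread.h>
#include <stdio.h>

#define N     1000          /* total loop iterations              */
#define P     4             /* number of worker threads           */
#define ALPHA 0.5           /* fraction of iterations scheduled statically */
#define CHUNK 8             /* fixed chunk size in the dynamic phase       */

static long next_iter;      /* first unassigned dynamic iteration */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static double result[N];

static void body(long i) { result[i] = (double)i * i; }  /* stand-in loop body */

static void *worker(void *arg)
{
    long id     = (long)arg;
    long sstart = id * (long)(ALPHA * N / P);        /* static block start */
    long send   = (id + 1) * (long)(ALPHA * N / P);  /* static block end   */

    /* Phase 1: static block, kept on the same thread for cache affinity. */
    for (long i = sstart; i < send; i++)
        body(i);

    /* Phase 2: self-schedule the remaining iterations in small chunks. */
    for (;;) {
        pthread_mutex_lock(&lock);
        long start = next_iter;
        next_iter += CHUNK;
        pthread_mutex_unlock(&lock);
        if (start >= N)
            break;
        long end = (start + CHUNK < N) ? start + CHUNK : N;
        for (long i = start; i < end; i++)
            body(i);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[P];
    next_iter = (long)(ALPHA * N / P) * P;  /* iterations not covered statically */

    for (long id = 0; id < P; id++)
        pthread_create(&tid[id], NULL, worker, (void *)id);
    for (long id = 0; id < P; id++)
        pthread_join(tid[id], NULL);

    printf("result[N-1] = %f\n", result[N - 1]);
    return 0;
}
```

Keeping each static block on the same processor from run to run is what preserves the affinity of iterations to processors; the shared counter then absorbs whatever imbalance the static phase leaves behind.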
We introduce methods to duplicate data on a number of processors. These
methods eliminate data movement during computation and improve scalability
with respect to problem size. We discuss a systematic approach to implementing
a given self-scheduling scheme on a distributed-memory machine. We also present
a multilevel scheduling scheme that self-schedules parallel loops on a
distributed-memory machine with a large number of processors, eliminating the
bottleneck caused by a central scheduler.
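To make the multilevel idea concrete, the following is a small sequential simulation under illustrative assumptions only: a central scheduler hands out large blocks, each cluster-level scheduler subdivides its block into small chunks for its local processors, and the block and chunk sizes are arbitrary constants chosen for the example rather than values from the thesis.

```c
/*
 * Sketch of two-level (multilevel) self-scheduling.  Written as a plain
 * sequential simulation so it runs without a message-passing library;
 * in a real system the cluster schedulers would request blocks
 * asynchronously over the network.
 */
#include <stdio.h>

#define N        1000   /* total loop iterations                  */
#define CLUSTERS 4      /* cluster-level schedulers               */
#define PROCS    8      /* processors per cluster                 */
#define BLOCK    100    /* iterations per central-scheduler block */
#define CHUNK    5      /* iterations per local chunk             */

static long central_next = 0;  /* next iteration owned by the central scheduler */

/* Central scheduler: hand out the next large block, or -1 when done. */
static long get_block(long *len)
{
    if (central_next >= N) return -1;
    long start = central_next;
    *len = (start + BLOCK <= N) ? BLOCK : N - start;
    central_next += *len;
    return start;
}

int main(void)
{
    long start, len;
    int  cluster = 0;

    /* Clusters take turns requesting blocks (round-robin here;
     * in reality requests arrive as clusters run out of work). */
    while ((start = get_block(&len)) >= 0) {
        int proc = 0;
        for (long c = start; c < start + len; c += CHUNK) {
            long end = (c + CHUNK <= start + len) ? c + CHUNK : start + len;
            printf("cluster %d, proc %d: iterations [%ld, %ld)\n",
                   cluster, proc, c, end);
            proc = (proc + 1) % PROCS;   /* next local processor */
        }
        cluster = (cluster + 1) % CLUSTERS;
    }
    return 0;
}
```

Because local processors obtain their chunks from a nearby cluster scheduler rather than from the central one, the central scheduler handles only a small number of coarse requests, which is what removes it as a bottleneck.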
We propose a method that uses abstractions to automate both self-scheduling
and data distribution in parallel programming environments. The abstractions
are tested in CHARM, a real parallel programming environment.
Methods are also developed to tolerate processor faults caused by both physical
failure and reassignment of processors by the operating system during the execution
of a parallel loop.
We tested the techniques discussed using simulations and real applications.
Good results have been obtained on both shared-memory and distributed-memory
parallel computers.
Graduation date: 1994
Identifier: oai:union.ndltd.org:ORGSU/oai:ir.library.oregonstate.edu:1957/35612
Date: 22 September 1993
Creators: Liu, Jie
Contributors: Saletore, Vikram A.
Source Sets: Oregon State University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation