Return to search

Allocation of SISAL program graphs to processors using BLAS

There are a number of well known techniques for extracting parallelism from a
given program. They range from hardware implementations, building restructuring
compilers or reorganizing of programs so as to specify all the available parallelism. The
success rate of any of the known techniques is rather poor over all types of programs.
This has pushed the research community to explore new languages and design different
architectures to exploit program parallelism.
The principles of dataflow architectures have addressed the problem of exploiting
parallelism in systems by executing dataflow graphs. These graphs or programs represent
data dependencies among instructions and execution of the graph proceeds in a data-driven
manner. That is, an instruction is executed as soon as all its operands are
available, without waiting for any program counter to sequence its execution, as is the
case in conventional von Neumann architectures.
In this thesis, data flow graphs are generated during the intermediate compilation of
a functional language called SISAL (Streams and Iterations in a Single Assignment
Language). The Intermediate Form (IFl) is a graphical language consisting of multiple
acyclic function graphs that represent a given program. Each graph consists of a
sequence of nodes and edges. The nodes specify the operation and the edges indicate the
dependencies between the nodes. The graphs are further connected to each other by
means of implicit dependencies.
The Automator package developed in this project, preprocesses these multiple IF1
graphs and translates them into a single connected graph. It converts all implicit
dependencies into actual ones. Additionally, complex language constructs like For All,
loops and if-then-else are treated in special ways together with their nested levels by the
Automator. There is virtually no limit to the number of nested levels that can be
translated by this package.
The Automator's prime contribution is in translating real programs written in SISAL
into a specified format required by an allocation algorithm called the Balanced Layered
Allocation Scheme (BLAS).
BLAS partitions a connected graph into independent tasks and assigns them to
processors in a multicomputer system. The problem of program allocation lies in
maximizing parallelism while minimizing interprocessor communication costs. Hence,
allocation is based on the best choice of communication to execution ratio for each task.
BLAS utilizes heuristic rules to find a balance between computation and communication
costs in the target system. Here the target architecture is a simulated nCUBE 3E
computer, having a hypercube topology.
Simulations show that, BLAS is effective in reducing the overall execution time of
a program by considering the communication costs on the execution times. The results
will help in understanding the effects in packing nodes (grain-packing), routing issues in
the network and in general, the allocation problem to any processor in a network. In
addition, tasks have also been assigned to adjacent processors only, instead of any
processor on the hypercube network. The adjacent allocation to processors helps to
determine trade-offs required between achieved speed-ups and the time it takes to
completely allocate large graphs on compilation. / Graduation date: 1994

Identiferoai:union.ndltd.org:ORGSU/oai:ir.library.oregonstate.edu:1957/35764
Date07 April 1994
CreatorsRaisinghani, Manoj H.
ContributorsLee, Ben
Source SetsOregon State University
Languageen_US
Detected LanguageEnglish
TypeThesis/Dissertation

Page generated in 0.0017 seconds