1 |
Generic implementations of parallel prefix sums and its applications
Huang, Tao 15 May 2009
Parallel prefix sums algorithms are among the simplest and most useful building
blocks for constructing parallel algorithms. A generic implementation is valuable
because of the wide range of applications for this method.
This thesis presents a generic C++ implementation of parallel prefix sums. The
implementation applies two separate parallel prefix sums algorithms: a recursive
doubling (RD) algorithm and a binary-tree based (BT) algorithm.
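As a rough illustration of the two schedules named above, the following sketch simulates each one sequentially. It is not the thesis code: the function names, the template parameters, and the choice of an in-place inclusive tree scan are assumptions made for this example. In a parallel version the iterations of each inner loop are independent, and some form of synchronization separates the rounds.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Recursive doubling (hypothetical name): in the round with distance d,
// element i is combined with element i - d. After about log2(n) rounds every
// slot holds its inclusive prefix. Rounds are simulated one after another.
template <typename T, typename Op = std::plus<T>>
std::vector<T> scan_recursive_doubling(std::vector<T> a, Op op = Op{}) {
    std::vector<T> next = a;  // double buffer: a round reads a, writes next
    for (std::size_t d = 1; d < a.size(); d *= 2) {
        for (std::size_t i = 0; i < a.size(); ++i) {
            next[i] = (i >= d) ? op(a[i - d], a[i]) : a[i];
        }
        a.swap(next);
    }
    return a;
}

// Binary-tree schedule (hypothetical name): an up-sweep builds partial sums
// at the internal nodes of an implicit tree, then a down-sweep fills in the
// remaining prefixes. This in-place inclusive variant is an assumption; it
// also handles sizes that are not powers of two.
template <typename T, typename Op = std::plus<T>>
std::vector<T> scan_binary_tree(std::vector<T> a, Op op = Op{}) {
    const std::size_t n = a.size();
    std::size_t d = 1;
    for (; d < n; d *= 2) {  // up-sweep
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d) {
            a[i] = op(a[i - d], a[i]);
        }
    }
    for (d /= 2; d >= 1; d /= 2) {  // down-sweep
        for (std::size_t i = 3 * d - 1; i < n; i += 2 * d) {
            a[i] = op(a[i - d], a[i]);
        }
    }
    return a;
}

// Small self-check: both schedules must produce the same inclusive scan.
inline void check_scans_agree(const std::vector<int>& input) {
    assert(scan_recursive_doubling(input) == scan_binary_tree(input));
}
```

For the input 3, 1, 4, 1, 5 both functions return 3, 4, 8, 9, 14. Recursive doubling performs more total operations but uses fewer rounds than the two-phase tree schedule, which is the usual trade-off between the two approaches.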
This implementation shows how common communication patterns can be separated
from the concrete parallel prefix sums algorithms, which simplifies the work
of parallel programming. For each algorithm, the implementation uses two different
synchronization options: barrier synchronization and point-to-point synchronization.
These synchronization options lead to different communication patterns in the algorithms,
which are represented by dependency graphs between tasks.
The performance results show that point-to-point synchronization performs better
than barrier synchronization as the number of processors increases.
As applications of parallel prefix sums, parallel radix sort and four
parallel tree applications are built on top of the implementation. These applications
are also fundamental parallel algorithms, and they represent typical usage of parallel
prefix sums in numeric computation and graph applications. Building such
applications becomes straightforward given this generic implementation of parallel
prefix sums.
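To show why radix sort sits naturally on top of prefix sums, here is a minimal sequential sketch of a least-significant-digit radix sort. The function name, the 8-bit digit width, and the 32-bit key type are assumptions for illustration and are not taken from the thesis. The key step is the exclusive prefix sum over the per-digit counts, which turns a histogram into starting offsets for the scatter.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort on 32-bit keys, 8 bits per pass (hypothetical sketch).
std::vector<std::uint32_t> radix_sort(std::vector<std::uint32_t> keys) {
    constexpr std::size_t kBuckets = 256;
    std::vector<std::uint32_t> out(keys.size());

    for (unsigned shift = 0; shift < 32; shift += 8) {
        // 1. Histogram: count how many keys fall into each digit bucket.
        std::array<std::size_t, kBuckets> offset{};
        for (std::uint32_t k : keys) {
            ++offset[(k >> shift) & 0xFFu];
        }

        // 2. Exclusive prefix sum: offset[b] becomes the index of the first
        //    output slot for bucket b. A parallel sort would use a parallel
        //    scan for this step.
        std::size_t running = 0;
        for (std::size_t b = 0; b < kBuckets; ++b) {
            const std::size_t count = offset[b];
            offset[b] = running;
            running += count;
        }
        assert(running == keys.size());

        // 3. Stable scatter: keys with equal digits keep their relative order.
        for (std::uint32_t k : keys) {
            out[offset[(k >> shift) & 0xFFu]++] = k;
        }
        keys.swap(out);
    }
    return keys;  // four passes, so the sorted data ends up back in keys
}
```

In a parallel setting each processor would typically build a local histogram, and a prefix sum across processors and buckets would give every processor its private output offsets; that arrangement is a common design rather than something stated in the abstract.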
|
2 |
Idiom-driven innermost loop vectorization in the presence of cross-iteration data dependencies in the HotSpot C2 compiler / Idiomdriven vektorisering av inre loopar med databeroenden i HotSpots C2 kompilator
Sjöblom, William January 2020
This thesis presents a technique for automatically vectorizing innermost single-statement loops with a cross-iteration data dependence by analyzing data flow to recognize frequently recurring program idioms. Recognition matches the circular SSA data flow around the loop body’s φ-function against several primitive patterns, forming a tree representation of the relevant data flow. This tree is then pruned down to a single parameterized node, which provides a high-level specification of the idiom at hand and guides the algorithmic replacement applied to the intermediate representation. The versatility of the technique is shown by an implementation that supports vectorization of both a limited class of linear recurrences and prefix sums, where the latter shows how the technique generalizes to intermediate representations with memory state in SSA form. Finally, a thorough performance evaluation is presented, showing the effectiveness of the vectorization technique.
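The kind of loop in question can be illustrated with a small sketch. The first function below is the scalar prefix-sum idiom, where each iteration depends on the result of the previous one. The second is one possible vector-friendly rewrite that processes fixed-size blocks with shift-and-add steps and a carried total. It is written in plain C++ rather than in the C2 intermediate representation, and the function names, the four-lane block width, and the rewrite itself are assumptions for illustration, not the transformation described in the thesis.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar idiom: acc carries a value from one iteration to the next, which is
// the cross-iteration dependence that blocks straightforward vectorization.
std::vector<std::int64_t> prefix_scalar(const std::vector<std::int64_t>& in) {
    std::vector<std::int64_t> out(in.size());
    std::int64_t acc = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        acc += in[i];
        out[i] = acc;
    }
    return out;
}

// Blocked rewrite (hypothetical): each block of kLanes elements is scanned
// with log2(kLanes) shift-and-add steps, each of which maps onto a vector
// shift plus a vector add. Only one scalar carry crosses block boundaries.
std::vector<std::int64_t> prefix_blocked(const std::vector<std::int64_t>& in) {
    constexpr std::size_t kLanes = 4;
    std::vector<std::int64_t> out(in.size());
    std::int64_t carry = 0;
    std::size_t i = 0;

    for (; i + kLanes <= in.size(); i += kLanes) {
        std::int64_t v[kLanes];
        for (std::size_t lane = 0; lane < kLanes; ++lane) v[lane] = in[i + lane];

        for (std::size_t shift = 1; shift < kLanes; shift *= 2) {
            std::int64_t shifted[kLanes];
            for (std::size_t lane = 0; lane < kLanes; ++lane) {
                shifted[lane] = (lane >= shift) ? v[lane - shift] : 0;
            }
            for (std::size_t lane = 0; lane < kLanes; ++lane) v[lane] += shifted[lane];
        }

        for (std::size_t lane = 0; lane < kLanes; ++lane) out[i + lane] = v[lane] + carry;
        carry = out[i + kLanes - 1];
    }

    // Scalar tail for the elements that do not fill a whole block.
    for (; i < in.size(); ++i) {
        carry += in[i];
        out[i] = carry;
    }
    return out;
}

// Self-check: the blocked version must match the scalar idiom.
inline void check_blocked_matches_scalar(const std::vector<std::int64_t>& in) {
    assert(prefix_blocked(in) == prefix_scalar(in));
}
```

For the inputs 1 through 10 both functions produce 1, 3, 6, 10, 15, 21, 28, 36, 45, 55. The blocked form does more additions per element than the scalar loop, so whether it pays off depends on vector width and hardware, which is the sort of question a performance evaluation would need to settle.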
|