Spelling suggestions: "subject:"polymar"" "subject:"polymaleic""
1 |
Automatic Optimization of Geometric Multigrid Methods using a DSL ApproachVasista, Vinay V January 2017 (has links) (PDF)
Geometric Multigrid (GMG) methods are widely used in numerical analysis to accelerate the convergence of partial differential equations solvers using a hierarchy of grid discretizations. These solvers find plenty of applications in various fields in engineering and scientific domains, where solving PDEs is of fundamental importance. Using multigrid methods, the pace at which the solvers arrive at the solution can be improved at an algorithmic level. With the advance in modern computer architecture, solving problems with higher complexity and sizes is feasible - this is also the case with multigrid methods. However, since hardware support alone cannot achieve high performance in execution time, there is a need for good software that help programmers in doing so.
Multiple grid sizes and recursive expression of multigrid cycles make the task of manual program optimization tedious and error-prone. A high-level language that aids domain experts to quickly express complex algorithms in a compact way using dedicated constructs for multigrid methods and with good optimization support is thus valuable. Typical computation patterns in a GMG algorithm includes stencils, point-wise accesses, restriction and interpolation of a grid. These computations can be optimized for performance on modern architectures using standard parallelization and locality enhancement techniques.
Several past works have addressed the problem of automatic optimizations of computations in various scientific domains using a domain-specific language (DSL) approach. A DSL is a language with features to express domain-specific computations and compiler support to enable optimizations specific to these computations. Halide and PolyMage are two of the recent works in this direction, that aim to optimize image processing pipelines. Many computations like upsampling and downsampling an image are similar to interpolation and restriction in geometric multigrid methods.
In this thesis, we demonstrate how high performance can be achieved on GMG algorithms written in the PolyMage domain-specific language with new optimizations we added to the compiler. We also discuss the implementation of non-trivial optimizations, on PolyMage compiler, necessary to achieve high parallel performance for multigrid methods on modern architectures. We realize these goals by:
• introducing multigrid domain-specific constructs to minimize the verbosity of the algorithm specification;
• storage remapping to reduce the memory footprint of the program and improve cache locality exploitation;
• mitigating execution time spent in data handling operations like memory allocation and freeing, using a pool of memory, across multiple multigrid cycles; and
• incorporating other well-known techniques to leverage performance, like exploiting multi-dimensional parallelism and minimizing the lifetime of storage buffers.
We evaluate our optimizations on a modern multicore system using five different benchmarks varying in multigrid cycle structure, complexity and size, for two-and three-dimensional data grids. Experimental results show that our optimizations:
• improve performance of existing PolyMage optimizer by 1.31x;
• are better than straight-forward parallel and vector implementations by 3.2x;
• are better than hand-optimized versions in conjunction with optimizations by Pluto, a state-of-the-art polyhedral source-to-source optimizer, by 1.23x; and
• achieve up to 1.5$\times$ speedup over NAS MG benchmark from the NAS Parallel Benchmarks.
(The speedup numbers are Geometric means over all benchmarks)
|
2 |
Polymage : Automatic Optimization for Image Processing PipelinesMullapudi, Ravi Teja January 2015 (has links) (PDF)
Image processing pipelines are ubiquitous. Every image captured by a camera and every image uploaded on social networks like Google+or Facebook is processed by a pipeline. Applications in a wide range of domains like computational photography, computer vision and medical imaging use image processing pipelines. Many of these applications demand high-performance which requires effective utilization of modern architectures. Given the proliferation of camera enabled devices and social networks optimizing these emerging workloads has become important both at the data center and the embedded device scales.
An image processing pipeline can be viewed as a graph of interconnected stages which process images successively. Each stage typically performs one of point-wise, stencil, sam-pling, reduction or data-dependent operations on image pixels. Individual stages in a pipeline typically exhibit abundant data parallelism that can be exploited with relative ease. However, the stages also require high memory bandwidth preventing effective uti-lization of parallelism available on modern architectures. The traditional options are using optimized libraries like OpenCV or to optimize manually. While using libraries precludes optimization across library routines, manual optimization accounting for both parallelism and locality is very tedious.
Inthisthesis,wepresentthedesignandimplementationofPolyMage,adomain-specific language and compiler for image processing pipelines. The focus of the system is on au-tomatically generating high-performance implementations of image processing pipelines expressed in a high-level declarative language. We achieve such automation with:
• tiling techniques to improve parallelism and locality by introducing redundant computation,
v
a model-driven fusion heuristic which enables a trade-off between locality and re-dundant computations, and anautotuner whichleveragesthefusionheuristictoexploreasmallsubsetofpipeline implementations and find the best performing one.
Our optimization approach primarily relies on the transformation and code generation ca-pabilities of the polyhedral compiler framework. To the best of our knowledge, this is the first model-driven compiler for image processing pipelines that performs complex fusion, tiling, and storage optimization fully automatically. We evaluate our framework on a modern multicore system using a set of seven benchmarks which vary widely in structure and complexity. Experimental results show that the performance of pipeline implementations generated by our approach is:
• up to 1.81× better than pipeline implementations manually tuned using Halide, a state-of-the-art language and compiler for image processing pipelines,
• on average 5.39× better than pipeline implementations automatically tuned using Halide and OpenTuner, and
• on average 3.3× better than naive pipeline implementations which only exploit par-allelism without optimizing for locality.
We also demonstrate that the performance of PolyMage generated code is better or compa-rable to implementations using OpenCV, a state-of-the-art image processing and computer vision library.
|
Page generated in 0.0477 seconds