1 |
Transparent and Efficient I/O for Statistical ComputingZhang, Yi January 2012 (has links)
<p>Statistical analysis of massive array data is becoming indispensable in answering important scientific and business questions. Most analysis tasks consist of multiple steps, each making one or multiple passes over the arrays to be analyzed and generating intermediate results. In the big data setting, storage and I/O efficiency is a key to efficient analytics. Because of the distinct characteristics of disk-resident arrays and the operations performed on them, we need a computing environment that is easy to use, scalable to big data, and different from traditional, CPU- and memory-centric solutions.</p><p>R is a popular computing environment for statistical/numerical data analysis. Like many such environments, R performs poorly for large datasets. This dissertation presents RIOT (R with I/O Transparency), a framework to make R programs I/O-efficient in a way transparent to users. RIOT-DB, an implementation of RIOT using a relational database system as its backend, significantly outperforms R in many big-data scenarios. RIOT users are insulated from the data management backend and I/O optimization specifics. Because of this transparency, RIOT is easy to adopt by the majority of the R users.</p><p>While RIOT-DB demonstrates the feasibility of transparent I/O efficiency and the potential of database-style inter-operator optimizations, it also reveals significant deficiencies of database systems in handling statistical computation. To improve the efficiency of array storage, RIOT uses a novel storage structure called Linearized-Array B-tree, or LAB-tree. LAB-tree supports flexible array layouts and automatically adapts to varying sparsity across parts of an array and over time. It also implements splitting strategies and update batching policies with good theoretical guarantees and/or practical performance.</p><p>While LAB-tree removes many I/O inefficiencies that arise in accessing individual arrays, programs consisting of multiple operators need further optimization. To this end, RIOT incorporates an I/O optimization framework, RIOTShare, which is able to jointly optimize I/O sharing and array layouts for a broad range of analysis tasks expressible in nested-loop forms. RIOTShare explores the middle ground between the high-level, database-style operator-based query optimization and low-level, compiler-style loop-based code optimization.</p><p>In sum, combining a transparent language binding mechanism, an efficient and flexible storage engine, and an accurate I/O sharing and array layout optimizer, RIOT provides a systematic solution for data-intensive array-based statistical computing.</p> / Dissertation
|
2 |
Automatic Optimization of Geometric Multigrid Methods using a DSL ApproachVasista, Vinay V January 2017 (has links) (PDF)
Geometric Multigrid (GMG) methods are widely used in numerical analysis to accelerate the convergence of partial differential equations solvers using a hierarchy of grid discretizations. These solvers find plenty of applications in various fields in engineering and scientific domains, where solving PDEs is of fundamental importance. Using multigrid methods, the pace at which the solvers arrive at the solution can be improved at an algorithmic level. With the advance in modern computer architecture, solving problems with higher complexity and sizes is feasible - this is also the case with multigrid methods. However, since hardware support alone cannot achieve high performance in execution time, there is a need for good software that help programmers in doing so.
Multiple grid sizes and recursive expression of multigrid cycles make the task of manual program optimization tedious and error-prone. A high-level language that aids domain experts to quickly express complex algorithms in a compact way using dedicated constructs for multigrid methods and with good optimization support is thus valuable. Typical computation patterns in a GMG algorithm includes stencils, point-wise accesses, restriction and interpolation of a grid. These computations can be optimized for performance on modern architectures using standard parallelization and locality enhancement techniques.
Several past works have addressed the problem of automatic optimizations of computations in various scientific domains using a domain-specific language (DSL) approach. A DSL is a language with features to express domain-specific computations and compiler support to enable optimizations specific to these computations. Halide and PolyMage are two of the recent works in this direction, that aim to optimize image processing pipelines. Many computations like upsampling and downsampling an image are similar to interpolation and restriction in geometric multigrid methods.
In this thesis, we demonstrate how high performance can be achieved on GMG algorithms written in the PolyMage domain-specific language with new optimizations we added to the compiler. We also discuss the implementation of non-trivial optimizations, on PolyMage compiler, necessary to achieve high parallel performance for multigrid methods on modern architectures. We realize these goals by:
• introducing multigrid domain-specific constructs to minimize the verbosity of the algorithm specification;
• storage remapping to reduce the memory footprint of the program and improve cache locality exploitation;
• mitigating execution time spent in data handling operations like memory allocation and freeing, using a pool of memory, across multiple multigrid cycles; and
• incorporating other well-known techniques to leverage performance, like exploiting multi-dimensional parallelism and minimizing the lifetime of storage buffers.
We evaluate our optimizations on a modern multicore system using five different benchmarks varying in multigrid cycle structure, complexity and size, for two-and three-dimensional data grids. Experimental results show that our optimizations:
• improve performance of existing PolyMage optimizer by 1.31x;
• are better than straight-forward parallel and vector implementations by 3.2x;
• are better than hand-optimized versions in conjunction with optimizations by Pluto, a state-of-the-art polyhedral source-to-source optimizer, by 1.23x; and
• achieve up to 1.5$\times$ speedup over NAS MG benchmark from the NAS Parallel Benchmarks.
(The speedup numbers are Geometric means over all benchmarks)
|
Page generated in 0.2384 seconds