Spelling suggestions: "subject:"time lowrank"" "subject:"time blowrank""
1 |
Exploiting Data Sparsity In Covariance Matrix Computations on Heterogeneous SystemsCharara, Ali 24 May 2018 (has links)
Covariance matrices are ubiquitous in computational sciences, typically describing the correlation of elements of large multivariate spatial data sets. For example, covari- ance matrices are employed in climate/weather modeling for the maximum likelihood estimation to improve prediction, as well as in computational ground-based astronomy to enhance the observed image quality by filtering out noise produced by the adap- tive optics instruments and atmospheric turbulence. The structure of these covariance matrices is dense, symmetric, positive-definite, and often data-sparse, therefore, hier- archically of low-rank. This thesis investigates the performance limit of dense matrix computations (e.g., Cholesky factorization) on covariance matrix problems as the number of unknowns grows, and in the context of the aforementioned applications. We employ recursive formulations of some of the basic linear algebra subroutines (BLAS) to accelerate the covariance matrix computation further, while reducing data traffic across the memory subsystems layers. However, dealing with large data sets (i.e., covariance matrices of billions in size) can rapidly become prohibitive in memory footprint and algorithmic complexity. Most importantly, this thesis investigates the tile low-rank data format (TLR), a new compressed data structure and layout, which is valuable in exploiting data sparsity by approximating the operator. The TLR com- pressed data structure allows approximating the original problem up to user-defined numerical accuracy. This comes at the expense of dealing with tasks with much lower arithmetic intensities than traditional dense computations. In fact, this thesis con-
solidates the two trends of dense and data-sparse linear algebra for HPC. Not only does the thesis leverage recursive formulations for dense Cholesky-based matrix al- gorithms, but it also implements a novel TLR-Cholesky factorization using batched linear algebra operations to increase hardware occupancy and reduce the overhead of the API. Performance reported of the dense and TLR-Cholesky shows many-fold speedups against state-of-the-art implementations on various systems equipped with GPUs. Additionally, the TLR implementation gives the user flexibility to select the desired accuracy. This trade-off between performance and accuracy is, currently, a well-established leading trend in the convergence of the third and fourth paradigm, i.e., HPC and Big Data, when moving forward with exascale software roadmap.
|
2 |
Fast High-order Integral Equation Solvers for Acoustic and Electromagnetic Scattering ProblemsAlharthi, Noha 18 November 2019 (has links)
Acoustic and electromagnetic scattering from arbitrarily shaped structures can be numerically characterized by solving various surface integral equations (SIEs). One of the
most effective techniques to solve SIEs is the Nyström method. Compared to other existing methods,the Nyström method is easier to implement especially when the geometrical discretization is non-conforming and higher-order representations of the geometry and unknowns are desired. However,singularities of the Green’s function are more difficult to”manage”since they are not ”smoothened” through the use of a testing function.
This dissertation describes purely numerical schemes to account for different orders of
singularities that appear in acoustic and electromagnetic SIEs when they are solved by a high-order Nyström method utilizing a mesh of curved discretization elements. These schemes make use of two sets of basis functions to smoothen singular integrals: the grid robust high-order Lagrange and the high-order Silvester-Lagrange interpolation basis functions. Numerical results comparing the convergence of two schemes are presented.
Moreover, an extremely scalable implementation of fast multipole method (FMM) is developed to efficiently (and iteratively) solve the linear system resulting from the discretization of the acoustic SIEs by the Nyström method. The implementation results in O(N log N) complexity for high-frequency scattering problems. This FMM-accelerated solver can handle N =2 billion on a 200,000-core Cray XC40 with 85% strong scaling efficiency.
Iterative solvers are often ineffective for ill-conditioned problems. Thus, a fast direct (LU)solver,which makes use of low-rank matrix approximations,is also developed. This solver relies on tile low rank (TLR) data compression format, as implemented in the hierarchical computations on many corearchitectures (HiCMA) library. This requires to taskify the underlying SIE kernels to expose fine-grained computations. The resulting asynchronous execution permit to weaken the artifactual synchronization points,while mitigating the overhead of data motion. We compare the obtained performance results of our TLRLU factorization against the state-of-the-art dense factorizations on shared
memory systems. We achieve up to a fourfold performance speedup on a 3D acoustic problem with up to 150 K unknowns in double complex precision arithmetics.
|
Page generated in 0.0437 seconds