1

Exploiting Data Sparsity In Covariance Matrix Computations on Heterogeneous Systems

Charara, Ali 24 May 2018 (has links)
Covariance matrices are ubiquitous in computational sciences, typically describing the correlation of elements of large multivariate spatial data sets. For example, covariance matrices are employed in climate/weather modeling for maximum likelihood estimation to improve prediction, as well as in computational ground-based astronomy to enhance observed image quality by filtering out noise produced by the adaptive optics instruments and atmospheric turbulence. These covariance matrices are dense, symmetric, positive-definite, and often data-sparse, and therefore hierarchically low-rank. This thesis investigates the performance limit of dense matrix computations (e.g., Cholesky factorization) on covariance matrix problems as the number of unknowns grows, in the context of the aforementioned applications. We employ recursive formulations of some of the Basic Linear Algebra Subprograms (BLAS) to further accelerate the covariance matrix computation while reducing data traffic across the memory subsystem layers. However, dealing with large data sets (i.e., covariance matrices of size in the billions) can rapidly become prohibitive in memory footprint and algorithmic complexity. Most importantly, this thesis investigates the tile low-rank (TLR) data format, a new compressed data structure and layout, which exploits data sparsity by approximating the operator. The TLR compressed data structure allows the original problem to be approximated up to a user-defined numerical accuracy. This comes at the expense of dealing with tasks of much lower arithmetic intensity than traditional dense computations. In fact, this thesis consolidates the two trends of dense and data-sparse linear algebra for HPC. Not only does the thesis leverage recursive formulations for dense Cholesky-based matrix algorithms, but it also implements a novel TLR-Cholesky factorization using batched linear algebra operations to increase hardware occupancy and reduce API overhead. Reported performance of the dense and TLR-Cholesky factorizations shows many-fold speedups against state-of-the-art implementations on various GPU-equipped systems. Additionally, the TLR implementation gives the user the flexibility to select the desired accuracy. This trade-off between performance and accuracy is currently a well-established trend in the convergence of the third and fourth paradigms, i.e., HPC and Big Data, as the community moves forward with the exascale software roadmap.
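A minimal NumPy sketch of the tile low-rank (TLR) idea the abstract describes, not the thesis's actual implementation: the matrix is cut into tiles, diagonal tiles stay dense, and each off-diagonal tile is replaced by truncated-SVD factors kept only up to a user-defined accuracy. The function names, the tile size `nb`, and the tolerance `tol` are illustrative assumptions.

```python
import numpy as np

def compress_tile(tile, tol):
    """Replace a dense tile by factors (U, V) with tile ~= U @ V.T to accuracy tol."""
    U, s, Vt = np.linalg.svd(tile, full_matrices=False)
    rank = max(1, int(np.sum(s > tol * s[0])))  # rank needed for the requested accuracy
    return U[:, :rank] * s[:rank], Vt[:rank, :].T

def tlr_compress(A, nb, tol):
    """Tile a dense symmetric matrix: keep diagonal tiles dense, store
    off-diagonal tiles of the lower triangle as low-rank (U, V) pairs."""
    tiles = {}
    for i in range(0, A.shape[0], nb):
        for j in range(0, i + 1, nb):  # lower triangle suffices by symmetry
            tile = A[i:i + nb, j:j + nb]
            tiles[(i, j)] = tile.copy() if i == j else compress_tile(tile, tol)
    return tiles

# Example: a smooth exponential covariance kernel on 1-D points is data-sparse,
# so its off-diagonal tiles compress to low rank.
x = np.linspace(0.0, 1.0, 512)
C = np.exp(-np.abs(x[:, None] - x[None, :]))
tiles = tlr_compress(C, nb=64, tol=1e-8)
```

The sketch covers only the compression step; the factorization in the thesis then operates tile by tile on such compressed representations, using batched kernels for the many small low-rank updates.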
2

Fast, Parallel Techniques for Time-Domain Boundary Integral Equations

Kachanovska, Maryna 27 January 2014 (has links) (PDF)
This work addresses the efficient numerical solution of time-domain boundary integral equations with retarded potentials arising in problems of acoustic and electromagnetic scattering. The convolutional form of the time-domain boundary operators allows them to be discretized with the help of Runge-Kutta convolution quadrature. This method combines Laplace-transform and time-stepping approaches and requires the fundamental solution to be known explicitly only in the Laplace domain. Recent numerical and analytical studies have revealed excellent properties of Runge-Kutta convolution quadrature, e.g., high convergence order, stability, and low dissipation and dispersion. As a model problem, we consider wave scattering in three dimensions. The convolution quadrature discretization of the indirect formulation for the three-dimensional wave equation leads to a lower triangular Toeplitz system of equations. Each entry of this system is a boundary integral operator with a kernel defined by convolution quadrature. In this work we develop an efficient method of almost linear complexity for the solution of this system, based on an existing recursive algorithm. The latter requires the construction of many discretizations of the Helmholtz boundary single-layer operator for a wide range of complex wavenumbers. This leads to two main problems: the need to construct many dense matrices and to evaluate many singular and near-singular integrals. The first problem is overcome by the use of data-sparse techniques, namely the high-frequency fast multipole method (HF FMM) and H-matrices. The applicability of both techniques to the discretization of Helmholtz boundary single-layer operators with complex wavenumbers is analyzed. It is shown that the presence of decay can favorably affect the length of the fast multipole expansions and thus reduce matrix-vector multiplication times. The performance of H-matrices and the HF FMM is compared for a range of complex wavenumbers, and a strategy for choosing between the two techniques is suggested. The second problem, namely the assembly of many singular and nearly singular integrals, is solved by the use of the Huygens principle. In this work we prove that the kernels of the boundary integral operators $w_n^h(d)$ (where $h$ is the time step and $t_n=nh$ is the time) exhibit exponential decay outside a neighborhood of $d=nh$; this is a consequence of the Huygens principle. The size of the support of these kernels for fixed $h$ grows with $n$ as $n^a$, $a<1$, where $a$ depends on the order of the Runge-Kutta method and is typically smaller for Runge-Kutta methods of higher order. Numerical experiments demonstrate that the theoretically predicted values of $a$ are quite close to optimal. It is shown how this property can be used in the recursive algorithm to assemble the near-field for only a few matrices, while for the remaining matrices only the far-field is assembled. The resulting method solves the three-dimensional wave scattering problem with asymptotically almost linear complexity. The efficiency of the approach is confirmed by extensive numerical experiments.
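For a sense of the system structure the abstract mentions, here is a minimal NumPy sketch (not the thesis's method) of block forward substitution on a lower triangular block Toeplitz system of the kind convolution quadrature produces. The weight matrices `W[k]` stand in for the discretized boundary operators; the naive marching shown costs O(N²) matrix-vector products, which is the baseline the recursive, almost-linear-complexity algorithm improves on. All names are illustrative.

```python
import numpy as np

def cq_forward_solve(W, b):
    """Solve sum_{k=0}^{n} W[k] @ x[n-k] = b[n] for n = 0..N-1,
    a lower triangular block Toeplitz system, by block forward substitution.

    W: list of N square convolution-weight matrices; b: list of N right-hand sides.
    """
    x = []
    for n in range(len(b)):
        # History term: discrete convolution of the weights with past solutions.
        rhs = b[n] - sum(W[k] @ x[n - k] for k in range(1, n + 1))
        x.append(np.linalg.solve(W[0], rhs))
    return x

# Tiny smoke test with random weights (diagonally dominant W[0] so it is solvable).
rng = np.random.default_rng(0)
N, m = 8, 4
W = [np.eye(m) * 5.0 + rng.standard_normal((m, m))] + \
    [rng.standard_normal((m, m)) for _ in range(N - 1)]
b = [rng.standard_normal(m) for _ in range(N)]
x = cq_forward_solve(W, b)
```

Per the Huygens-principle decay described above, most of the weight matrices in the history sum need only their far-field (data-sparse) part, which is what makes the recursive reorganization of this marching pay off.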
