1

Systolic algorithms and applications

Wan, Chunru January 1996 (has links)
Computer performance has improved tremendously since the development of the first all-purpose, all-electronic digital computer in 1946. Nevertheless, engineers, scientists and researchers continue working to improve computer performance further to meet the demanding requirements of many applications. There are basically two ways to improve computer performance in terms of computational speed. One is to use faster devices (VLSI chips). Although faster and faster VLSI components have contributed a great deal to the improvement of computation speed, breakthroughs in increasing the switching speed and circuit density of VLSI devices will be difficult and costly in the future. The other is to use parallel processing architectures, which employ multiple processors to perform a computation task. When multiple processors work together, an appropriate architecture is very important for achieving maximum performance in a cost-effective manner. Systolic arrays are ideally suited to computationally intensive applications with inherent massive parallelism, because they capitalize on regular, modular, rhythmic, synchronous, concurrent processes that require intensive, repetitive computation. This thesis can be divided into three parts. The first part is an introduction, comprising Chap. 1 and Chap. 2. The second part, composed of Chap. 3 and Chap. 4, concerns the systolic design methodology. The third part deals with several systolic array designs for different applications.
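The systolic-array behavior the abstract describes (regular, rhythmic, synchronous multiply-accumulates with data pulsing through a grid of processing elements) is easy to see in a small simulation. Below is a minimal Python sketch, not taken from the thesis, of the classic systolic array for matrix multiplication: the operand streams are skewed so that A[i, k] and B[k, j] meet at processing element (i, j) on the same cycle. The function name and the cycle-level model are illustrative assumptions of this sketch.

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-level simulation of an n x n systolic array computing C = A @ B.

    Each processing element PE(i, j) accumulates C[i, j]: A-values stream
    rightward along rows, B-values stream downward along columns, and the
    inputs are skewed so that A[i, k] and B[k, j] arrive at PE(i, j)
    together on cycle i + j + k.
    """
    n = A.shape[0]
    C = np.zeros((n, n))
    a_reg = np.zeros((n, n))  # A-value currently held at each PE
    b_reg = np.zeros((n, n))  # B-value currently held at each PE
    for t in range(3 * n - 2):  # cycles until the last products drain out
        # Data advances one PE per cycle (np.roll returns a copy, no aliasing);
        # the wrapped-around column/row is overwritten by the feed below.
        a_reg = np.roll(a_reg, 1, axis=1)
        b_reg = np.roll(b_reg, 1, axis=0)
        # Feed skewed inputs at the array boundary; zeros pad outside the window.
        for i in range(n):
            k = t - i  # row i of A is delayed by i cycles
            a_reg[i, 0] = A[i, k] if 0 <= k < n else 0.0
        for j in range(n):
            k = t - j  # column j of B is delayed by j cycles
            b_reg[0, j] = B[k, j] if 0 <= k < n else 0.0
        C += a_reg * b_reg  # every PE performs one multiply-accumulate per cycle
    return C

# Sanity check against an ordinary matrix product.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
assert np.allclose(systolic_matmul(A, B), A @ B)
```

On real hardware each PE would be a small multiply-accumulate unit and the `np.roll` shifts would be register transfers between neighboring PEs; the simulation just makes the synchronous timing explicit.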
2

Fast algorithms for matrix computation and applications

Qiyuan Pang (17565405) 10 December 2023 (has links)
Matrix decompositions play a pivotal role in matrix computation and applications. While general dense matrix-vector multiplications and linear equation solvers are prohibitively expensive, matrix decompositions offer fast alternatives for matrices meeting specific properties. This dissertation delves into my contributions to two fast matrix multiplication algorithms and one fast linear equation solver tailored to certain matrices and applications, all based on efficient matrix decompositions. Fast dimensionality reduction methods in spectral clustering, based on efficient eigen-decompositions, are also explored.

The first matrix decomposition introduced is the "kernel-independent" interpolative decomposition butterfly factorization (IDBF), acting as a data-sparse approximation for matrices adhering to a complementary low-rank property. Constructible in $O(N\log N)$ operations for an $N \times N$ matrix via hierarchical interpolative decompositions (IDs), the IDBF results in a product of $O(\log N)$ sparse matrices, each with $O(N)$ non-zero entries. This factorization facilitates rapid matrix-vector multiplication in $O(N \log N)$ operations, making it a versatile framework applicable to various scenarios such as special function transformation, Fourier integral operators, and high-frequency wave computation.

The second matrix decomposition accelerates matrix-vector multiplication for computing multi-dimensional Jacobi polynomial transforms. Leveraging the observation that solutions to Jacobi's differential equation can be represented through non-oscillatory phase and amplitude functions, the corresponding matrix is expressed as the Hadamard product of a numerically low-rank matrix and a multi-dimensional discrete Fourier transform (DFT) matrix. This approach utilizes $r^d$ fast Fourier transforms (FFTs), where $r = O(\log n / \log \log n)$ and $d$ is the dimension, resulting in an almost optimal algorithm for computing the multidimensional Jacobi polynomial transform.

An efficient numerical method is developed, based on a matrix decomposition called hierarchical interpolative factorization, for solving modified Poisson-Boltzmann (MPB) equations. Addressing the computational bottleneck of evaluating Green's function in the MPB solver, the proposed method achieves linear scaling by combining selected inversion and hierarchical interpolative factorization. This significantly reduces the computational cost of solving MPB equations, particularly the evaluation of Green's function.

Finally, eigen-decomposition methods, including the block Chebyshev-Davidson method and Orthogonalization-Free methods (OFM), are proposed for dimensionality reduction in spectral clustering. By leveraging the well-known spectrum bounds of a Laplacian matrix, the Chebyshev-Davidson methods allow dimensionality reduction without the need to estimate spectrum bounds. Instead of the vanilla Chebyshev-Davidson method, it is better to use the block Chebyshev-Davidson method with an inner-outer restart technique, which reduces total CPU time, and a progressive polynomial filter, which takes advantage of suitable initial vectors when available, for example in the streaming graph scenario.
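To make the last paragraph's point concrete: the normalized graph Laplacian has its spectrum inside [0, 2], a bound known a priori, so a Chebyshev polynomial filter can suppress the unwanted upper part of the spectrum without any bound-estimation step. The following is a minimal sketch of the standard three-term Chebyshev filter recurrence that Chebyshev-Davidson-type methods build on; it is not code from the dissertation, and the function name and interval parameters `a` and `b` are assumptions of this sketch.

```python
import numpy as np

def chebyshev_filter(L, X, degree, a, b):
    """Apply p(L) to the block X, where p is a Chebyshev polynomial of the
    given degree scaled to the interval [a, b].

    Eigencomponents of L with eigenvalues inside [a, b] are damped to
    magnitude <= 1, while components below `a` are amplified rapidly,
    steering X toward the eigenvectors with the smallest eigenvalues.
    L can be any sparse matrix or linear operator supporting `L @ X`.
    """
    e = (b - a) / 2.0  # half-width of the damped interval
    c = (b + a) / 2.0  # center of the damped interval
    if degree == 0:
        return X
    Y = (L @ X - c * X) / e  # T_1 applied to the shifted/scaled operator
    for _ in range(2, degree + 1):
        # Three-term recurrence: T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t)
        Y, X = 2.0 * (L @ Y - c * Y) / e - X, Y
    return Y
```

For a normalized Laplacian one can simply set `b = 2.0` without estimation, choosing `a` as a rough cut below the unwanted spectrum; practical implementations also rescale the iterates at high polynomial degree to avoid overflow.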
Theoretically, the Orthogonalization-Free method constructs a space unitarily isomorphic to the eigenspace, or a space weighting the eigenspace, by solving optimization problems through gradient descent with momentum acceleration based on conjugate gradient and line search for optimal step sizes. Numerical results indicate that the eigenspace and the weighted eigenspace are equivalent in clustering performance, and scalable parallel versions of the block Chebyshev-Davidson method and OFM are developed to enhance efficiency in parallel computing.
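As a hedged illustration of the orthogonalization-free idea (not the dissertation's algorithm, which uses conjugate-gradient-based momentum and line-search step sizes), the sketch below minimizes one standard orthogonalization-free objective, f(X) = ||A - X Xᵀ||²_F, with plain heavy-ball momentum and a fixed step size. For symmetric positive semi-definite A, the minimizers of f span the dominant eigenspace weighted by square roots of the eigenvalues (Eckart-Young), one concrete instance of a "space weighting the eigenspace"; no per-iteration QR or other orthogonalization is performed. The function name and the hyperparameters are illustrative choices.

```python
import numpy as np

def ofm_weighted_eigenspace(A, k, steps=500, lr=1e-2, beta=0.9, seed=0):
    """Minimize f(X) = ||A - X X^T||_F^2 over n-by-k matrices X using
    heavy-ball momentum, with no orthogonalization of the iterates.

    For symmetric positive semi-definite A, minimizers have the form
    U_k diag(sqrt(lambda_1..k)) Q with Q orthogonal, i.e. they span the
    dominant eigenspace with square-root eigenvalue weights.
    `steps`, `lr`, `beta`, and `seed` are illustrative settings only.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = rng.standard_normal((n, k)) / np.sqrt(n)  # small random start
    V = np.zeros_like(X)                          # momentum buffer
    for _ in range(steps):
        G = 4.0 * (X @ (X.T @ X) - A @ X)  # gradient of f at X
        V = beta * V - lr * G              # heavy-ball update
        X = X + V
    return X
```

For spectral clustering one would apply this to a shifted operator such as 2I - L_norm (positive semi-definite because the normalized Laplacian's spectrum lies in [0, 2], and its dominant eigenvectors are the Laplacian's bottom ones) and then run k-means on the rows of the returned X.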
