  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Reducing Inter-Process Communication Overhead in Parallel Sparse Matrix-Matrix Multiplication

Ahmed, Salman, Houser, Jennifer, Hoque, Mohammad A., Raju, Rezaul, Pfeiffer, Phil 01 July 2017
Parallel sparse matrix-matrix multiplication (PSpGEMM) algorithms spend most of their running time on inter-process communication. In distributed matrix-matrix multiplication, much of this time is spent exchanging the partial results needed to calculate the final product matrix. This overhead can be reduced with a one-dimensional distributed algorithm for parallel sparse matrix-matrix multiplication that uses a novel accumulation pattern whose cost grows logarithmically with the number of processors (i.e., O(log p), where p is the number of processors). This algorithm's MPI communication overhead and execution time were evaluated on an HPC cluster, using randomly generated sparse matrices with dimensions up to one million by one million. The results showed a reduction in inter-process communication overhead for matrices with larger dimensions compared to another one-dimensional parallel algorithm that takes O(p) time to accumulate the results.
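The difference between O(p) and O(log p) accumulation can be illustrated with a small in-process simulation. This is only a sketch of the generic pairwise (binary-tree) reduction pattern, not the authors' algorithm: no MPI is involved, each "process" is a Python dict of partial results, and the function names `merge_partials`, `tree_accumulate`, and `linear_accumulate` are hypothetical.

```python
def merge_partials(a, b):
    """Add two sparse partial results stored as {(row, col): value} dicts."""
    out = dict(a)
    for key, val in b.items():
        out[key] = out.get(key, 0.0) + val
    return out


def tree_accumulate(partials):
    """Combine p >= 1 partial results in ceil(log2(p)) pairwise rounds.

    Each round models one communication step in which "process"
    i + stride sends its partial result to "process" i.
    Returns (merged_result, number_of_rounds).
    """
    parts = list(partials)
    p = len(parts)
    rounds = 0
    stride = 1
    while stride < p:
        for i in range(0, p - stride, 2 * stride):
            parts[i] = merge_partials(parts[i], parts[i + stride])
        stride *= 2
        rounds += 1
    return parts[0], rounds


def linear_accumulate(partials):
    """Combine p >= 1 partial results one after another: p - 1 sequential steps."""
    parts = list(partials)
    total = parts[0]
    steps = 0
    for part in parts[1:]:
        total = merge_partials(total, part)
        steps += 1
    return total, steps
```

With eight simulated processes the tree pattern needs three rounds, while the linear pattern needs seven sequential steps; both produce the same merged result.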
12

Improving Memory Performance for Both High Performance Computing and Embedded/Edge Computing Systems

Adavally, Shashank 12 1900
The CPU-memory bottleneck is a widely recognized problem. The majority of high performance computing (HPC) database systems are configured with large memories and dedicated to processing specific workloads such as weather prediction and molecular dynamics simulations. My research on optimal address mapping improves memory performance by increasing channel- and bank-level parallelism. In another research direction, I proposed and evaluated adaptive page migration techniques that obviate the need for offline analysis of an application to determine page migration strategies. Furthermore, I explored different migration strategies, such as reverse migration and sub-page migration, that I found to be beneficial depending on the application behavior. Ideally, page migration strategies redirect the demand memory traffic to faster memory to improve memory performance. In my third contribution, I worked on and evaluated a memory-side accelerator that assists the main computational core in locating the non-zero elements of a sparse matrix, of the kind typically used in scientific and machine learning workloads, on a low-power embedded system configuration. Thus my contributions narrow the speed gap by improving the latency and/or bandwidth between CPU and memory.
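To make the idea of address mapping and channel-level parallelism concrete, the following sketch compares two ways of splitting a physical address into channel, bank, and remaining bits. The bit widths, the two layouts, and all names here are assumptions chosen for illustration; they are not taken from the thesis.

```python
LINE_BITS = 6      # assumed 64-byte cache line
CHANNEL_BITS = 2   # assumed 4 memory channels
BANK_BITS = 3      # assumed 8 banks per channel


def map_low_interleave(addr):
    """Channel and bank bits sit directly above the cache-line offset."""
    line = addr >> LINE_BITS
    channel = line & ((1 << CHANNEL_BITS) - 1)
    bank = (line >> CHANNEL_BITS) & ((1 << BANK_BITS) - 1)
    rest = line >> (CHANNEL_BITS + BANK_BITS)
    return channel, bank, rest


def map_high_interleave(addr, rest_bits=16):
    """Channel and bank bits sit above `rest_bits` low-order line bits."""
    line = addr >> LINE_BITS
    rest = line & ((1 << rest_bits) - 1)
    bank = (line >> rest_bits) & ((1 << BANK_BITS) - 1)
    channel = (line >> (rest_bits + BANK_BITS)) & ((1 << CHANNEL_BITS) - 1)
    return channel, bank, rest


def channels_touched(mapper, addrs):
    """Count how many distinct channels a sequence of addresses reaches."""
    return len({mapper(a)[0] for a in addrs})
```

Under these assumed layouts, sixteen consecutive cache lines are spread over all four channels by the low-order mapping but land on a single channel with the high-order mapping, which is the kind of difference an address-mapping study would quantify.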
13

Direct and Line Based Iterative Methods for Solving Sparse Block Linear Systems

Yang, Xiaolin January 2018
No description available.
14

Rigid Partitioning Techniques for Efficiently Generating 3D Reconstructions from Images

Steedly, Drew 01 December 2004
This thesis explores efficient techniques for generating 3D reconstructions from imagery. Non-linear optimization is one of the core techniques used when computing a reconstruction and is a computational bottleneck for large sets of images. Since non-linear optimization requires a good initialization to avoid getting stuck in local minima, robust systems for generating reconstructions from images build up the reconstruction incrementally. A hierarchical approach is to split up the images into small subsets, reconstruct each subset independently and then hierarchically merge the subsets. Rigidly locking together portions of the reconstructions reduces the number of parameters needed to represent them when merging, thereby lowering the computational cost of the optimization. We present two techniques that involve optimizing with parts of the reconstruction rigidly locked together. In the first, we start by rigidly grouping the cameras and scene features from each of the reconstructions being merged into separate groups. Cameras and scene features are then incrementally unlocked and optimized until the reconstruction is close to the minimum energy. This technique is most effective when the influence of the new measurements is restricted to a small set of parameters. Measurements that stitch together weakly coupled portions of the reconstruction, though, tend to cause deformations in the low error modes of the reconstruction and cannot be efficiently incorporated with the previous technique. To address this, we present a spectral technique for clustering the tightly coupled portions of a reconstruction into rigid groups. Reconstructions partitioned in this manner can closely mimic the poorly conditioned, low error modes, and therefore efficiently incorporate measurements that stitch together weakly coupled portions of the reconstruction. We explain how this technique can be used to scalably and efficiently generate reconstructions from large sets of images.
15

Sparse Matrices in Self-Consistent Field Methods

Rubensson, Emanuel January 2006
This thesis is part of an effort to enable large-scale Hartree-Fock/Kohn-Sham (HF/KS) calculations. The objective is to model molecules and materials containing thousands of atoms at the quantum mechanical level. HF/KS calculations are usually performed with the Self-Consistent Field (SCF) method. This method involves two computationally intensive steps: the construction of the Fock/Kohn-Sham potential matrix from a given electron density, and the subsequent update of the electron density, which is usually represented by the so-called density matrix. In this thesis the focus lies on the representation of potentials and electron density and on the density matrix construction step in the SCF method. Traditionally, diagonalization has been used for the construction of the density matrix. Diagonalization is, however, not appropriate for large systems, since the time complexity of this operation is O(n^3). Three types of alternative methods are described in this thesis: energy minimization, Chebyshev expansion, and density matrix purification. The efficiency of these methods relies on fast matrix-matrix multiplication. Since the occurring matrices become sparse when the separation between atoms exceeds some value, the matrix-matrix multiplication can be performed with complexity O(n).
A hierarchic sparse matrix data structure is proposed for the storage and manipulation of matrices. This data structure allows for easy development and implementation of algebraic matrix operations, which are needed particularly for the density matrix construction but also for other parts of the SCF calculation. The thesis also addresses truncation of small elements to enforce sparsity, permutation and blocking of matrices, and calculation of the HOMO-LUMO gap and a few surrounding eigenpairs when density matrix purification is used instead of the traditional diagonalization method.
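As a rough illustration of how purification can be built from sparse matrix-matrix products and truncation, the sketch below applies the classic McWeeny iteration P <- 3P^2 - 2P^3 to a matrix stored as a dictionary of non-zero entries. The choice of the McWeeny polynomial, the flat dictionary storage (rather than the hierarchic structure the abstract proposes), the truncation threshold, and all function names are assumptions made for this example; the thesis may use different schemes.

```python
def sparse_matmul(a, b):
    """Multiply two sparse matrices stored as {(row, col): value} dicts."""
    b_rows = {}
    for (k, j), v in b.items():
        b_rows.setdefault(k, []).append((j, v))
    out = {}
    for (i, k), a_ik in a.items():
        for j, b_kj in b_rows.get(k, ()):
            out[(i, j)] = out.get((i, j), 0.0) + a_ik * b_kj
    return out


def truncate(m, tol):
    """Drop entries with magnitude at or below tol to keep the matrix sparse."""
    return {key: v for key, v in m.items() if abs(v) > tol}


def mcweeny_step(p, tol=1e-10):
    """One purification step: P <- 3 P^2 - 2 P^3, followed by truncation."""
    p2 = truncate(sparse_matmul(p, p), tol)
    p3 = truncate(sparse_matmul(p2, p), tol)
    keys = set(p2) | set(p3)
    new = {k: 3.0 * p2.get(k, 0.0) - 2.0 * p3.get(k, 0.0) for k in keys}
    return truncate(new, tol)


def purify(p, steps=12, tol=1e-10):
    """Apply a fixed number of purification steps (no convergence test)."""
    for _ in range(steps):
        p = mcweeny_step(p, tol)
    return p
```

Starting from a symmetric matrix with eigenvalues between 0 and 1, the iteration drives eigenvalues above 0.5 toward 1 and those below 0.5 toward 0, so the result approaches an idempotent matrix while only sparse products are ever formed.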
16

The Use Of Wavelet Type Basis Functions In The MoM Analysis Of Microstrip Structures

Cakir, Emre 01 December 2004
The Method of Moments (MoM) has been used extensively to solve electromagnetic problems. Its popularity is largely attributed to its adaptability to structures of various shapes and its success in accurately predicting the equivalent induced currents. However, because its matrix is dense, the MoM suffers from long matrix solution times and large storage requirements, especially for large structures. In this thesis it is shown that the use of wavelet basis functions results in a MoM matrix that is sparser than the one obtained by using traditional basis functions. A new wavelet system, different from the ones found in the literature, is proposed. The Stabilized Bi-Conjugate Gradient Method, an iterative matrix solution method, is used to solve the resulting sparse matrix equation. Both a one-dimensional problem with a microstrip line example and a two-dimensional problem with a rectangular patch antenna example are studied, and the results are compared.
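For readers unfamiliar with the stabilized bi-conjugate gradient method (BiCGSTAB), a minimal unpreconditioned version for real-valued systems is sketched below. It is a textbook-style illustration, not the thesis's solver: the triple-list matrix storage, the zero initial guess, the tolerance, and the names `sparse_matvec` and `bicgstab` are assumptions, and an actual MoM matrix would be complex-valued.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sparse_matvec(triples, x):
    """y = A x for A given as a list of (row, col, value) non-zeros."""
    y = [0.0] * len(x)
    for i, j, v in triples:
        y[i] += v * x[j]
    return y


def bicgstab(triples, b, tol=1e-10, max_iter=200):
    """Solve A x = b with unpreconditioned BiCGSTAB, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)              # residual b - A x for the zero initial guess
    r_hat = list(r)          # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        if rho_new == 0.0:   # breakdown: give up with the current iterate
            break
        beta = (rho_new / rho) * (alpha / omega)
        p = [r[i] + beta * (p[i] - omega * v[i]) for i in range(n)]
        v = sparse_matvec(triples, p)
        alpha = rho_new / dot(r_hat, v)
        s = [r[i] - alpha * v[i] for i in range(n)]
        if sqrt(dot(s, s)) <= tol * b_norm:
            return [x[i] + alpha * p[i] for i in range(n)]
        t = sparse_matvec(triples, s)
        omega = dot(t, s) / dot(t, t)
        x = [x[i] + alpha * p[i] + omega * s[i] for i in range(n)]
        r = [s[i] - omega * t[i] for i in range(n)]
        if sqrt(dot(r, r)) <= tol * b_norm:
            return x
        rho = rho_new
    return x
```

Because the solver touches the matrix only through matrix-vector products, its cost per iteration scales with the number of non-zeros, which is why a sparser MoM matrix pays off directly.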
17

Sparse-Matrix support for the SkePU library for portable CPU/GPU programming

Sharma, Vishist January 2016
In this thesis work we have extended the SkePU framework by designing a new container data structure for the representation of generic two-dimensional sparse matrices. Computation on matrices is an integral part of many scientific and engineering problems. Sometimes it is unnecessary to perform costly operations on the zero entries of a matrix. If the number of zeroes is relatively large, a more efficient data structure is required. Beyond the sparse matrix representation, we propose an algorithm that judges when computation on sparse matrices is more beneficial in terms of execution time for an ongoing computation and adapts a matrix's state accordingly, which is the main concern of this thesis work. We present and implement an approach to switch automatically and dynamically between two data container types inside the SkePU framework for a multi-core GPU-based heterogeneous system. The new sparse matrix data container supports all SkePU skeletons and nearly all SkePU operations. We provide compression and decompression algorithms from dense matrix to sparse matrix and vice versa on CPUs and GPUs using SkePU data-parallel skeletons. We have also implemented a context-aware switching mechanism in order to switch between the two data container types on the CPU or the GPU. A multi-state matrix representation with selection on demand is also made possible. To evaluate the effectiveness and efficiency of our extension to the SkePU framework, we have used Matrix-Vector Multiplication as our benchmark program, because iterative solvers such as Conjugate Gradient and Generalized Minimum Residual use Sparse Matrix-Vector Multiplication as their basic operation. Through our benchmark program we have demonstrated adaptive switching between the two data container types, implementation selection between CUDA and OpenMP, and conversion of the data structure depending on the density of non-zeroes in a matrix.
Our experiments on GPU-based architectures show that our automatic switching mechanism adapts to the fastest SkePU implementation variant and has a limited training cost.
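The idea of compressing a dense matrix and choosing a representation by non-zero density can be sketched as follows. This is plain Python and does not use or imitate the SkePU API; the CSR layout, the fixed `threshold` value, and every function name are assumptions for illustration, whereas the thesis describes a trained, context-aware switching mechanism.

```python
def density(dense):
    """Fraction of entries in a list-of-lists matrix that are non-zero."""
    total = sum(len(row) for row in dense)
    nnz = sum(1 for row in dense for v in row if v != 0)
    return nnz / total if total else 0.0


def to_csr(dense):
    """Compress a dense matrix into CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(csr, x):
    """Sparse matrix-vector product using only the stored non-zeros."""
    values, col_idx, row_ptr = csr
    return [
        sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


def dense_matvec(dense, x):
    return [sum(v * xj for v, xj in zip(row, x)) for row in dense]


def adaptive_matvec(dense, x, threshold=0.5):
    """Use the CSR path when the non-zero density is below the threshold."""
    if density(dense) < threshold:
        return csr_matvec(to_csr(dense), x), "sparse"
    return dense_matvec(dense, x), "dense"
```

Both paths return the same product; only the work performed differs, which is why the switching decision can be made purely on expected execution time.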
18

Performance of parallel sparse matrix-matrix multiplication

Piccolo, Alessandro, Soodla, Johan January 2015
No description available.
19

Sparse Matrices in Self-Consistent Field Methods

Rubensson, Emanuel January 2006
This thesis is part of an effort to enable large-scale Hartree-Fock/Kohn-Sham (HF/KS) calculations. The objective is to model molecules and materials containing thousands of atoms at the quantum mechanical level. HF/KS calculations are usually performed with the Self-Consistent Field (SCF) method. This method involves two computationally intensive steps: the construction of the Fock/Kohn-Sham potential matrix from a given electron density, and the subsequent update of the electron density, which is usually represented by the so-called density matrix. In this thesis the focus lies on the representation of potentials and electron density and on the density matrix construction step in the SCF method. Traditionally, diagonalization has been used for the construction of the density matrix. Diagonalization is, however, not appropriate for large systems, since the time complexity of this operation is O(n^3). Three types of alternative methods are described in this thesis: energy minimization, Chebyshev expansion, and density matrix purification. The efficiency of these methods relies on fast matrix-matrix multiplication. Since the occurring matrices become sparse when the separation between atoms exceeds some value, the matrix-matrix multiplication can be performed with complexity O(n). A hierarchic sparse matrix data structure is proposed for the storage and manipulation of matrices. This data structure allows for easy development and implementation of algebraic matrix operations, which are needed particularly for the density matrix construction but also for other parts of the SCF calculation. The thesis also addresses truncation of small elements to enforce sparsity, permutation and blocking of matrices, and calculation of the HOMO-LUMO gap and a few surrounding eigenpairs when density matrix purification is used instead of the traditional diagonalization method.
20

On the Field of Values of the Inverse of a Matrix

Zachlin, Paul Francis 08 June 2007
No description available.
