1. Scientific Computing on Multicore Architectures
Tillenius, Martin. January 2014.
Computer simulations are an indispensable tool for scientists seeking new insights about nature. Simulations of natural phenomena are usually large and limited by the available computer resources. By using these resources more efficiently, larger and more detailed simulations can be performed, and more information can be extracted to help advance human knowledge. The topic of this thesis is how to make the best use of modern computers for scientific computations. The challenge is the high level of parallelism required to fully utilize the multicore processors in these systems.

Starting from the basics, the primitives for synchronizing between threads are investigated. Hardware transactional memory is a new construct for this purpose; it is evaluated for a use of particular importance to scientific software: atomic updates of floating-point values. The evaluation includes experiments on real hardware and comparisons against standard methods.

Higher-level programming models for shared-memory parallelism are then considered. The state of the art for efficient use of multicore systems is dynamically scheduled task-based programming, where tasks can depend on data. In such systems, the software is divided into many small tasks that are scheduled asynchronously according to their data dependencies. This enables a high level of parallelism and avoids global barriers. A new system for managing task dependencies, based on data versioning, is developed in this thesis. The system is implemented as a reusable software library and is shown in experimental comparisons to be as efficient as, or more efficient than, other shared-memory task-based systems.

The developed runtime system is then extended to distributed-memory machines and used to implement a parallel version of software for global climate simulations. Running the optimized and parallelized version on eight servers solves an equally sized problem over 100 times faster than the original sequential version. The parallel version also allows significantly larger problems to be solved, problems that were previously out of reach due to memory constraints.

UPMARC / eSSENCE
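To make the atomic floating-point update concrete, the following is a minimal sketch of the standard compare-and-swap approach that such an evaluation would typically compare hardware transactional memory against. It is a generic C++ illustration, not code from the thesis.

    #include <atomic>

    // Minimal sketch (not from the thesis): atomically add 'value' to a shared
    // double using a compare-and-swap retry loop, the standard technique that
    // hardware transactional memory can be compared against.
    void atomic_add(std::atomic<double>& target, double value) {
        double expected = target.load(std::memory_order_relaxed);
        // Retry until no other thread has modified 'target' in between;
        // on failure, 'expected' is refreshed with the current value.
        while (!target.compare_exchange_weak(expected, expected + value,
                                             std::memory_order_relaxed)) {
        }
    }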
2. A distributed kernel summation framework for machine learning and scientific applications
Lee, Dong Ryeol. 11 May 2012.
The class of computational problems I consider in this thesis shares the common trait of requiring consideration of pairs (or higher-order tuples) of data points. I focus on kernel summation operations, which are ubiquitous in many data mining and scientific algorithms.
In machine learning, kernel summations appear in popular kernel methods, which can model nonlinear structures in data. Kernel methods include many non-parametric methods such as kernel density estimation, kernel regression, Gaussian process regression, kernel PCA, and kernel support vector machines (SVMs). In computational physics, kernel summations occur in the classical N-body problem for simulating the positions of a set of celestial bodies or atoms.
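As a concrete illustration of what a kernel summation is (a schematic example, not code from the thesis), the naive computation is a double loop over query and reference points, shown here with a one-dimensional Gaussian kernel:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Schematic example: a naive O(N*M) kernel summation, here a Gaussian
    // kernel density estimate. For each query point q_i we need the sum over
    // all reference points r_j of K(q_i, r_j) -- the pairwise structure that
    // fast kernel-summation algorithms try to avoid evaluating exhaustively.
    std::vector<double> kernel_sum(const std::vector<double>& queries,
                                   const std::vector<double>& references,
                                   double bandwidth) {
        std::vector<double> sums(queries.size(), 0.0);
        for (std::size_t i = 0; i < queries.size(); ++i) {
            for (std::size_t j = 0; j < references.size(); ++j) {
                double d = (queries[i] - references[j]) / bandwidth;
                sums[i] += std::exp(-0.5 * d * d);  // Gaussian kernel K(q, r)
            }
        }
        return sums;
    }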
This thesis attempts to marry, for the first time, the best relevant techniques from parallel computing, where kernel summations are computed in low dimensions, with the best general-dimension algorithms from the machine learning literature. We provide a unified, efficient parallel kernel summation framework that can utilize: (1) various types of deterministic and probabilistic approximations that may be suitable for both low- and high-dimensional problems with a large number of data points; (2) indexing of the data using any multi-dimensional binary tree, with both distributed-memory (MPI) and shared-memory (OpenMP/Intel TBB) parallelism; and (3) a dynamic load balancing scheme to correct work imbalances during the computation.
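As a rough illustration of how the distributed-memory and shared-memory levels of parallelism can be combined (a generic MPI + OpenMP skeleton under simplifying assumptions, not the framework's actual API or its tree-based algorithm), each rank could sum over its locally stored reference points with OpenMP threads and then combine the per-query partial sums with an all-reduce:

    #include <mpi.h>
    #include <omp.h>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Generic illustration: hybrid parallelism for a kernel summation.
    // Reference points are distributed across MPI ranks; each rank loops over
    // its local references with OpenMP threads, and the partial sums for the
    // (replicated) query points are combined with an all-reduce.
    std::vector<double> hybrid_kernel_sum(const std::vector<double>& queries,
                                          const std::vector<double>& local_refs,
                                          double bandwidth) {
        std::vector<double> partial(queries.size(), 0.0);

        #pragma omp parallel for
        for (long i = 0; i < static_cast<long>(queries.size()); ++i) {
            double s = 0.0;
            for (std::size_t j = 0; j < local_refs.size(); ++j) {
                double d = (queries[i] - local_refs[j]) / bandwidth;
                s += std::exp(-0.5 * d * d);
            }
            partial[i] = s;
        }

        std::vector<double> total(queries.size(), 0.0);
        MPI_Allreduce(partial.data(), total.data(),
                      static_cast<int>(queries.size()),
                      MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        return total;
    }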
I will first summarize my previous research on serial kernel summation algorithms. This work builds on Greengard and Rokhlin's earlier work on fast multipole methods for approximating the potential sums of many particles. The contributions of this part of the thesis include the following: (1) a reinterpretation of Greengard and Rokhlin's work for the computer science community; (2) the extension of the algorithms to a larger class of approximation strategies, namely probabilistic error bounds via Monte Carlo techniques; (3) the multibody series expansion: a generalization of the theory of fast multipole methods to handle interactions of more than two entities; and (4) the first O(N) proof for batch approximate kernel summation using a notion of intrinsic dimensionality.

I then move on to parallelizing the kernel summations and to tackling the scaling of two other kernel methods: Gaussian process regression (kernel matrix inversion) and kernel PCA (kernel matrix eigendecomposition).
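The idea behind the probabilistic error bounds in contribution (2) can be sketched as follows: estimate the kernel sum from a random sample of reference points and bound the error with a normal-approximation confidence interval. This is a generic Monte Carlo illustration under a central-limit-theorem assumption, not the algorithm from the thesis:

    #include <cmath>
    #include <cstddef>
    #include <random>
    #include <vector>

    // Illustration of Monte Carlo kernel-sum approximation with a probabilistic
    // error bound (not the thesis algorithm): estimate the sum of K(q, r_j) over
    // all references from a random sample, and use the sample standard deviation
    // to form a confidence bound on the error. Assumes num_samples >= 2 and that
    // the central limit theorem gives a reasonable normal approximation.
    struct MonteCarloEstimate {
        double sum_estimate;  // N * sample mean
        double error_bound;   // z * N * stddev / sqrt(m), ~99% confidence
    };

    MonteCarloEstimate approximate_kernel_sum(double query,
                                              const std::vector<double>& references,
                                              double bandwidth,
                                              std::size_t num_samples,
                                              std::mt19937& rng) {
        std::uniform_int_distribution<std::size_t> pick(0, references.size() - 1);
        double mean = 0.0, m2 = 0.0;  // running mean and sum of squared deviations
        for (std::size_t s = 1; s <= num_samples; ++s) {
            double d = (query - references[pick(rng)]) / bandwidth;
            double k = std::exp(-0.5 * d * d);
            double delta = k - mean;
            mean += delta / s;            // Welford's online update
            m2 += delta * (k - mean);
        }
        double n = static_cast<double>(references.size());
        double stddev = std::sqrt(m2 / (num_samples - 1));
        const double z = 2.58;  // ~99% two-sided normal quantile
        return {n * mean, z * n * stddev / std::sqrt(static_cast<double>(num_samples))};
    }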
The software artifact of this thesis has contributed to an open-source machine learning package called MLPACK, which was first demonstrated at NIPS 2008 and subsequently at the NIPS 2011 Big Learning Workshop. Completing a portion of this thesis involved the use of high-performance computing resources at XSEDE (eXtreme Science and Engineering Discovery Environment) and NERSC (National Energy Research Scientific Computing Center).