1 |
Distributed Execution of Recursive Irregular ApplicationsNikhil Hegde (7043171) 13 August 2019 (has links)
Massive computing power and applications running on this power, primarily confined to expensive supercomputers a decade ago, have now become mainstream through the availability of clusters with commodity computers and high-speed interconnects running big-data era applications. The challenges associated with programming such systems, for effectively utilizing the computing power, have led to the creation of intuitive abstractions and implementations targeting average users, domain experts, and savvy (parallel) programmers. There is often a trade-off between the ease of programming and performance when using these abstractions. This thesis develops tools to bridge the gap between ease of programming and performance of irregular programs—programs that involve one or more of irregular- data structures, control structures, and communication patterns—on distributed-memory systems. <div><br></div><div>Irregular programs are focused heavily in domains ranging from data mining to bioinformatics to scientific computing. In contrast to regular applications such as stencil codes and dense matrix-matrix multiplications, which have a predictable pattern of data access and control flow, typical irregular applications operate over graphs, trees, and sparse matrices and involve input-dependent data access pattern and control flow. This makes it difficult to apply optimizations such as those targeting locality and parallelism to programs implementing irregular applications. Moreover, irregular programs are often used with large data sets that prohibit single-node execution due to memory limitations on the node. Hence, distributed solutions are necessary in order to process all the data.<br><br>In this thesis, we introduce SPIRIT, a framework consisting of an abstraction and a space-adaptive runtime system for simplifying the creation of distributed implementations of recursive irregular programs based on spatial acceleration structures. SPIRIT addresses the insufficiency of traditional data-parallel approaches and existing systems in effectively parallelizing computations involving repeated tree traversals. SPIRIT employs locality optimizations applied in a shared-memory context, introduces a novel pipeline-parallel approach to execute distributed traversals, and trades-off performance with memory usage to create a space-adaptive system that achieves a scalable performance, and outperforms implementations done in contemporary distributed graph processing frameworks.</div><div><br>We next introduce Treelogy to understand the connection between optimizations and tree-algorithms. Treelogy provides an ontology and a benchmark suite of a broader class of tree algorithms to help answer: (i) is there any existing optimization that is applicable or effective for a new tree algorithm? (ii) can a new optimization developed for a tree algorithm be applied to existing tree algorithms from other domains? We show that a categorization (ontology) based on structural properties of tree- algorithms is useful for both developers of new optimizations and new tree algorithm creators. With the help of a suite of tree traversal kernels spanning the ontology, we show that GPU, shared-, and distributed-memory implementations are scalable and the two-point correlation algorithm with vptree performs better than the standard kdtree implementation.</div><div><br>In the final part of the thesis, we explore the possibility of automatically generating efficient distributed-memory implementations of irregular programs. As manually creating distributed-memory implementations is challenging due to the explicit need for managing tasks, parallelism, communication, and load-balancing, we introduce a framework, D2P, to automatically generate efficient distributed implementations of recursive divide-conquer algorithms. D2P automatically generates a distributed implementation of a recursive divide-conquer algorithm from its specification, which is a high-level outline of a recursive formulation. We evaluate D2P with recursive Dynamic programming (DP) algorithms. The computation in DP algorithms is not irregular per se. However, when distributed, the computation in efficient recursive formulations of DP algorithms requires irregular communication. User-configurable knobs in D2P allow for tuning the amount of available parallelism. Results show that D2P programs scale well, are significantly better than those produced using a state-of-the-art framework for parallelizing iterative DP algorithms, and outperform even hand-written distributed-memory implementations in most cases.</div>
|
2 |
Parallélisation automatique et statique de tâches sous contraintes de ressources : une approche générique / Automatic Resource-Constrained Static Task Parallelization : A Generic ApproachKhaldi, Dounia 27 November 2013 (has links)
Le but de cette thèse est d'exploiter efficacement le parallélisme présent dans les applications informatiques séquentielles afin de bénéficier des performances fournies par les multiprocesseurs, en utilisant une nouvelle méthodologie pour la parallélisation automatique des tâches au sein des compilateurs. Les caractéristiques clés de notre approche sont la prise en compte des contraintes de ressources et le caractère statique de l'ordonnancement des tâches. Notre méthodologie contient les techniques nécessaires pour la décomposition des applications en tâches et la génération de code parallèle équivalent, en utilisant une approche générique qui vise différents langages et architectures parallèles. Nous implémentons cette méthodologie dans le compilateur source-à-source PIPS. Cette thèse répond principalement à trois questions. Primo, comme l'extraction du parallélisme de tâches des codes séquentiels est un problème d'ordonnancement, nous concevons et implémentons un algorithme d'ordonnancement efficace, que nous nommons BDSC, pour la détection du parallélisme ; le résultat est un SDG ordonnancé, qui est une nouvelle structure de données de graphe de tâches. Secondo, nous proposons une nouvelle extension générique des représentations intermédiaires séquentielles en des représentations intermédiaires parallèles que nous nommons SPIRE, pour la représentation des codes parallèles. Enfin, nous développons, en utilisant BDSC et SPIRE, un générateur de code que nous intégrons dans PIPS. Ce générateur de code cible les systèmes à mémoire partagée et à mémoire distribuée via des codes OpenMP et MPI générés automatiquement. / This thesis intends to show how to efficiently exploit the parallelism present in applications in order to enjoy the performance benefits that multiprocessors can provide, using a new automatic task parallelization methodology for compilers. The key characteristics we focus on are resource constraints and static scheduling. This methodology includes the techniques required to decompose applications into tasks and generate equivalent parallel code, using a generic approach that targets both different parallel languages and architectures. We apply this methodology in the existing tool PIPS, a comprehensive source-to-source compilation platform. This thesis mainly focuses on three issues. First, since extracting task parallelism from sequential codes is a scheduling problem, we design and implement an efficient, automatic scheduling algorithm called BDSC for parallelism detection; the result is a scheduled SDG, a new task graph data structure. In a second step, we design a new generic parallel intermediate representation extension called SPIRE, in which parallelized code may be expressed. Finally, we wrap up our goal of automatic parallelization in a new BDSC- and SPIRE-based parallel code generator, which is integrated within the PIPS compiler framework. It targets both shared and distributed memory systems using automatically generated OpenMP and MPI code.
|
3 |
Seismic modeling and imaging with Fourier method : numerical analyses and parallel implementation strategiesChu, Chunlei, 1977- 13 June 2011 (has links)
Our knowledge of elastic wave propagation in general heterogeneous media with complex geological structures comes principally from numerical simulations. In this dissertation, I demonstrate through rigorous theoretical analyses and comprehensive numerical experiments that the Fourier method is a suitable method of choice for large scale 3D seismic modeling and imaging problems, due to its high accuracy and computational efficiency. The most attractive feature of the Fourier method is its ability to produce highly accurate solutions on relatively coarser grids, compared with other numerical methods for solving wave equations. To further advance the Fourier method, I identify two aspects of the method to focus on in this work, i.e., its implementation on modern clusters of computers and efficient high-order time stepping schemes. I propose two new parallel algorithms to improve the efficiency of the Fourier method on distributed memory systems using MPI. The first algorithm employs non-blocking all-to-all communications to optimize the conventional parallel Fourier modeling workflows by overlapping communication with computation. With a carefully designed communication-computation overlapping mechanism, a large amount of communication overhead can be concealed when implementing different kinds of wave equations. The second algorithm combines the advantages of both the Fourier method and the finite difference method by using convolutional high-order finite difference operators to evaluate the spatial derivatives in the decomposed direction. The high-order convolutional finite difference method guarantees a satisfactory accuracy and provides the flexibility of using non-blocking point-to-point communications for efficient interprocessor data exchange and the possibility of overlapping communication and computation. As a result, this hybrid method achieves an optimized balance between numerical accuracy and computational efficiency. To improve the overall accuracy of time domain Fourier simulations, I propose a family of new high-order time stepping schemes, based on a novel algorithm for designing time integration operators, to reduce temporal derivative discretization errors in a cost-effective fashion. I explore the pseudo-analytical method and propose high-order formulations to further improve its accuracy and ability to deal with spatial heterogeneities. I also extend the pseudo-analytical method to solve the variable-density acoustic and elastic wave equations. I thoroughly examine the finite difference method by conducting complete numerical dispersion and stability analyses. I comprehensively compare the finite difference method with the Fourier method and provide a series of detailed benchmarking tests of these two methods under a number of different simulation configurations. The Fourier method outperforms the finite difference method, in terms of both accuracy and efficiency, for both the theoretical studies and the numerical experiments, which provides solid evidence that the Fourier method is a superior scheme for large scale seismic modeling and imaging problems. / text
|
Page generated in 0.3316 seconds