Global ETD Search

1	Optimal Loop Unrolling for GPGPU Programs Sreenivasa Murthy, Giridhar 30 September 2009 (has links) No description available. Computer Science GPGPU Compiler Optimizations Loop Transformations Loop Unrolling
2	Integrated compiler optimizations for tensor contractions Gao, Xiaoyang 07 January 2008 (has links) No description available. Computer Science Compiler optimization Loop transformations High-performance computing Parallel algorithms Out-of-core algorithms
3	Efficient search-based strategies for polyhedral compilation : algorithms and experience in a production compiler Trifunovic, Konrad 04 July 2011 (has links) (PDF) In order to take the performance advantages of the current multicore and heterogeneous architectures the compilers are required to perform more and more complex program transformations. The search space of the possible program optimizations is huge and unstructured. Selecting the best transformation and predicting the potential performance benefits of that transformation is the major problem in today's optimizing compilers. The promising approach to handling the program optimizations is to focus on the automatic loop optimizations expressed in the polyhedral model. The current approaches for optimizing programs in the polyhedral model broadly fall into two classes. The first class of the methods is based on the linear optimization of the analytical cost function. The second class is based on the exhaustive iterative search. While the first approach is fast, it can easily miss the optimal solution. The iterative approach is more precise, but its running time might be prohibitively expensive. In this thesis we present a novel search-based approach to program transformations in the polyhedral model. The new method combines the benefits - effectiveness and precision - of the current approaches, while it tries to minimize their drawbacks. Our approach is based on enumerating the evaluations of the precise, nonlinear performance predicting cost-function. The current practice is to use the polyhedral model in the context of source-to-source compilers. We have implemented our techniques in a GCC framework that is based on the low level three address code representation. We show that the chosen level of abstraction for the intermediate representation poses scalability challenges, and we show the ways to overcome those problems. On the other hand, it is shown that the low level IR abstraction opens new degrees of freedom that are beneficial for the search-based transformation strategies and for the polyhedral compilation in general. [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre Compilers Programming languages Polyhedral model Program transformations Loop transformations Automatic parallelization Intermediate representation
4	Sub-Polyhedral Compilation using (Unit-)Two-Variables-Per-Inequality Polyhedra Upadrasta, Ramakrishna 13 March 2013 (has links) (PDF) The goal of this thesis is to design algorithms that run with better complexity when compiling or parallelizing loop programs. The framework within which our algorithms operate is the polyhedral model of compilation which has been successful in the design and implementation of complex loop nest optimizers and parallelizing compilers. The algorithmic complexity and scalability limitations of the above framework remain one important weakness. We address it by introducing sub-polyhedral compilation by using (Unit-)Two-Variable-Per-Inequality or (U)TVPI Polyhedra, namely polyhedrawith restricted constraints of the type ax_{i}+bx_{j}\le c (\pm x_{i}\pm x_{j}\le c). A major focus of our sub-polyhedral compilation is the introduction of sub-polyhedral scheduling, where we propose a technique for scheduling using (U)TVPI polyhedra. As part of this, we introduce algorithms that can be used to construct under-aproximations of the systems of constraints resulting from affine scheduling problems. This technique relies on simple polynomial time algorithms to under approximate a general polyhedron into (U)TVPI polyhedra. The above under-approximation algorithms are generic enough that they can be used for many kinds of loop parallelization scheduling problems, reducing each of their complexities to asymptotically polynomial time. We also introduce sub-polyhedral code-generation where we propose algorithms to use the improved complexities of (U)TVPI sub-polyhedra in polyhedral code generation. In this problem, we show that the exponentialities associated with the widely used polyhedral code generators could be reduced to polynomial time using the improved complexities of (U)TVPI sub-polyhedra. The above presented sub-polyhedral scheduling techniques are evaluated in an experimental framework. For this, we modify the state-of-the-art PLuTo compiler which can parallelize for multi-core architectures using permutation and tiling transformations. We show that using our scheduling technique, the above under-approximations yield polyhedra that are non-empty for 10 out of 16 benchmarks from the Polybench (2.0) kernels. Solving the under-approximated system leads to asymptotic gains in complexity, and shows practically significant improvements when compared to a traditional LP solver. We also verify that code generated by our sub-polyhedral parallelization prototype matches the performance of PLuTo-optimized code when the under-approximation preserves feasibility. [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre Affine scheduling Approximation algorithms Compiler optimizations Compilers Loop transformations Optimization Parallelism Asymptotic complexity Code generation
5	Efficient search-based strategies for polyhedral compilation : algorithms and experience in a production compiler / Stratégies exploratoires efficaces pour la compilation polyédrique : algorithmes et expérience dans un compilateur de production Trifunovic, Konrad 04 July 2011 (has links) Une pression accrue s'exerce sur les compilateurs pour mettre en œuvre des transformations de programmes de plus en plus complexes délivrant le potentiel de performance des processeurs multicœurs et des accélérateurs hétérogènes. L'espace de recherche des optimisations de programmes possibles est gigantesque est manque de structure. La recherche de la meilleure transformation, qui inclut la prédiction des gains estimés de performance offerts par cette transformation, constitue le problème le plus difficiles pour les compilateurs optimisants modernes. Nous avons choisi de nous concentrer sur les transformations de boucles et sur leur automatisation, exprimées dans le modèle polyédrique. Les méthodes d'optimisation de programmes dans le modèle polyédrique se répartissent grossièrement en deux classes. La première repose sur l'optimisation linéaire d'une fonction de analytique de coût. La deuxième classe de méthodes met en œuvre une recherche itérative. La première approche est rapide, mais elle est facilement mise en défaut en ce qui concerne la découverte de la solution optimale. L'approche itérative est plus précise, mais le temps de compilation peut devenir prohibitif. Cette thèse contribue une approche nouvelle de la recherche itérative de transformations de programmes dans le modèle polyédrique. La nouvelle méthode proposée possède la précision et la capacité effective à extraire des transformations profitables des méthodes itératives, tout en en minimisant les faiblesses. Notre approche repose sur l'évaluation systématique d'une fonction de coût et de prédiction de performances non-linéaire. Par ailleurs, la parallélisation automatique dans le modèle polyédrique est actuellement dominée par des outils de compilation source-à-source. Nous avons choisi au contraire d'implémenter nos techniques dans la plateforme GCC, en opérant sur une représentation de code de bas niveau, à trois adresses. Nous montrons que le niveau d'abstraction de la représentation intermédiaire choisie engendre des difficultés de passage à l'échelle, et nous montrons comment les surmonter. À l'inverse, nous montrons qu'une représentation intermédiaire de bas niveau ouvre de nouveaux degrés de liberté, bénéficiant à notre stratégie itérative de recherche de transformations, et à la compilation polyédrique de manière générale. / In order to take the performance advantages of the current multicore and heterogeneous architectures the compilers are required to perform more and more complex program transformations. The search space of the possible program optimizations is huge and unstructured. Selecting the best transformation and predicting the potential performance benefits of that transformation is the major problem in today's optimizing compilers. The promising approach to handling the program optimizations is to focus on the automatic loop optimizations expressed in the polyhedral model. The current approaches for optimizing programs in the polyhedral model broadly fall into two classes. The first class of the methods is based on the linear optimization of the analytical cost function. The second class is based on the exhaustive iterative search. While the first approach is fast, it can easily miss the optimal solution. The iterative approach is more precise, but its running time might be prohibitively expensive. In this thesis we present a novel search-based approach to program transformations in the polyhedral model. The new method combines the benefits - effectiveness and precision - of the current approaches, while it tries to minimize their drawbacks. Our approach is based on enumerating the evaluations of the precise, nonlinear performance predicting cost-function. The current practice is to use the polyhedral model in the context of source-to-source compilers. We have implemented our techniques in a GCC framework that is based on the low level three address code representation. We show that the chosen level of abstraction for the intermediate representation poses scalability challenges, and we show the ways to overcome those problems. On the other hand, it is shown that the low level IR abstraction opens new degrees of freedom that are beneficial for the search-based transformation strategies and for the polyhedral compilation in general. Compilateurs Langages de programmation Modèle polyédrique Transformations de programmes Transformations de boucles La parallélisation automatique Représentation intermédiaire Compilers Programming languages Polyhedral model Program transformations Loop transformations Automatic parallelization Intermediate representation
6	Sub-Polyhedral Compilation using (Unit-)Two-Variables-Per-Inequality Polyhedra / Compilation sous-polyédrique reposant sur des systèmes à deux variables par inégalité Upadrasta, Ramakrishna 13 March 2013 (has links) Notre étude de la compilation sous-polyédrique est dominée par l’introduction de la notion l’ordonnancement affine sous-polyédrique, pour laquelle nous proposons une technique utilisant des sous-polyèdres (U)TVPI. Dans ce cadre, nous introduisons des algorithmes capables de construire des sous-approximations de systèmes de contraintes résultant de problèmes d’ordonnancement affine. Cette technique repose sur des algorithmes polynomiaux simples pour approcher un polyèdre quelconque par un polyèdre (U)TVPI. Nos algorithmes sont suffisamment génériques pour s’appliquer à de nombreux problèmes d’ordonnancement, de parallélisation, et d’optimisation de boucles, réduisant leur complexité temporelle à des fonctions polynomiales. Nous introduisons également une méthode pour la génération de code utilisant des algorithmes sous-polyédriques, tirant parti de la faible complexité des sous-polyèdres (U)TVPI. Dans ce cadre, nous montrons comment réduire la complexité associée aux générateurs de code les plus populaires, ramenant la complexité de plusieurs facteurs exponentiels à des fonctions polynomiales. Nombre de ces techniques sont évaluées expérimentalement. Pour cela, nous avons réalisé une version modifiée du compilateur PLuTo, capable de paralléliser et d’optimiser des nids de boucles pour des architectures multi-cœurs à l’aide de transformations affines, et notamment de partitionnement (tiling). Nous montrons qu’une majorité des noyaux de calcul de la suite Polybench (2.0) peut être manipulée à l’aide de notre technique d’ordonnancement, en préservant la faisabilité des polyèdres lors des sous-approximations. L’utilisation des systèmes approchés par des sous-polyèdres conduit à des gains asymptotiques en complexité, qui se traduit par des réductions significatives en temps de compilation, par rapport à un solveur de programmation linéaire de référence. Nous vérifions également que le code généré par notre prototype de parallélisation sous-polyédrique est compétitif par rapport à la performance du code généré par Pluto. / The goal of this thesis is to design algorithms that run with better complexity when compiling or parallelizing loop programs. The framework within which our algorithms operate is the polyhedral model of compilation which has been successful in the design and implementation of complex loop nest optimizers and parallelizing compilers. The algorithmic complexity and scalability limitations of the above framework remain one important weakness. We address it by introducing sub-polyhedral compilation by using (Unit-)Two-Variable-Per-Inequality or (U)TVPI Polyhedra, namely polyhedrawith restricted constraints of the type ax_{i}+bx_{j}\le c (\pm x_{i}\pm x_{j}\le c). A major focus of our sub-polyhedral compilation is the introduction of sub-polyhedral scheduling, where we propose a technique for scheduling using (U)TVPI polyhedra. As part of this, we introduce algorithms that can be used to construct under-aproximations of the systems of constraints resulting from affine scheduling problems. This technique relies on simple polynomial time algorithms to under approximate a general polyhedron into (U)TVPI polyhedra. The above under-approximation algorithms are generic enough that they can be used for many kinds of loop parallelization scheduling problems, reducing each of their complexities to asymptotically polynomial time. We also introduce sub-polyhedral code-generation where we propose algorithms to use the improved complexities of (U)TVPI sub-polyhedra in polyhedral code generation. In this problem, we show that the exponentialities associated with the widely used polyhedral code generators could be reduced to polynomial time using the improved complexities of (U)TVPI sub-polyhedra. The above presented sub-polyhedral scheduling techniques are evaluated in an experimental framework. For this, we modify the state-of-the-art PLuTo compiler which can parallelize for multi-core architectures using permutation and tiling transformations. We show that using our scheduling technique, the above under-approximations yield polyhedra that are non-empty for 10 out of 16 benchmarks from the Polybench (2.0) kernels. Solving the under-approximated system leads to asymptotic gains in complexity, and shows practically significant improvements when compared to a traditional LP solver. We also verify that code generated by our sub-polyhedral parallelization prototype matches the performance of PLuTo-optimized code when the under-approximation preserves feasibility. Ordonnancement affine Algorithmes d'approximation Optimisations du compilateur Compilateurs Transformations de boucles Optimisation Parallélisme Complexité asymptotique Génération de code Affine scheduling Approximation algorithms Compiler optimizations Compilers Loop transformations Optimization Parallelism Asymptotic complexity Code generation
7	An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations Pananilath, Irshad Muhammed January 2014 (has links) (PDF) Lattice-Boltzmann method(LBM), a promising new particle-based simulation technique for complex and multiscale fluid flows, has seen tremendous adoption in recent years in computational fluid dynamics. Even with a state-of-the-art LBM solver such as Palabos, a user still has to manually write his program using the library-supplied primitives. We propose an automated code generator for a class of LBM computations with the objective to achieve high performance on modern architectures. Tiling is a very important loop transformation used to improve the performance of stencil computations by exploiting locality and parallelism. In the first part of the work, we explore diamond tiling, a new tiling technique to exploit the inherent ability of most stencils to allow tile-wise concurrent start. This enables perfect load-balance during execution and reduces the frequency of synchronization required. Few studies have looked at time tiling for LBM codes. We exploit a key similarity between stencils and LBM to enable polyhedral optimizations and in turn time tiling for LBM. Besides polyhedral transformations, we also describe a number of other complementary transformations and post processing necessary to obtain good parallel and SIMD performance on modern architectures. We also characterize the performance of LBM with the Roofline performance model. Experimental results for standard LBM simulations like Lid Driven Cavity, Flow Past Cylinder, and Poiseuille Flow show that our scheme consistently outperforms Palabos–on average by3 x while running on 16 cores of a n Intel Xeon Sandy bridge system. We also obtain a very significant improvement of 2.47 x over the native production compiler on the SPECLBM benchmark. Lattice-Boltzmann Computations Computational Fluid Dynamics Tiling Stencil Computations Single Instruction Multiple Data (SIMD) Parallel Computers Parallel Processing Loop Transformations Lattice-Boltzman Method (LBM) Lattice Boltzman Method Lattice-Boltzmann Equation Computer Science

1

Page generated in 0.0966 seconds