Global ETD Search

1	Analyse et transformation de programmes: du modèle polyédrique aux langages formels Cohen, Albert 21 December 1999 (has links) (PDF) Les microprocesseurs et les architectures parallèles d'aujourd'hui lancent de nouveaux défis aux techniques de compilation. En présence de parallélisme, les optimisations deviennent trop spécifiques et complexes pour être laissées au soin du programmeur. Les techniques de parallélisation automatique dépassent le cadre traditionnel des applications numériques et abordent de nouveaux modèles de programmes, tels que les nids de boucles non affines, les appels récursifs et les structures de données dynamiques. Des analyses précises sont au c{\oe}ur de la détection du parallélisme, elles rassemblent des informations à la compilation sur les propriétés des programmes à l'exécution. Ces informations valident des transformations utiles pour l'extraction du parallélisme et la génération de code parallèle. Cette thèse aborde principalement des analyses et des transformations avec une vision par instances, c'est-à-dire considérant les propriétés individuelles de chaque instance d'une instruction à l'exécution. Une nouvelle formalisation à l'aide de langages formels nous permet tout d'abord d'étudier une analyse de dépendances et de définitions visibles par instances pour programmes récursifs. L'application de cette analyse à l'expansion et la parallélisation de programmes récursifs dévoile des résultats encourageants. Les nids de boucles quelconques font l'objet de la deuxième partie de ce travail. Une nouvelle étude des techniques de parallélisation fondées sur l'expansion nous permet de proposer des solutions à des problèmes d'optimisation cruciaux. automatic parallelization recursive programs non-affine loop nests dependence analysis reaching definition analysis memory expansion
2	Automatic Storage Optimization of Arrays Affine Loop Nests Bhaskaracharya, Somashekaracharya G January 2016 (has links) (PDF) Efficient memory usage is crucial for data-intensive applications as a smaller memory footprint ensures better cache performance and allows one to run a larger problem size given a axed amount of main memory. The solutions found by existing techniques for automatic storage optimization for arrays in a new loop-nests, which minimize the storage requirements for the arrays, are often far from good or optimal and could even miss nearly all storage optimization potential. In this work, we present a new automatic storage optimization framework and techniques that can be used to achieve intra-array as well as inter-array storage reuse within a new loop-nests with a pre-determined schedule. Over the last two decades, several heuristics have been developed for achieving complex transformations of a new loop-nests using the polyhedral model. However, there are no comparably strong heuristics for tackling the problem of automatic memory footprint optimization. We tackle the problem of storage optimization for arrays by formulating it as one of ending the right storage partitioning hyperplanes: each storage partition corresponds to a single storage location. Statement-wise storage partitioning hyperplanes are determined that partition a unit end global array space so that values with overlapping live ranges are not mapped to the same partition. Our integrated heuristic for exploiting intra-array as well as inter-array reuse opportunities is driven by a fourfold objective function that not only minimizes the dimensionality and storage requirements of arrays required for each high-level statement, but also maximizes inter-statement storage reuse. We built an automatic polyhedral storage optimizer called SMO using our storage partitioning approach. Storage reduction factors and other results we report from SMO demon-strate the e activeness of our approach on several benchmarks drawn from the domains of image processing, stencil computations, high-performance computing, and the class of tiled codes in general. The reductions in storage requirement over previous approaches range from a constant factor to asymptotic in the loop blocking factor or array extents { the latter being a dramatic improvement for practical purposes. As an incidental and related topic, we also studied the problem of polyhedral compilation of graphical data programs. While polyhedral techniques for program transformation are now used in several proprietary and open source compilers, most of the research on poly-herald compilation has focused on imperative languages such as C, where the computation is species in terms of statements with zero or more nested loops and other control structures around them. Graphical data ow languages, where there is no notion of statements or a schedule specifying their relative execution order, have so far not been studied using a powerful transformation or optimization approach. The execution semantics and ref-eventual transparency of data ow languages impose a di errant set of challenges. In this work, we attempt to bridge this gap by presenting techniques that can be used to extract polyhedral representation from data ow programs and to synthesize them from their equivalent polyhedral representation. We then describe Polyglot, a framework for automatic transformation of data ow programs that we built using our techniques and other popular research tools such as Clan and Pluto. For the purpose of experimental evaluation, we used our tools to compile LabVIEW, one of the most widely used data ow programming languages. Results show that data ow programs transformed using our framework are able to outperform those compiled otherwise by up to a factor of seventeen, with a mean speed-up of 2.30 while running on an 8-core Intel system. Automatic Storage Optimization Dataflow Programs Affine Loop Nests Lattice-Boltzmann Method Diamond Tiling Blur Filter - Interleaved Schedule Blur filter - Tiled Execution Affine Hyperplane Polyhedral Model Polyhedral Storage Optimizer Polyhedral Compilation PolyGLoT Polyhedral Loop Transformation Graphical Dataflow Language Intra-Array Sorage Optimization Arrays Storage Optimization Computer Science
3	Automatic Data Allocation, Buffer Management And Data Movement For Multi-GPU Machines Ramashekar, Thejas 10 1900 (has links) (PDF) Multi-GPU machines are being increasingly used in high performance computing. These machines are being used both as standalone work stations to run computations on medium to large data sizes (tens of gigabytes) and as a node in a CPU-Multi GPU cluster handling very large data sizes (hundreds of gigabytes to a few terabytes). Each GPU in such a machine has its own memory and does not share the address space either with the host CPU or other GPUs. Hence, applications utilizing multiple GPUs have to manually allocate and managed at a on each GPU. A significant body of scientific applications that utilize multi-GPU machines contain computations inside affine loop nests, i.e., loop nests that have affine bounds and affine array access functions. These include stencils, linear-algebra kernels, dynamic programming codes and data-mining applications. Data allocation, buffer management, and coherency handling are critical steps that need to be performed to run affine applications on multi-GPU machines. Existing works that propose to automate these steps have limitations and in efficiencies in terms of allocation sizes, exploiting reuse, transfer costs and scalability. An automatic multi-GPU memory manager that can overcome these limitations and enable applications to achieve salable performance is highly desired. One technique that has been used in certain memory management contexts in the literature is that of bounding boxes. The bounding box of an array, for a given tile, is the smallest hyper-rectangle that encapsulates all the array elements accessed by that tile. In this thesis, we exploit the potential of bounding boxes for memory management far beyond their current usage in the literature. In this thesis, we propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines. We call it the Bounding Box based Memory Manager (BBMM). BBMM is a compiler-assisted runtime memory manager. At compile time, it use static analysis techniques to identify a set of bounding boxes accessed by a computation tile. At run time, it uses the bounding box set operations such as union, intersection, difference, finding subset and superset relation to compute a set of disjoint bounding boxes from the set of bounding boxes identified at compile time. It also exploits the architectural capability provided by GPUs to perform fast transfers of rectangular (strided) regions of memory and hence performs all data transfers in terms of bounding boxes. BBMM uses these techniques to automatically allocate, and manage data required by applications (suitably tiled and parallelized for GPUs). This allows It to (1) allocate only as much data (or close to) as is required by computations running on each GPU, (2) efficiently track buffer allocations and hence, maximize data reuse across tiles and minimize the data transfer overhead, (3) and as a result, enable applications to maximize the utilization of the combined memory on multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling schemes, whether static or dynamic. Experiments run on a system with four GPUs with various scientific programs showed that BBMM is able to reduce data allocations on each GPU by up to 75% compared to current allocation schemes, yield at least 88% of the performance of hand-optimized Open CL codes and allows excellent weak scaling. High Performance Computing Computer Memory Management Multi-GPU Memory Manager Automatic Data Allocation Data Transfer Buffer Management Affine Loop Nests Bounding Box Based Memory Manager GPU Architecture Data Movement Code Box Based Memory Manager (BBMM) Computer Science

1

Page generated in 0.0522 seconds