Global ETD Search

1	Dynamic Register Allocation for Network Processors Collins, Ryan 22 May 2006 (has links) Network processors are custom high performance embedded processors deployed for a variety of tasks that must operate at high line (Gbits/sec) speeds to prevent packet loss. With the increase in complexity of application domains and larger code store on modern network processors, the network processor programming goes beyond simply exploiting parallelism in packet processing. Unlike the traditional homogeneous threading model, modern network processor programming must support heterogenous threads that execute simultaneously on a microengine. In order to support such demands, we first propose hardware management of registers across multiple threads. In their PLDI 2004 paper, Zhuang and Pande for the first time proposed a compiler based scheme to support register allocation across threads; in this work, we extend their static allocation method to support aggressive register allocation taking dynamic context into account. We also remove the load/stores due to aliased memory accesses converting them into register moves exploiting dead registers. This results in tremendous savings in latency and higher throughput mainly due to the removal of high latency accesses as well as idle cycles. The dynamic register allocator is designed to be light-weight and low latency by undertaking many tradeoffs. In the second part of this work, our goal is to design an automatic register allocation scheme that makes compiler transperant to dual bank register file design for network processors. By design network processors mandate that the operands of an instruction must be allocated to registers belonging to two different banks. The key goal in this work is to take into account dynamic contexts to balance the register pressure across the banks. Key decisions made involve, how and where to map incoming virtual register on a physical register in the bank, how to evict dead ones, and how to minimally undertake bank to bank copies and swaps. It is shown that it is viable to solve both of these problems by simple hardware designs that avail of dynamic contexts. The performance gains are substantial and due to simplicity of the designs (which are also off critical paths) such schemes may be attractive in practice. Compilers Register allocation Network processors
2	Register allocation and spilling using the expected distance heuristic Burroughs, Ivan Neil 12 April 2016 (has links) The primary goal of the register allocation phase in a compiler is to minimize register spills to memory. Spills, in the form of store and load instructions, affect execution time as the processor must wait for the slower memory system to respond. Deciding which registers to spill can benefit from execution frequency information yet when this information is available it is not fully utilized by modern register allocators. We present a register allocator that fully exploits profiling information to mini- mize the runtime costs of spill instructions. We use the Furthest Next Use heuristic, informed by branch probability information to decide which virtual register to spill when required. We extend this heuristic, which under the right conditions can lead to the minimum number of spills, to the control flow graph by computing Expected Distance to next use. The furthest next use heuristic, when applied to the control flow graph, only par- tially determines the best placement of spill instructions. We present an algorithm for optimizing spill instruction placement in the graph that uses block frequency infor- mation to minimize execution costs. Our algorithm quickly finds the best placements for spill instructions using a novel method for solving placement problems. We evaluate our allocator using both static and dynamic profiling information for the SPEC CINT2000 benchmark and compare it to the LLVM allocator. Targeting the ARMv7 architecture, we find average reductions in numbers of store and load instructions of 36% and 50%, respectively, using static profiling and 52% and 52% using dynamic profiling. We have also seen an overall improvement in benchmark speed. / Graduate Compilers Register Allocation Spilling Expected Distance
3	A Parallelizing Compiler Based on Partial Evaluation Surati, Rajeev 01 July 1993 (has links) We constructed a parallelizing compiler that utilizes partial evaluation to achieve efficient parallel object code from very high-level data independent source programs. On several important scientific applications, the compiler attains parallel performance equivalent to or better than the best observed results from the manual restructuring of code. This is the first attempt to capitalize on partial evaluation's ability to expose low-level parallelism. New static scheduling techniques are used to utilize the fine-grained parallelism of the computations. The compiler maps the computation graph resulting from partial evaluation onto the Supercomputer Toolkit, an eight VLIW processor parallel computer. VLIW partial evaluation register allocation parallelsscheduling parallelizing compilers
4	Constraint Programming Techniques for Optimal Instruction Scheduling Malik, Abid 03 1900 (has links) Modern processors have multiple pipelined functional units and can issue more than one instruction per clock cycle. This puts great pressure on the instruction scheduling phase in a compiler to expose maximum instruction level parallelism. Basic blocks and superblocks are commonly used regions of code in a program for instruction scheduling. Instruction scheduling coupled with register allocation is also a well studied problem to produce better machine code. Scheduling basic blocks and superblocks optimally with or with out register allocation is NP-complete, and is done sub-optimally in production compilers using heuristic approaches. In this thesis, I present a constraint programming approach to the superblock and basic block instruction scheduling problems for both idealized and realistic architectures. Basic block scheduling with register allocation with no spilling allowed is also considered. My models for both basic block and superblock scheduling are optimal and fast enough to be incorporated into production compilers. I experimentally evaluated my optimal schedulers on the SPEC 2000 integer and floating point benchmarks. On this benchmark suite, the optimal schedulers were very robust and scaled to the largest basic blocks and superblocks. Depending on the architectural model, between 99.991\% to 99.999\% of all basic blocks and superblocks were solved to optimality. The schedulers were able to routinely solve the largest blocks, including blocks with up to 2600 instructions. My results compare favorably to the best previous optimal approaches, which are based on integer programming and enumeration. My approach for basic block scheduling without allowing spilling was good enough to solve 97.496\% of all basic blocks in the SPEC 2000 benchmark. The approach was able to solve basic blocks as large as 50 instructions for both idealized and realistic architectures within reasonable time limits. Again, my results compare favorably to recent work on optimal integrated code generation, which is based on integer programming. Constraint Programming Compiler Optimization Instruction Scheduling Register Allocation Computer Science
5	Constraint Programming Techniques for Optimal Instruction Scheduling Malik, Abid 03 1900 (has links) Modern processors have multiple pipelined functional units and can issue more than one instruction per clock cycle. This puts great pressure on the instruction scheduling phase in a compiler to expose maximum instruction level parallelism. Basic blocks and superblocks are commonly used regions of code in a program for instruction scheduling. Instruction scheduling coupled with register allocation is also a well studied problem to produce better machine code. Scheduling basic blocks and superblocks optimally with or with out register allocation is NP-complete, and is done sub-optimally in production compilers using heuristic approaches. In this thesis, I present a constraint programming approach to the superblock and basic block instruction scheduling problems for both idealized and realistic architectures. Basic block scheduling with register allocation with no spilling allowed is also considered. My models for both basic block and superblock scheduling are optimal and fast enough to be incorporated into production compilers. I experimentally evaluated my optimal schedulers on the SPEC 2000 integer and floating point benchmarks. On this benchmark suite, the optimal schedulers were very robust and scaled to the largest basic blocks and superblocks. Depending on the architectural model, between 99.991\% to 99.999\% of all basic blocks and superblocks were solved to optimality. The schedulers were able to routinely solve the largest blocks, including blocks with up to 2600 instructions. My results compare favorably to the best previous optimal approaches, which are based on integer programming and enumeration. My approach for basic block scheduling without allowing spilling was good enough to solve 97.496\% of all basic blocks in the SPEC 2000 benchmark. The approach was able to solve basic blocks as large as 50 instructions for both idealized and realistic architectures within reasonable time limits. Again, my results compare favorably to recent work on optimal integrated code generation, which is based on integer programming. Constraint Programming Compiler Optimization Instruction Scheduling Register Allocation Computer Science
6	A Just in Time Register Allocation and Code Optimization Framework for Embedded Systems Thammanur, Sathyanarayan 11 October 2001 (has links) No description available. compiler optimizations dynamic compilation code generation register allocation
7	Characterization and optimization of JavaScript programs for mobile systems Srikanth, Aditya 09 October 2013 (has links) JavaScript has permeated into every aspect of the web experience in today's world, making it highly crucial to process it as quickly as possible. With the proliferation of HTML5 and its associated mobile web applications, the world is slowly but surely moving into an age where majority of the webpages will involve complex computations and manipulations within the JavaScript engine. Recent techniques like Just-in-Time (JIT) compilation have become commonplace in popular browsers like Chrome and Firefox, and there is an ongoing effort to further optimize them in the context of mobile systems. In order to fully take advantage of JavaScript-heavy webpages, it is important to first characterize the interaction of these webpages (both existing pages and modern HTML5 pages) with the different components of the JavaScript engine, viz. the interpreter, the method JIT, the optimizing compiler and the garbage collector. In this thesis, the aforementioned characterization work was leveraged to identify the limits of JavaScript optimizations. Subsequently, a particular optimization, i.e. Register Allocation heuristics was explored in detail on different types of JavaScript programs. This was primarily because the majority of the time (an average of 52.81%) spent in the optimizing compiler is for the register allocation stage alone. By varying the heuristics for register assignment, interval priority and spill selection, a clear idea is obtained about how it impacts certain types of programs more than others. This thesis also gives a preliminary insight into JavaScript applications and benchmarks, showing that these applications tend to be register-intensive, with large live intervals and sparse uses, and sensitive to array and string manipulations. A statically-selected optimal register allocation scheme outperforms the default register allocation scheme resulting in 9.1% performance improvement and 11.23% reduction in execution time on a representative mobile system. / text JavaScript Mobile systems Workload characterization Compiler optimizations Register allocation Firefox SpiderMonkey
8	GENETIC ALGORITHM CONTROLLED COMMON SUBEXPRESSION ELIMINATION FOR SPILL-FREE REGISTER ALLOCATION Arcot, Shashi Deepa 01 January 2010 (has links) As code complexity increases, maxlive increases. This is especially true in the case of the Kentucky If-Then-Else architecture proposed for Nanocontrollers. To achieve low circuit complexity, computations are decomposed to bit-level operations, thus generating large blocks of code with complex dependence structures. Additionally, the Nanocontroller architecture allows for only a small number of single bit registers and no extra memory. The assumption of an infinite number of registers made during code generation becomes a huge problem during register allocation because the small number of registers and no additional memory. The large basic blocks mean that maxlive almost always exceeds the number of registers and the traditional methods of register allocation such as instruction re-ordering and register spill/reload cannot be applied trivially. This thesis deals with finding a solution to reduce maxlive for successful register allocation using Genetic Algorithms. Electrical and Computer Engineering
9	Compiler optimisation of typeless languages Fourniotis Pavlatos, Panayis January 1998 (has links) We have written an optimising compiler for a typeless, imperative, modular programming language. The optimiser, which works on a 3-address intermediate representation generated from the source program, uses some novel techniques described in this thesis. The techniques are universally applicable, although some are particularly useful in typeless compilation. We present a new register allocation and assignment scheme. Unlike traditional "colouring" allocators, our method separates the problem into distinct allocation and assignment phases. The former is achieved by using an iterative process to extend a local (within basic blocks) allocation method to the global (across basic blocks) domain. This obviates the need for a sophisticated assignment algorithm; we show how to use simple heuristics to assign registers after allocation. We also present a simple method for identifying loops in a program's intermediate representation and assigning loop nesting levels. Unlike traditional methods, this does not rely on the concept of flowgraph dominators, and is able to deal sensibly with irreducible flowgraphs and "unstructured" loops that interlock or partially overlap. The major part of the thesis concerns value range analysis. Based on the theoretical framework of abstract interpretation, we describe an analysis of the intermediate code that predicts safe approximations to the run-time value ranges of variables and memory used by the program being compiled. To be useful in compiling a typeless language, this analysis must be able to handle values of different kinds (integers, pointers, function addresses, etc.) We show how we can subsume some traditional optimisation techniques, such as constant propagation, into more powerful methods that take advantage of value range information to optimise a wider variety of cases. We also show how this information can be used to recover most of the benefits of types, without sacrificing the flexibility of typelessness. Besides the above, value range analysis allows a number of optimisations that were heretofore impossible. Many of these are improvements to register allocation; we investigate better treatments for variables that can be accessed by address. We also describe a method of removing memory accesses by allowing variables that are simultaneously live to share registers, and suggest a similar scheme for values stored in memory. Finally, we show how the results of value range analysis can be shared across different program modules and different compiler runs. The method used is powerful enough to be useful, but simple enough to integrate with old code that cannot be recompiled. Inter-modular optimisation can be transparent to the user, improving the results of value range analysis within a module without altering its functionality; or it can be visible, optimising modules with respect to each other. 005.1
10	Verification formelle et optimisation de l’allocation de registres / Formal Verification and Optimization of Register Allocation Robillard, Benoît 30 November 2010 (has links) La prise de conscience générale de l'importance de vérifier plus scrupuleusement les programmes a engendré une croissance considérable des efforts de vérification formelle de programme durant cette dernière décennie. Néanmoins, le code qu'exécute l'ordinateur, ou code exécutable, n'est pas le code écrit par le développeur, ou code source. La vérification formelle de compilateurs est donc un complément indispensable à la vérification de code source.L'une des tâches les plus complexes de compilation est l'allocation de registres. C'est lors de celle-ci que le compilateur décide de la façon dont les variables du programme sont stockées en mémoire durant son exécution. La mémoire comporte deux types de conteneurs : les registres, zones d'accès rapide, présents en nombre limité, et la pile, de capacité supposée suffisamment importante pour héberger toutes les variables d'un programme, mais à laquelle l'accès est bien plus lent. Le but de l'allocation de registres est de tirer au mieux parti de la rapidité des registres, car une allocation de registres de bonne qualité peut conduire à une amélioration significative du temps d'exécution du programme.Le modèle le plus connu de l'allocation de registres repose sur la coloration de graphe d'interférence-affinité. Dans cette thèse, l'objectif est double : d'une part vérifier formellement des algorithmes connus d'allocation de registres par coloration de graphe, et d'autre part définir de nouveaux algorithmes optimisants pour cette étape de compilation. Nous montrons tout d'abord que l'assistant à la preuve Coq est adéquat à la formalisation d'algorithmes d'allocation de registres par coloration de graphes. Nous procédons ainsi à la vérification formelle en Coq d'un des algorithmes les plus classiques d'allocation de registres par coloration de graphes, l'Iterated Register Coalescing (IRC), et d'une généralisation de celui-ci permettant à un utilisateur peu familier du système Coq d'implanter facilement sa propre variante de cet algorithme au seul prix d'une éventuelle perte d'efficacité algorithmique. Ces formalisations nécessitent des réflexions autour de la formalisation des graphes d'interférence-affinité, de la traduction sous forme purement fonctionnelle d'algorithmes impératifs et de l'efficacité algorithmique, la terminaison et la correction de cette version fonctionnelle. Notre implantation formellement vérifiée de l'IRC a été intégrée à un prototype du compilateur CompCert.Nous avons ensuite étudié deux représentations intermédiaires de programmes, dont la forme SSA, et exploité leurs propriétés pour proposer de nouvelles approches de résolution optimale de la fusion, l'une des optimisations opéréeslors de l'allocation de registres dont l'impact est le plus fort sur la qualité du code compilé. Ces approches montrent que des critères de fusion tenant compte de paramètres globaux du graphe d'interférence-affinité, tels que sa largeur d'arbre, ouvrent la voie vers de nouvelles méthodes de résolution potentiellement plus performantes. / The need for trustful programs led to an increasing use of formal verication techniques the last decade, and especially of program proof. However, the code running on the computer is not the source code, i.e. the one written by the developper, since it has to betranslated by the compiler. As a result, the formal verication of compilers is required to complete the source code verication. One of the hardest phases of compilation is register allocation. Register allocation is the phase within which the compiler decides where the variables of the program are stored in the memory during its execution. The are two kinds of memory locations : a limited number of fast-access zones, called registers, and a very large but slow-access stack. The aim of register allocation is then to make a great use of registers, leading to a faster runnable code.The most used model for register allocation is the interference graph coloring one. In this thesis, our objective is twofold : first, formally verifying some well-known interference graph coloring algorithms for register allocation and, second, designing new graph-coloring register allocation algorithms. More precisely, we provide a fully formally veri ed implementation of the Iterated Register Coalescing, a very classical graph-coloring register allocation heuristics, that has been integrated into the CompCert compiler. We also studied two intermediate representations of programs used in compilers, and in particular the SSA form to design new algorithms, using global properties of the graph rather than local criteria currently used in the litterature. Allocation de registres Vérification formelle Coloration de graphe Register Allocation Graph coloring algorithms Formal verication of compilers 004

Search results