About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want it added, details can be found on the NDLTD website.

Low-Level Haskell Code: Measurements and Optimization Techniques

Peixotto, David 06 September 2012
Haskell is a lazy functional language with a strong static type system and excellent support for parallel programming. The language features of Haskell make it easier to write correct and maintainable programs, but execution speed often suffers from the high levels of abstraction. While much past research focuses on high-level optimizations that take advantage of the functional properties of Haskell, relatively little attention has been paid to the optimization opportunities in the low-level imperative code generated during translation to machine code. One problem with current low-level optimizations is that their effectiveness is limited by the obscured control flow caused by Haskell's high-level abstractions. My thesis is that trace-based optimization techniques can be used to improve the effectiveness of low-level optimizations for Haskell programs. I claim three unique contributions in this work. The first contribution is to expose some properties of low-level Haskell code by examining the mix of operations performed by the selected benchmark programs and comparing it to low-level code generated from traditional programming languages. The low-level measurements reveal that the control flow is obscured by indirect jumps caused by the implementation of lazy evaluation, higher-order functions, and the separately managed stacks used by Haskell programs. My second contribution is a study of the effectiveness of a dynamic binary trace-based optimizer running on Haskell programs. My results show that while viable program traces frequently occur in Haskell programs, the overhead associated with maintaining the traces in a dynamic optimization system outweighs the benefits of running them. To reduce the runtime overhead, I explore a way to find traces in a separate profiling step. My final contribution is to build and evaluate a static trace-based optimizer for Haskell programs.
The static optimizer uses profiling data to find traces in a Haskell program and then restructures the code around the traces to increase the scope available to the low-level optimizer. My results show that we can successfully build traces in Haskell programs, and the optimized code yields speedups of up to 86% over existing low-level optimizers, with an average speedup of 5% across 32 benchmarks.
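The static trace-based optimizer described above forms traces from profiling data. As an illustration only, the following Python sketch shows one simple greedy heuristic in the spirit of classic trace selection. It seeds a trace at the hottest unvisited block and then repeatedly follows that block's most frequently executed successor. The profile format (dictionaries of block and edge counts) and the function name `build_traces` are hypothetical. This is not necessarily the algorithm used in the thesis.

```python
from collections import defaultdict


def build_traces(edge_counts, block_counts):
    """Greedily grow traces from profile counts (hypothetical profile format).

    edge_counts: dict mapping (src, dst) -> number of times that edge executed.
    block_counts: dict mapping block -> number of times that block executed.
    Returns a list of traces; each trace is a list of block names, and every
    block appears in exactly one trace.
    """
    # Index outgoing edges by source block.
    succs = defaultdict(list)
    for (src, dst), count in edge_counts.items():
        succs[src].append((count, dst))

    visited = set()
    traces = []
    # Seed traces from the hottest blocks first (stable order on ties).
    for seed in sorted(block_counts, key=lambda b: -block_counts[b]):
        if seed in visited:
            continue
        trace = [seed]
        visited.add(seed)
        current = seed
        while True:
            # Only extend into blocks not already placed in some trace.
            candidates = [(c, d) for c, d in succs[current] if d not in visited]
            if not candidates:
                break
            _, nxt = max(candidates)  # most frequently taken successor
            trace.append(nxt)
            visited.add(nxt)
            current = nxt
        traces.append(trace)
    return traces
```

For edge counts A->B: 90, A->C: 10, B->D: 90, C->D: 10 and block counts A: 100, B: 90, C: 10, D: 100, the sketch returns `[["A", "B", "D"], ["C"]]`. The hot path becomes one straight-line trace that a low-level optimizer could treat as a single region, and the cold block is left on its own.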

An Adaptive Recompilation Framework For Rotor And Architectural Support For Online Program Instrumentation

Vaswani, Kapil 08 1900
Microsoft Research
Although runtime systems and the dynamic compilation model have revolutionized the process of application development and deployment, the associated performance overheads continue to be a cause for concern and the subject of much research. In the first part of this thesis, we describe the design and implementation of an adaptive recompilation framework for Rotor, a shared-source implementation of the Common Language Infrastructure (CLI). The framework can increase program performance through intelligent recompilation decisions and optimizations based on the program's past behavior. Our extensions to Rotor include a low-overhead, runtime-stack-based sampling profiler that identifies program hotspots. A recompilation controller oversees the recompilation process and generates recompilation requests. At the first level of a multi-level optimizing compiler, code in the intermediate language is converted to an internal intermediate representation and optimized using a set of simple transformations. The compiler uses a fast yet effective linear scan algorithm for register allocation. Hot methods can be instrumented to collect basic-block, edge, and call-graph profile information. Profile-guided optimizations driven by online profile information are then used to further optimize heavily executed methods at the second level of recompilation. An evaluation of the framework on a set of test programs shows that performance improves by up to 42.3%, and by 9% on average. Our results also show that, in our implementation, the overhead of collecting accurate profile information through instrumentation to some extent outweighs the benefits of profile-guided optimizations, suggesting the need for techniques that reduce this overhead. Because the framework's design is flexible and extensible, additional profiling and optimization techniques can easily be incorporated to further improve performance.
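The abstract mentions a linear scan register allocator. The Python sketch below illustrates the classic linear scan formulation of Poletto and Sarkar, assuming that live intervals have already been computed as inclusive (start, end) program points. The names `linear_scan`, `intervals`, and `num_regs` are hypothetical, and the code is not taken from Rotor or the thesis.

```python
def linear_scan(intervals, num_regs):
    """Assign registers to live intervals in one pass over their start points.

    intervals: dict mapping variable -> (start, end), inclusive program points.
    num_regs: number of available physical registers.
    Returns (assignment, spilled): assignment maps variable -> register index,
    and spilled is the set of variables that must live in memory.
    """
    assignment = {}
    spilled = set()
    free = list(range(num_regs))
    active = []  # (end, var) pairs for intervals currently holding a register

    for var, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts; free their registers.
        still_active = []
        for a_end, a_var in active:
            if a_end < start:
                free.append(assignment[a_var])
            else:
                still_active.append((a_end, a_var))
        active = still_active

        if free:
            assignment[var] = free.pop()
            active.append((end, var))
        else:
            # No register is free: spill whichever interval ends last.
            active.sort()
            last_end, last_var = active[-1]
            if last_end > end:
                # Steal the register from the longer-lived interval.
                assignment[var] = assignment.pop(last_var)
                spilled.add(last_var)
                active[-1] = (end, var)
            else:
                spilled.add(var)
    return assignment, spilled
```

Take two registers and the intervals a=(0, 5), b=(1, 3), c=(2, 8), d=(4, 6). The sketch spills `c`, because it is the interval that ends last when no register is free. It lets `b` and `d` share a register because their lifetimes do not overlap. Allocation happens in a single sorted sweep, which is why linear scan is attractive for a fast first-level compiler.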
As previously stated, fine-grained and accurate profile information must be available at low cost for advanced profile-guided optimizations to be effective in online environments. In the second part of this thesis, we propose a generic framework that enables instrumentation-based profilers to collect profile data efficiently, a task that has traditionally been associated with high overheads. The essence of the scheme is to make the underlying hardware aware of instrumentation through a special set of profile instructions and a tuned microarchitecture. This not only allows the hardware to provide the runtime with mechanisms to control the profiling activity, but also lets the hardware itself optimize the process of profiling in a manner transparent to the runtime. We propose selective instruction dispatch as one possible control mechanism that the runtime can use to manage the execution of profile instructions and keep profiling overheads in check. We also propose profile flag prediction, a hardware optimization that complements the selective dispatch mechanism by not fetching profile instructions when the runtime has turned profiling off. The framework is lightweight and flexible, and it eliminates the need for expensive bookkeeping, recompilation, or code duplication. Our simulations with benchmarks from the SPEC CPU2000 suite show that the overheads of call-graph and basic-block profiling can be reduced by 72.7% and 52.4%, respectively, with a negligible loss in accuracy.
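The proposed mechanisms are hardware features, but the trade-off they manage can be illustrated with a toy software model. In this model, profile instructions update counters only while a runtime-controlled flag is on, and the sampled counts are scaled up to estimate the full profile. Everything here is an assumption made for illustration, including:

- one profile instruction per basic block,
- a periodic on/off duty cycle for the flag,
- the name `run_with_gated_profiling`.

The model does not capture selective dispatch or profile flag prediction at the microarchitectural level.

```python
from collections import Counter


def run_with_gated_profiling(trace, on_cycles, off_cycles):
    """Toy model of profile instructions gated by a runtime-controlled flag.

    trace: sequence of basic-block IDs executed by the program; each block is
        assumed to carry one profile instruction that increments its counter.
    on_cycles, off_cycles: the flag stays on for on_cycles blocks, then off for
        off_cycles blocks, repeating (a hypothetical periodic duty cycle).
    Returns (estimated_counts, profile_ops_executed).
    """
    if on_cycles <= 0:
        raise ValueError("on_cycles must be positive")
    period = on_cycles + off_cycles
    sampled = Counter()
    executed = 0
    for i, block in enumerate(trace):
        if (i % period) < on_cycles:
            # Flag on: the profile instruction executes and updates its counter.
            sampled[block] += 1
            executed += 1
        # Flag off: the profile instruction is skipped and costs nothing here.
    # Scale sampled counts by the inverse of the fraction of time profiling was on.
    scale = period / on_cycles
    estimated = {block: count * scale for block, count in sampled.items()}
    return estimated, executed
```

Consider the trace `["A", "B"] * 50`. A duty cycle of two blocks on and two off executes 50 profile operations instead of 100 and still estimates 50 executions for each block. A one-on, one-off cycle on the same trace would attribute every sample to `"A"`, which shows that naive periodic gating can alias with loop structure.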
