1 |
Combinatorial Problems in Compiler Optimization
Beg, Mirza Omer 08 April 2013
Several important compiler optimizations, such as instruction scheduling and register allocation, are fundamentally hard and are usually solved using heuristics or approximate solutions. In contrast, this thesis examines optimal solutions to three combinatorial problems in compiler optimization.
The first problem addresses instruction scheduling for clustered architectures, which are popular in embedded systems. Given a set of instructions, the optimal solution gives the best possible schedule for a given clustered architectural model. The problem is solved using a decomposition technique applied to constraint programming, which determines the spatial and temporal schedule using an integrated approach. The experiments show that our solver can trade off some compile-time efficiency to solve most instances in standard benchmarks, giving significant performance improvements.
The second problem addresses instruction selection in the compiler's code generation phase. Given the intermediate representation of the code, the optimal solution determines the sequence of equivalent machine instructions while optimizing for code size. This thesis shows that a large number of benchmark instances can be solved optimally using constraint programming techniques.
The third problem addressed is the placement of data in memory for efficient cache utilization. Using the data access patterns of a given program, our algorithm determines a placement that reorganizes data in memory so as to produce fewer cache misses. By focusing on graph-theoretic placement techniques, it is shown that there exist, in special cases, efficient and optimal algorithms for data placement that significantly improve cache utilization. We also propose heuristic solutions for larger instances, for which provably optimal solutions cannot be determined using polynomial-time algorithms. We demonstrate that cache hit rates can be significantly improved by using profiling techniques over a wide range of benchmarks and cache configurations.
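The idea of profile-driven data placement can be illustrated with a small sketch. This is not the thesis's algorithm, which the abstract does not specify; it is a simple greedy heuristic, assumed here for illustration, that chains objects by how often they are accessed back to back and then counts misses in a toy direct-mapped cache. The function names, the trace, and the cache parameters are all hypothetical.

```python
from collections import Counter

def count_misses(trace, placement, line_size, num_lines):
    """Simulate a direct-mapped cache and count misses for an access trace."""
    cache = [None] * num_lines           # cache[s] = memory line currently held in slot s
    misses = 0
    for name in trace:
        mem_line = placement[name] // line_size
        slot = mem_line % num_lines
        if cache[slot] != mem_line:      # miss: load the line, evicting the old one
            cache[slot] = mem_line
            misses += 1
    return misses

def affinity_placement(trace):
    """Greedy heuristic (an assumption, not the thesis's method):
    place next to each object the one most often accessed right after or before it."""
    affinity = Counter()                 # edge weights of the access-affinity graph
    for a, b in zip(trace, trace[1:]):
        if a != b:
            affinity[frozenset((a, b))] += 1
    order = [Counter(trace).most_common(1)[0][0]]   # start from a hottest object
    remaining = set(trace) - {order[0]}
    while remaining:
        last = order[-1]
        # sorted() makes tie-breaking deterministic
        nxt = max(sorted(remaining), key=lambda x: affinity[frozenset((last, x))])
        order.append(nxt)
        remaining.remove(nxt)
    return {name: addr for addr, name in enumerate(order)}

# Hypothetical profile: "a" and "c" are used together, then "b" and "d".
trace = ["a", "c", "a", "c", "b", "d", "b", "d"]
naive = {"a": 0, "b": 1, "c": 2, "d": 3}            # declaration order
tuned = affinity_placement(trace)

# One cache line holding two objects: the naive layout thrashes.
print(count_misses(trace, naive, line_size=2, num_lines=1),
      count_misses(trace, tuned, line_size=2, num_lines=1))   # prints "8 2"
```

With this toy trace, the declaration-order layout misses on every access, while the affinity-based layout misses only once per pair of co-accessed objects.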
|
3 |
CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A STRUCTURED CFD CODE - GHOST
Palki, Anand B. 01 January 2006
This research focuses on evaluating and enhancing the performance of an in-house, structured, 2D CFD code, GHOST, on modern commodity clusters. The basic philosophy of this work is to optimize the cache performance of the code by splitting the grid into smaller blocks and carrying out the required calculations on these smaller blocks. This in turn leads to enhanced code performance on commodity clusters. Accordingly, this work presents a discussion and a detailed description of two techniques for data access optimization: external and internal blocking. These techniques have been tested on steady, unsteady, laminar, and turbulent test cases, and the results are presented. The critical hardware parameters that influenced the code performance were identified. A detailed study investigating the effect of these parameters on the code performance was conducted, and the results are presented. The modified version of the code was also ported successfully to current state-of-the-art architectures.
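The general idea of blocking a structured grid can be sketched as follows. This is a minimal illustration, not GHOST's implementation, which the abstract does not show: the 5-point averaging stencil, the function names, and the block size are assumptions. Python is used only to show the traversal order; any cache benefit would appear in a compiled code where the working set of one block fits in cache.

```python
def smooth(grid):
    """One Jacobi-style 5-point averaging sweep over interior cells, row by row."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]       # boundary cells are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out

def smooth_blocked(grid, block=4):
    """Same sweep, but the interior is visited one block x block tile at a time."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for bi in range(1, n - 1, block):            # loop over tiles
        for bj in range(1, m - 1, block):
            for i in range(bi, min(bi + block, n - 1)):      # loop inside one tile
                for j in range(bj, min(bj + block, m - 1)):
                    out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                        + grid[i][j - 1] + grid[i][j + 1])
    return out

# Hypothetical 10 x 12 test grid; both traversals must give identical results.
grid = [[float((3 * i + 5 * j) % 7) for j in range(12)] for i in range(10)]
assert smooth(grid) == smooth_blocked(grid, block=3)
```

Because each sweep reads only the old grid and writes to a separate output, the order in which cells are visited does not change the result, which is what makes this kind of reordering safe.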
|
4 |
PERFORMANCE OPTIMIZATION OF A STRUCTURED CFD CODE - GHOST ON COMMODITY CLUSTER ARCHITECTURES
Kristipati, Pavan K. 01 January 2008
This thesis focuses on optimizing the performance of an in-house, structured, 2D CFD code, GHOST, on commodity cluster architectures. The basic philosophy of the work is to optimize the cache usage of the code by implementing efficient coding techniques without changing the underlying numerical algorithm. The optimization techniques that were implemented and the resulting changes in performance are presented. Two techniques, external and internal blocking, that were implemented earlier to tune the performance of this code are reviewed. This is followed by a further tuning effort to circumvent the problems associated with the blocking techniques. Later, to establish the generality of the optimization techniques, testing is done on a more complicated test case. All the techniques presented in this thesis have been tested on steady, laminar test cases. It is shown that the optimized versions of the code achieve better performance on the variety of commodity cluster architectures chosen in this study.
|