11 |
Unconventional Applications of Compiler Analysis
Selby, Jason W. A. January 2011
Previously, compiler transformations have primarily focused on
minimizing program execution time. This thesis explores some examples
of applying compiler technology outside of its original scope.
Specifically, we apply compiler analysis to the field of software
maintenance and evolution by examining the use of global data
throughout the lifetimes of many open source projects. Also, we
investigate the effects of compiler optimizations on the power
consumption of small battery powered devices. Finally, in an area
closer to traditional compiler research we examine automatic program
parallelization in the form of thread-level speculation.
|
12 |
Übersetzermethoden zur automatischen Hardware-Synthese (Compiler methods for automatic hardware synthesis)
Pfahler, Peter. Unknown Date
University dissertation, Paderborn, 1988.
|
13 |
Code generation and adaptive control divergence management for light weight SIMT processors
Gupta, Meghana 27 May 2016
The energy costs of data movement are limiting the performance scaling of future generations of high performance computing architectures targeted to data intensive applications. The result has been a resurgence of interest in processing-in-memory (PIM) architectures. This challenge has spawned the development of a scalable, parametric data parallel architecture referred to as the Heterogeneous Architecture Research Prototype (HARP) - a single instruction multiple thread (SIMT) architecture for integration into DRAM systems, particularly 3D memory stacks, as a distinct processing layer to exploit the enormous internal memory bandwidth. However, this potential can only be realized with an optimizing compilation environment. This thesis addresses this challenge by i) constructing an open source compiler for HARP, and ii) integrating optimizations for handling control flow divergence for HARP instances. The HARP compiler is built using the LLVM open source compiler infrastructure. Apart from traditional code generation, the HARP compiler backend handles unique challenges associated with the HARP instruction set. Chief among them is code generation for control divergence management techniques. The HARP architecture and compiler support i) a hardware reconvergence stack and ii) predication to handle divergent branches. The HARP compiler addresses several challenges associated with generating code for these two control divergence management techniques and implements multiple analyses and transformations for code generation. Each technique has its own advantages and disadvantages, depending on whether the conditional branch is likely to be unanimous or not. Two decision frameworks, guided by static analysis and dynamic profile information, are implemented to choose between the control divergence management techniques by analyzing the nature of the conditional branches and utilizing this information during compilation.
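The trade-off between the two divergence-handling techniques named in this abstract can be illustrated with a toy model. The following is a minimal sketch only: it models a SIMT warp as a Python list of lane values, and the function names, the branch condition, and the "issued paths" cost count are all hypothetical. It does not reflect the HARP instruction set or the thesis's compiler.

```python
# Toy model of one divergent branch:  if v is even: v // 2  else: 3 * v + 1
# "issued" counts how many branch sides the warp has to execute (assumed cost model).

def run_predicated(values):
    """Predication: every lane executes both sides; a per-lane predicate
    selects which result is kept."""
    pred = [v % 2 == 0 for v in values]          # branch condition per lane
    then_side = [v // 2 for v in values]         # executed by all lanes
    else_side = [3 * v + 1 for v in values]      # executed by all lanes
    out = [t if p else e for p, t, e in zip(pred, then_side, else_side)]
    return out, 2                                # both sides are always issued


def run_with_reconvergence_stack(values):
    """Stack-based handling: push an active mask per side that has at least
    one lane, then execute each pushed side under its mask."""
    pred = [v % 2 == 0 for v in values]
    else_mask = [not p for p in pred]
    stack = []
    if any(else_mask):
        stack.append(("else", else_mask))
    if any(pred):
        stack.append(("then", pred))

    out = list(values)
    issued = 0
    while stack:
        side, mask = stack.pop()
        issued += 1
        for lane, active in enumerate(mask):
            if active:
                v = values[lane]
                out[lane] = v // 2 if side == "then" else 3 * v + 1
    return out, issued                           # lanes reconverge here
```

In this model a unanimous input such as `[2, 4, 6, 8]` issues one path with the stack and two with predication, while a divergent input such as `[1, 2, 3, 4]` issues two paths either way. A real reconvergence stack also pays for its push and pop operations, which this sketch ignores.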
|
14 |
Implementation of a highly portable Pascal interpreter using indirect threaded code techniques
Helliwell, A. M. January 1987
No description available.
|
15 |
The syntactic evolution of programming languages
Bagiokou, Maria January 1999
No description available.
|
16 |
Compiler-driven data layout transformations for network applications
Fenacci, Damon January 2012
This work approaches the little studied topic of compiler optimisations directed to network applications. It starts by investigating whether there exist any fundamental differences between application domains that justify the development and tuning of domain-specific compiler optimisations. It shows an automated approach that is capable of identifying domain-specific workload characterisations and presenting them in a readily interpretable format based on decision trees. The generated workload profiles summarise key resource utilisation issues and enable compiler engineers to address the highlighted bottlenecks. By applying this methodology to data intensive network infrastructure applications it shows that data organisation is the key obstacle to overcome in order to achieve high performance. It therefore proposes and evaluates three specialised data transformations (structure splitting, array regrouping, and software caching) against the industrial EEMBC networking benchmarks and real-world data sets. It demonstrates on the one hand that speedups of up to 2.62 can be achieved, but on the other that no single solution performs equally well across different network traffic scenarios. Hence, to address this issue, an adaptive software caching scheme for high frequency route lookup operations is introduced and its effectiveness evaluated again against EEMBC networking benchmarks and real-world data sets, achieving speedups of up to 3.30 and 2.27. The results clearly demonstrate that adaptive data organisation schemes are necessary to ensure optimal performance under varying network loads. Finally, this research addresses another issue introduced by data transformations such as array regrouping and software caching, i.e. the need for static analysis to allow efficient resource allocation. This thesis proposes a static code analyser that allows the automatic resource analysis of source code containing lists and tree structures.
The tool applies a combination of amortised analysis and separation logic methodology to real code and is able to evaluate type and resource usage of existing data structures, which can be used to compute global resource consumption values for full data intensive network applications.
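To make the idea of software caching for route lookups concrete, here is a minimal sketch that places a small least-recently-used cache in front of a linear longest-prefix-match search. The class name `CachedRouteTable`, the fixed LRU policy, and the route data are assumptions chosen for illustration. The adaptive scheme evaluated in the thesis is not described in enough detail here to reproduce, so this shows only the basic, non-adaptive mechanism.

```python
from collections import OrderedDict
import ipaddress


class CachedRouteTable:
    """Hypothetical route table: slow longest-prefix match plus an LRU cache."""

    def __init__(self, routes, capacity=2):
        # routes: iterable of (prefix string, next hop) pairs
        self.routes = [(ipaddress.ip_network(p), hop) for p, hop in routes]
        self.capacity = capacity
        self.cache = OrderedDict()   # address -> next hop, oldest entry first
        self.hits = 0
        self.misses = 0

    def _slow_lookup(self, addr):
        # Linear scan that keeps the matching route with the longest prefix.
        best = None
        for net, hop in self.routes:
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, hop)
        return best[1] if best else None

    def lookup(self, address):
        addr = ipaddress.ip_address(address)
        if addr in self.cache:
            self.hits += 1
            self.cache.move_to_end(addr)         # mark as most recently used
            return self.cache[addr]
        self.misses += 1
        hop = self._slow_lookup(addr)
        self.cache[addr] = hop
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)       # evict least recently used
        return hop


table = CachedRouteTable(
    [("10.0.0.0/8", "A"), ("10.1.0.0/16", "B"), ("0.0.0.0/0", "default")]
)
first = table.lookup("10.1.2.3")    # miss: falls through to the slow search
second = table.lookup("10.1.2.3")   # hit: served from the cache
```

A fixed-capacity cache like this one helps only when traffic revisits the same destinations, which is consistent with the abstract's point that no single data organisation performs equally well across traffic scenarios.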
|
17 |
Performance optimizations for compiler-based error detection
Mitropoulou, Konstantina January 2015
The trend towards smaller transistor technologies and lower operating voltages stresses the hardware and makes transistors more susceptible to transient errors. In future systems, performance and power gains will come at the cost of unreliable areas on the chip. For this reason, there is an increased need for low-overhead highly-reliable error detection methodologies. In recent years, several techniques have been proposed. The majority of them are based on redundancy, which can be implemented at several levels (e.g., hardware, instruction, thread, process, etc.). In instruction-level error detection approaches, the compiler replicates the instructions of the program and inserts checks wherever they are needed. The checks evaluate code correctness and decide whether or not an error has occurred. This type of error detection is more flexible than the hardware alternatives. It allows the programmer to choose the protected area of the program and it can be applied without any hardware modifications. On the other hand, the replicated instructions and the checks cause a large slowdown, making software techniques less appealing. In this thesis, we propose two techniques that aim at reducing the error detection overhead of compiler-based approaches and improving the system's performance without sacrificing fault coverage. The first technique, DRIFT, achieves this by decoupling the execution of the code (original and replicated) from the checks. The checks are compare and jump instructions. The latter tend to make the code sequential and prevent the compiler from performing aggressive instruction scheduling optimizations. We call this phenomenon basic-block fragmentation. DRIFT reduces the impact of basic-block fragmentation by breaking the synchronized execute-check-confirm-execute cycle. In this way, DRIFT generates scheduler-friendly code with more instruction-level parallelism (ILP).
As a result, it reduces the performance overhead down to 1.29× (on average) and outperforms the state-of-the-art by up to 29.7% while retaining the same fault coverage. Next, CASTED focuses on reducing the impact of error detection overhead on single-chip scalable architectures that are composed of tightly-coupled cores. The proposed compiler methodology adaptively distributes the error detection overhead to the available resources across multiple cores, fully exploiting the abundant ILP of these architectures. CASTED adapts to a wide range of architecture configurations (issue-width, inter-core communication). The results show that CASTED matches the performance of, and often outperforms, sometimes by as much as 21.2%, the best fixed state-of-the-art approach while maintaining the same fault coverage.
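The general idea of replicating instructions and then decoupling the checks from execution can be sketched at source level. This is an illustration under stated assumptions, not the DRIFT or CASTED algorithm: both function names are hypothetical, the `fault_at` parameter is an invented fault-injection hook for demonstration, and a real implementation works on compiler IR or machine instructions rather than Python statements.

```python
def dot_checked_eagerly(xs, ys):
    """Replicate each operation and check right after it (a branch per step)."""
    acc = 0
    acc_dup = 0
    for x, y in zip(xs, ys):
        acc = acc + x * y
        acc_dup = acc_dup + x * y            # replicated instruction
        if acc != acc_dup:                   # check interleaved with execution
            raise RuntimeError("transient error detected")
    return acc


def dot_checked_deferred(xs, ys, fault_at=None):
    """Replicate each operation but only record mismatches; branch once at the end."""
    acc = 0
    acc_dup = 0
    mismatch = False
    for i, (x, y) in enumerate(zip(xs, ys)):
        acc = acc + x * y
        if i == fault_at:
            acc ^= 1                         # hypothetical injected bit flip
        acc_dup = acc_dup + x * y            # replicated instruction
        mismatch |= acc != acc_dup           # accumulate the comparison result
    if mismatch:                             # single decoupled check
        raise RuntimeError("transient error detected")
    return acc
```

The deferred variant keeps the loop body free of conditional jumps, which is the property the abstract associates with less basic-block fragmentation. The cost is that an error is reported later than in the eager variant.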
|
18 |
Jit4OpenCL: a compiler from Python to OpenCL
Xunhao, Li 11 1900
Heterogeneous computing platforms that use GPUs and CPUs in tandem for computation have become an important choice for building low-cost high-performance computing platforms. For certain classes of applications, the computing ability of modern GPUs surpasses what CPUs can offer. GPUs can deliver several teraflops in peak performance. However, programmers must adopt a new, more complicated and more difficult programming paradigm.
To alleviate the burden of programming for heterogeneous systems, Garg and Amaral developed a Python compiling framework that combines an ahead-of-time compiler called unPython with a just-in-time compiler called jit4GPU. This compilation framework generates code for systems with AMD GPUs. We extend the framework and retarget it to generate OpenCL code, an industry standard that is implemented for most GPUs. By generating OpenCL code, this new compiler, called jit4OpenCL, enables the execution of the same program on a wider selection of heterogeneous platforms. To further improve the target-code performance on NVIDIA GPUs, we developed an array-access analysis tool that helps to exploit data reuse by utilizing the shared (local) memory space in the OpenCL memory hierarchy.
The thesis presents an experimental performance evaluation indicating that, in comparison with jit4GPU, jit4OpenCL suffers a performance degradation because of the current performance of OpenCL implementations, and also because of the extra time needed for the additional just-in-time compilation. However, the portable code generated by jit4OpenCL still achieves performance gains in some applications compared to highly optimized CPU code.
|
19 |
Interprocedural Static Single Assignment Form
Calman, Silvian 09 June 2011
Static Single Assignment (SSA) is an Intermediate Representation (IR) that simplifies the design and implementation of analyses and optimizations. While intraprocedural SSA is ubiquitous in modern compilers, the use of interprocedural SSA (ISSA), although seemingly a natural extension, is limited. In this dissertation, we propose new techniques to construct and integrate ISSA into modern compilers and evaluate the benefit of using ISSA form.
First, we present an algorithm that converts the IR into ISSA form by introducing new instructions. To our knowledge, this is the first IR-based ISSA proposed in the literature. Moreover, in comparison to previous work we increase the number of SSA variables, extend the scope of definitions to the whole program, and perform interprocedural copy propagation.
Next, we propose an out-of-ISSA translation that simplifies the integration of ISSA form into a compiler. Our out-of-ISSA translation algorithm enables us to leverage ISSA to improve performance without having to update every compiler pass. Moreover, we demonstrate the benefit of ISSA for a number of compiler optimizations.
Finally, we present an ISSA-based interprocedural induction variable analysis. Our implementation introduces only a few changes to the SSA-based implementation while enabling us to identify considerably more induction variables and compute more loop trip counts.
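For readers unfamiliar with SSA, the core renaming step can be shown on straight-line code. This is a minimal sketch with a hypothetical `to_ssa` helper and a made-up tuple encoding of statements. It handles neither control flow (so no phi instructions) nor procedure calls, and therefore illustrates ordinary intraprocedural SSA renaming only, not the ISSA construction proposed in the dissertation.

```python
def to_ssa(statements):
    """Rename straight-line code so that every variable is assigned exactly once.

    statements: list of (target, operator, operands), where operands are
    variable names (str) or integer constants.
    """
    version = {}    # variable name -> latest version number
    result = []

    def use(operand):
        # Constants and never-assigned inputs keep their original spelling.
        if isinstance(operand, str) and operand in version:
            return f"{operand}_{version[operand]}"
        return operand

    for target, op, operands in statements:
        renamed = [use(o) for o in operands]            # uses see the old version
        version[target] = version.get(target, 0) + 1    # definition gets a new one
        result.append((f"{target}_{version[target]}", op, renamed))
    return result


program = [
    ("x", "+", ["a", 1]),    # x = a + 1
    ("x", "*", ["x", 2]),    # x = x * 2
    ("y", "+", ["x", "a"]),  # y = x + a
]
ssa = to_ssa(program)
# ssa == [("x_1", "+", ["a", 1]), ("x_2", "*", ["x_1", 2]), ("y_1", "+", ["x_2", "a"])]
```

Because each name now has a single definition, an analysis can follow a use directly to its definition. The abstract describes extending that scope of definitions from one procedure to the whole program.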
|