11 |
Concurrent and distributed functional systems. Spiliopoulou, Eleni. January 2000 (has links)
No description available.
|
12 |
An optimizing code generator generator. Wendt, Alan Lee. January 1989 (has links)
This dissertation describes a system that constructs efficient, retargetable code generators and optimizers. chop reads nonprocedural descriptions of a computer's instruction set and of a naive code generator for the computer, and it writes an integrated code generator and peephole optimizer for it. The resulting code generators are very efficient because they interpret no tables; they are completely hard-coded. Nor do they build complex data structures to communicate between code generation and optimization phases. Interphase communication is reduced to the point that the code generator's output is often encoded in the program counter and conveyed to the optimizer by jumping to the right label. chop's code generator and optimizer are based on a very simple formalism, namely rewriting rules. An instrumented version of the compiler infers the optimization rules as it compiles a training suite, and it records them for translation into hard code and inclusion in the production version. I have replaced the Portable C Compiler's code generator with one generated by chop. Despite a costly interface, the resulting compiler runs 30% to 50% faster than the original Portable C Compiler (pcc) and generates comparable code. This figure is diluted by shared lexical analysis, parsing, and semantic analysis and by comparable code emission. Allowing for these, the new code generator appears to run approximately seven times faster than that of the original pcc.
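The rewriting-rule formalism is easy to picture. The sketch below shows one hard-coded peephole rule in C, in the spirit of the generated optimizers the abstract describes; the Insn layout and the push/pop rule are assumptions for illustration, not chop's actual rule language.

```c
/* A minimal sketch of a hard-coded peephole rewrite: one rule,
 * "push X; pop Y" => "mov X, Y", compiled straight into C control
 * flow rather than an interpreted table. The Insn representation
 * and the rule itself are illustrative assumptions. */
#include <string.h>

typedef struct {
    char op[8];     /* mnemonic, e.g. "push", "pop", "mov" */
    char src[8];    /* source operand */
    char dst[8];    /* destination operand */
} Insn;

/* Rewrites the instruction buffer in place; returns the new length. */
static int peephole(Insn *code, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (i + 1 < n &&
            strcmp(code[i].op, "push") == 0 &&
            strcmp(code[i + 1].op, "pop") == 0) {
            /* push X; pop Y  ==>  mov X, Y */
            strcpy(code[out].op, "mov");
            strcpy(code[out].src, code[i].src);
            strcpy(code[out].dst, code[i + 1].dst);
            out++;
            i++;                /* the pop is consumed as well */
        } else {
            code[out++] = code[i];
        }
    }
    return out;
}
```

A production rule set would be inferred from the training suite, but each rule compiles down to a comparison-and-rewrite like this one, with no table lookups at run time.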
|
13 |
Techniques and tools for developing Ruby designs. Guo, Shaori. January 1997 (has links)
No description available.
|
14 |
Reducing a complex instruction set computer. January 1988 (has links)
Tse Tin-wah. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1988. / Bibliography: leaves [73]-[78]
|
15 |
Scaling CFL-reachability-based alias analysis: theory and practice. / 擴展基於CFL-Reachability的別名分析 / Scaling context-free language reachability-based alias analysis / CUHK electronic theses & dissertations collection / Kuo zhan ji yu CFL-Reachability de bie ming fen xi. January 2013 (has links)
Zhang, Qirun. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 170-186). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012]. System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts also in Chinese.
|
16 |
Performance improvement through predicated execution in VLIW machines. Biglari-Abhari, Morteza. January 2000 (has links) (PDF)
Bibliography: leaves 136-153. Investigates techniques to achieve performance improvement in Very Long Instruction Word machines through predicated execution.
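For context, predicated execution replaces short forward branches with instructions guarded by a predicate register, so a VLIW scheduler can issue both paths in the same long word. A minimal if-conversion sketch, modeled in C (a generic illustration, not material from the thesis):

```c
/* If-conversion modeled in C: p stands in for a predicate register,
 * and the commented mnemonics suggest a possible VLIW encoding. */
int select_max(int a, int b)
{
    /* Branchy form:
     *     if (a > b) r = a; else r = b;
     * Predicated form, as two guarded moves that can share one
     * long instruction word: */
    int p = (a > b);          /*      cmpgt p, a, b  */
    int r = 0;
    if (p)  r = a;            /* (p)  mov r, a       */
    if (!p) r = b;            /* (!p) mov r, b       */
    return r;
}
```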
|
17 |
Dynamic Register Allocation for Network Processors. Collins, Ryan. 22 May 2006 (has links)
Network processors are custom high-performance embedded processors deployed for a variety of tasks that must operate at high line speeds (Gbits/sec) to prevent packet loss. With the increasing complexity of application domains and the larger code stores on modern network processors, network processor programming goes beyond simply exploiting parallelism in packet processing. Unlike the traditional homogeneous threading model, modern network processor programming must support heterogeneous threads that execute simultaneously on a microengine. In order to support such demands, we first propose hardware management of registers across multiple threads. In their PLDI 2004 paper, Zhuang and Pande first proposed a compiler-based scheme to support register allocation across threads; in this work, we extend their static allocation method to support aggressive register allocation that takes dynamic context into account. We also remove the loads and stores due to aliased memory accesses, converting them into register moves that exploit dead registers. This results in substantial savings in latency and higher throughput, mainly due to the removal of high-latency accesses as well as idle cycles. The dynamic register allocator is designed to be lightweight and low-latency through a number of trade-offs.

In the second part of this work, our goal is to design an automatic register allocation scheme that makes the dual-bank register file design of network processors transparent to the compiler. By design, network processors mandate that the operands of an instruction be allocated to registers belonging to two different banks. The key goal in this work is to take dynamic context into account to balance register pressure across the banks. The main decisions involve how and where to map an incoming virtual register onto a physical register in a bank, how to evict dead ones, and how to minimize bank-to-bank copies and swaps.

It is shown that both of these problems can be solved by simple hardware designs that take advantage of dynamic context. The performance gains are substantial, and because the designs are simple (and off the critical path), such schemes may be attractive in practice.
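The dual-bank constraint is concrete enough to sketch. The fragment below models the core mapping decision: an operand paired with an already-mapped register is forced into the opposite bank, and otherwise the less-pressured bank is chosen. The data structures and the policy are illustrative assumptions, not the hardware design proposed in the thesis.

```c
/* Sketch of dual-bank register mapping. Each new virtual register
 * goes to whichever bank currently has lower pressure, except when
 * it will be paired with an operand already mapped, in which case
 * the hardware mandate forces it into the opposite bank. */
#include <assert.h>

enum { BANK_A = 0, BANK_B = 1, BANK_SIZE = 16 };

typedef struct {
    int live[2];                 /* live registers per bank */
} BankState;

/* Returns the bank chosen for a new virtual register; partner_bank
 * is the bank of the operand it will be paired with, or -1 if none. */
static int choose_bank(BankState *s, int partner_bank)
{
    int bank;
    if (partner_bank >= 0)
        bank = 1 - partner_bank;           /* mandated opposite bank */
    else
        bank = (s->live[BANK_A] <= s->live[BANK_B]) ? BANK_A : BANK_B;
    assert(s->live[bank] < BANK_SIZE);     /* else evict or swap */
    s->live[bank]++;
    return bank;
}
```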
|
18 |
Algorithms for compiler-assisted design space exploration of clustered VLIW ASIP datapaths / Lapinskii, Viktor. January 2001 (has links)
Thesis (Ph. D.)--University of Texas at Austin, 2001. / Vita. Includes bibliographical references (leaves 72-77). Available also in a digital version from Dissertation Abstracts.
|
19 |
Braids: out-of-order performance with almost in-order complexity. Tseng, Francis, 1976-. 29 August 2008 (has links)
Not available
|
20 |
Atomic block formation for explicit data graph execution architectures. Maher, Bertrand Allen. 13 December 2010 (has links)
Limits on power consumption, complexity, and on-chip latency have focused computer architects on power-efficient designs that exploit parallelism. One approach divides programs into atomic blocks of operations that execute semi-independently, which efficiently creates a large window of potentially concurrent operations. This dissertation studies the intertwined roles of the compiler, architecture, and microarchitecture in achieving efficiency and high performance with a block-atomic architecture.

For such an architecture to achieve high performance, the compiler must form blocks effectively. It must create large blocks of instructions to amortize the per-block overhead, but control-flow and content restrictions limit its options. Block formation should consider factors such as frequency of execution and block size, for example by selecting control-flow paths that are frequently executed and by exploiting the locality of computations to reduce communication overheads.

This dissertation determines which characteristics of programs influence block formation and proposes techniques to generate effective blocks. The first contribution is a method for solving the phase-ordering problems inherent to block formation, mitigating the tension between block-enlarging optimizations (if-conversion, tail duplication, loop unrolling, and loop peeling) and scalar optimizations. Given these optimizations, analysis shows that the remaining obstacles to creating larger blocks are inherent in the control-flow structure of applications, and furthermore that any fixed block size entails a sizable amount of wasted space. To eliminate this overhead, this dissertation proposes an architectural implementation of variable-size blocks that allows the compiler to dramatically improve block efficiency.

We use these mechanisms to develop policies for block formation that achieve high performance on a range of applications and processor configurations. We find that the best policies differ significantly depending on the number of participating cores. Using machine learning, we discover generalized policies for particular hardware configurations and find that the best policy varies between applications and with the number of parallel resources available in the microarchitecture. These results show that effective and efficient block-atomic execution is possible when the compiler and microarchitecture are designed cooperatively.
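The fixed-block-size waste the abstract refers to can be made concrete with a small sketch: a greedy block grower that follows the hotter control-flow edge until a fixed capacity would overflow, leaving the remainder of the block unused. The CFG representation and the frequency heuristic are illustrative assumptions, not the dissertation's actual policy.

```c
/* Greedy sketch of atomic block formation: extend a block along the
 * most frequently executed successor edge until the fixed capacity
 * would be exceeded. Assumes every basic block holds at least one
 * instruction, so the loop terminates. */
#define BLOCK_CAPACITY 128   /* instructions per atomic block */

typedef struct BasicBlock {
    int ninsns;                       /* instructions in this block */
    struct BasicBlock *succ[2];       /* CFG successors (may be null) */
    double freq[2];                   /* profiled edge frequencies */
} BasicBlock;

/* Returns how many slots of the fixed-size block were filled;
 * BLOCK_CAPACITY minus the return value is the wasted space that
 * variable-size blocks eliminate. */
static int form_block(BasicBlock *entry)
{
    int used = 0;
    BasicBlock *bb = entry;
    while (bb && used + bb->ninsns <= BLOCK_CAPACITY) {
        used += bb->ninsns;
        /* follow the hotter edge: the frequency heuristic above */
        if (bb->succ[0] && bb->succ[1])
            bb = (bb->freq[0] >= bb->freq[1]) ? bb->succ[0] : bb->succ[1];
        else
            bb = bb->succ[0] ? bb->succ[0] : bb->succ[1];
    }
    return used;
}
```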
|