11

Concurrent and distributed functional systems

Spiliopoulou, Eleni January 2000 (has links)
No description available.
12

An optimizing code generator generator.

Wendt, Alan Lee. January 1989 (has links)
This dissertation describes a system that constructs efficient, retargetable code generators and optimizers. chop reads nonprocedural descriptions of a computer's instruction set and of a naive code generator for that computer, and it writes an integrated code generator and peephole optimizer. The resulting code generators are very efficient because they interpret no tables; they are completely hard-coded. Nor do they build complex data structures to communicate between code generation and optimization phases. Interphase communication is reduced to the point that the code generator's output is often encoded in the program counter and conveyed to the optimizer by jumping to the right label. chop's code generator and optimizer are based on a very simple formalism, namely rewriting rules. An instrumented version of the compiler infers the optimization rules as it compiles a training suite, and it records them for translation into hard code and inclusion in the production version. I have replaced the Portable C Compiler's code generator with one generated by chop. Despite a costly interface, the resulting compiler runs 30% to 50% faster than the original Portable C Compiler (pcc) and generates comparable code. This figure is diluted by common lexical analysis, parsing, and semantic analysis and by comparable code emission. Allowing for these, the new code generator appears to run approximately seven times faster than that of the original pcc.
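A minimal sketch of the kind of rewriting-rule-driven peephole optimization the abstract describes. The instruction syntax and the rules in the table are invented for illustration; they are not chop's, and chop hard-codes its rules rather than interpreting a data structure as this sketch does.

```python
# Sketch of a peephole optimizer driven by rewriting rules over adjacent
# instructions. The instruction format and the rules are hypothetical.
import re

RULES = [
    # store then reload of the same location -> keep the value in a register
    (r"store r(\d+),(\S+)\nload r(\d+),\2", r"store r\1,\2\nmov r\3,r\1"),
    # adding zero is just a move
    (r"add r(\d+),r(\d+),0", r"mov r\1,r\2"),
    # moving a register onto itself disappears
    (r"mov r(\d+),r\1", r""),
]

def peephole(instructions):
    """Repeatedly apply the rewriting rules until nothing changes."""
    text = "\n".join(instructions)
    changed = True
    while changed:
        changed = False
        for pattern, replacement in RULES:
            text, n = re.subn(pattern, replacement, text)
            if n:
                changed = True
    return [line for line in text.split("\n") if line]

print(peephole(["store r1,x", "load r2,x", "add r3,r2,0"]))
# -> ['store r1,x', 'mov r2,r1', 'mov r3,r2']
```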
13

Techniques and tools for developing Ruby designs

Guo, Shaori January 1997 (has links)
No description available.
14

Reducing a complex instruction set computer.

January 1988 (has links)
Tse Tin-wah. / Thesis (M.Ph.)--Chinese University of Hong Kong, 1988. / Bibliography: leaves [73]-[78]
15

Scaling CFL-reachability-based alias analysis: theory and practice / 擴展基於CFL-Reachability的別名分析 [Scaling CFL-Reachability-based alias analysis] / CUHK electronic theses & dissertations collection

January 2013 (has links)
Zhang, Qirun. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 170-186). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts also in Chinese.
16

Performance improvement through predicated execution in VLIW machines

Biglari-Abhari, Morteza. January 2000 (has links) (PDF)
Bibliography: leaves 136-153. Investigates techniques to achieve performance improvement in Very Long Instruction Word machines through predicated execution.
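A brief illustrative sketch of what predicated execution means: a branch is if-converted into a straight-line sequence whose instructions are guarded by predicate registers and squashed when their predicate is false. The instruction format and register names here are hypothetical and are not tied to any particular VLIW machine or to this thesis's specific techniques.

```python
# Toy interpreter for predicated instructions: (pred, op, dst, src1, src2).
# An instruction whose predicate is false becomes a no-op instead of a branch.
def execute(program, regs, preds):
    for pred, op, dst, a, b in program:
        if pred is not None and not preds[pred]:
            continue  # squashed: occupies an issue slot but has no effect
        if op == "cmp_lt":
            preds[dst] = regs[a] < regs[b]
            preds["!" + dst] = not preds[dst]
        elif op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "mov":
            regs[dst] = regs[a]
    return regs

# if (x < y) z = x + y; else z = y;  -- with the branch removed
program = [
    (None, "cmp_lt", "p1", "x", "y"),   # p1 = (x < y), !p1 = its complement
    ("p1", "add", "z", "x", "y"),       # runs only when p1 is true
    ("!p1", "mov", "z", "y", None),     # runs only when p1 is false
]
print(execute(program, {"x": 3, "y": 5, "z": 0}, {})["z"])  # -> 8
```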
17

Dynamic Register Allocation for Network Processors

Collins, Ryan 22 May 2006 (has links)
Network processors are custom high-performance embedded processors deployed for a variety of tasks that must operate at high line speeds (Gbits/sec) to prevent packet loss. With the increased complexity of application domains and the larger code stores on modern network processors, network processor programming goes beyond simply exploiting parallelism in packet processing. Unlike the traditional homogeneous threading model, modern network processor programming must support heterogeneous threads that execute simultaneously on a microengine. To support such demands, we first propose hardware management of registers across multiple threads. In their PLDI 2004 paper, Zhuang and Pande first proposed a compiler-based scheme to support register allocation across threads; in this work, we extend their static allocation method to support aggressive register allocation that takes dynamic context into account. We also remove the loads and stores due to aliased memory accesses, converting them into register moves that exploit dead registers. This yields substantial savings in latency and higher throughput, mainly from the removal of high-latency accesses as well as idle cycles. The dynamic register allocator is designed to be lightweight and low-latency through a number of trade-offs. In the second part of this work, our goal is to design an automatic register allocation scheme that makes the dual-bank register file design of network processors transparent to the compiler. By design, network processors mandate that the operands of an instruction be allocated to registers belonging to two different banks. The key goal is to take dynamic context into account to balance register pressure across the banks. Key decisions involve how and where to map an incoming virtual register onto a physical register in a bank, how to evict dead registers, and how to minimize bank-to-bank copies and swaps. We show that both problems can be solved with simple hardware designs that exploit dynamic context. The performance gains are substantial, and because the designs are simple (and off the critical path), such schemes may be attractive in practice.
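A small sketch of the dual-bank constraint the abstract describes: a toy allocator that maps virtual registers to one of two banks at run time, preferring the less-pressured bank, evicting registers that are no longer live, and keeping the two operands of an instruction in different banks. The bank size, eviction policy, and interface are simplifying assumptions, not the hardware design from the thesis.

```python
# Toy dynamic allocator for a dual-bank register file.
BANK_SIZE = 4

class DualBankAllocator:
    def __init__(self):
        # bank id -> {virtual register: physical slot}
        self.banks = {0: {}, 1: {}}

    def _evict_dead(self, live):
        for bank in self.banks.values():
            for vreg in [v for v in bank if v not in live]:
                del bank[vreg]

    def assign(self, vreg, live, avoid=None):
        """Place vreg in a bank, preferring the one with lower pressure and
        avoiding bank `avoid` so an instruction's two operands end up in
        different banks (the dual-bank requirement)."""
        self._evict_dead(live)
        for b in self.banks:
            if vreg in self.banks[b]:
                return b                      # already mapped
        candidates = sorted(self.banks, key=lambda b: len(self.banks[b]))
        if avoid is not None:
            candidates = [b for b in candidates if b != avoid] or candidates
        bank = candidates[0]
        if len(self.banks[bank]) >= BANK_SIZE:
            raise RuntimeError("would spill: bank %d is full" % bank)
        self.banks[bank][vreg] = len(self.banks[bank])
        return bank

alloc = DualBankAllocator()
live = {"v1", "v2"}
b1 = alloc.assign("v1", live)
b2 = alloc.assign("v2", live, avoid=b1)   # the two operands of one instruction
print(b1, b2)                             # -> different banks, e.g. 0 1
```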
18

Algorithms for compiler-assisted design space exploration of clustered VLIW ASIP datapaths

Lapinskii, Viktor, January 2001 (has links)
Thesis (Ph. D.)--University of Texas at Austin, 2001. / Vita. Includes bibliographical references (leaves 72-77). Available also in a digital version from Dissertation Abstracts.
19

Braids: out-of-order performance with almost in-order complexity

Tseng, Francis, 1976- 29 August 2008 (has links)
Not available
20

Atomic block formation for explicit data graph execution architectures

Maher, Bertrand Allen 13 December 2010 (has links)
Limits on power consumption, complexity, and on-chip latency have focused computer architects on power-efficient designs that exploit parallelism. One approach divides programs into atomic blocks of operations that execute semi-independently, which efficiently creates a large window of potentially concurrent operations. This dissertation studies the intertwined roles of the compiler, architecture, and microarchitecture in achieving efficiency and high performance with a block-atomic architecture. For such an architecture to achieve high performance, the compiler must form blocks effectively. The compiler must create large blocks of instructions to amortize the per-block overhead, but control-flow and content restrictions limit the compiler's options. Block formation should consider factors such as execution frequency and block size, for example by selecting control-flow paths that are frequently executed and by exploiting locality of computation to reduce communication overhead. This dissertation determines which characteristics of programs influence block formation and proposes techniques to generate effective blocks. The first contribution is a method for solving the phase-ordering problems inherent to block formation, mitigating the tension between block-enlarging optimizations---if-conversion, tail duplication, loop unrolling, and loop peeling---and scalar optimizations. Given these optimizations, analysis shows that the remaining obstacles to creating larger blocks are inherent in the control-flow structure of applications, and furthermore that any fixed block size entails a sizable amount of wasted space. To eliminate this overhead, this dissertation proposes an architectural implementation of variable-size blocks that allows the compiler to dramatically improve block efficiency. We use these mechanisms to develop policies for block formation that achieve high performance on a range of applications and processor configurations. We find that the best policies differ significantly depending on the number of participating cores. Using machine learning, we discover generalized policies for particular hardware configurations and find that the best policy varies significantly between applications and with the number of parallel resources available in the microarchitecture. These results show that effective and efficient block-atomic execution is possible when the compiler and microarchitecture are designed cooperatively.
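A toy sketch of frequency-guided block formation in the spirit of the policies the abstract describes: grow an atomic block along the hottest control-flow path until a size budget is reached. The CFG, edge frequencies, and budget are hypothetical; the dissertation's actual policies (and its variable-size block support) are far richer.

```python
# Greedy block formation: follow the most frequently taken successor edge
# until the block would exceed the budget or revisit a block.
def form_block(cfg, sizes, freqs, entry, budget):
    """cfg: block -> successors; sizes: block -> instruction count;
    freqs: (pred, succ) -> execution frequency."""
    block, total, current = [entry], sizes[entry], entry
    while True:
        succs = cfg.get(current, [])
        if not succs:
            break
        nxt = max(succs, key=lambda s: freqs.get((current, s), 0))
        if nxt in block or total + sizes[nxt] > budget:
            break                      # avoid cycles and oversized blocks
        block.append(nxt)
        total += sizes[nxt]
        current = nxt
    return block

cfg   = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
sizes = {"A": 10, "B": 6, "C": 20, "D": 8}
freqs = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10}
print(form_block(cfg, sizes, freqs, "A", budget=32))  # -> ['A', 'B', 'D']
```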
