Network processors are custom, high-performance embedded processors deployed for a variety of tasks that must operate at line rate (Gbit/s) to prevent packet loss. As application domains grow more complex and the code store on modern network processors grows larger, network processor programming goes beyond simply exploiting parallelism in packet processing. Unlike the traditional homogeneous threading model, modern network
processor programming must support heterogeneous threads that execute simultaneously on a microengine. To support such demands, we first propose hardware management of
registers across multiple threads. In their PLDI 2004 paper, Zhuang and Pande were the first to propose a compiler-based scheme for
register allocation across threads; in this work, we extend their static allocation
method to support aggressive register allocation that takes dynamic context into account. We also remove the loads and stores caused by aliased memory
accesses, converting them into register moves that exploit dead registers. This yields large savings in latency and higher throughput, mainly by removing high-latency memory accesses and idle cycles. The
dynamic register allocator is designed to be lightweight and low-latency,
which requires a number of design tradeoffs.
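The idea of turning aliased loads and stores into register moves can be illustrated with a small software sketch. This is not the hardware mechanism of the thesis; it is a toy pass under stated assumptions: instructions are plain tuples, aliasing is known exactly (same address value), the registers in `dead_regs` are dead for the whole sequence, and nothing else observes the memory location. The function name `forward_stores` and the tuple encoding are hypothetical.

```python
def forward_stores(instrs, dead_regs):
    """Toy pass: keep stored values in dead registers instead of memory.

    instrs    -- list of tuples: ("store", addr, src), ("load", dst, addr),
                 or any other tuple, which is passed through unchanged.
    dead_regs -- register names assumed dead for the whole sequence
                 (an assumption of this sketch, not a computed fact).
    """
    pool = list(dead_regs)   # dead registers still available
    holder = {}              # addr -> dead register holding its value
    out = []
    for ins in instrs:
        op = ins[0]
        if op == "store":
            _, addr, src = ins
            if addr not in holder and pool:
                holder[addr] = pool.pop()
            if addr in holder:
                # The store becomes a move into the dead register.
                out.append(("move", holder[addr], src))
            else:
                out.append(ins)  # no dead register left: keep the store
        elif op == "load":
            _, dst, addr = ins
            if addr in holder:
                # The aliased load becomes a register-to-register move.
                out.append(("move", dst, holder[addr]))
            else:
                out.append(ins)
        else:
            out.append(ins)
    return out


program = [("store", 0x10, "r1"), ("add", "r2", "r3", "r4"), ("load", "r5", 0x10)]
rewritten = forward_stores(program, ["r7"])
```

With one dead register available, both memory accesses in `program` are replaced by moves; with an empty `dead_regs` list the sequence is returned unchanged.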
In the second part of this work, our goal is to design an automatic
register allocation scheme that makes the dual-bank register file
design of network processors transparent to the compiler. By design, network
processors mandate that the operands of an instruction be
allocated to registers belonging to two different banks. The key goal in
this work is to take
into account dynamic contexts to balance the register pressure across the
banks. The key decisions are how and where to map an incoming virtual
register onto a physical register in a bank, how to evict dead registers, and
how to minimize bank-to-bank copies and swaps.
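The dual-bank constraint and the decisions listed above can be sketched in a few lines. The following is a minimal software model, not the hardware design of the thesis: it assumes two equally sized banks named "A" and "B", instructions with two distinct source operands, and a simple "least loaded bank" placement rule. The class `DualBankFile` and all of its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DualBankFile:
    size: int  # registers per bank (assumed equal for both banks)
    banks: dict = field(default_factory=lambda: {"A": {}, "B": {}})
    copies: int = 0  # number of bank-to-bank copies performed

    def bank_of(self, vreg):
        """Return the bank currently holding vreg, or None if unmapped."""
        for name, regs in self.banks.items():
            if vreg in regs:
                return name
        return None

    def _alloc(self, vreg, bank):
        regs = self.banks[bank]
        if len(regs) >= self.size:
            raise RuntimeError(f"bank {bank} is full")
        used = set(regs.values())
        regs[vreg] = next(i for i in range(self.size) if i not in used)

    def free_dead(self, vreg):
        """Evict a virtual register that is known to be dead."""
        bank = self.bank_of(vreg)
        if bank is not None:
            del self.banks[bank][vreg]

    def map_new(self, vreg, avoid=None):
        """Map an incoming virtual register to the less loaded allowed bank."""
        candidates = [b for b in ("A", "B") if b != avoid]
        bank = min(candidates, key=lambda b: len(self.banks[b]))
        self._alloc(vreg, bank)
        return bank

    def prepare(self, src1, src2):
        """Ensure two distinct operands sit in different banks."""
        b1 = self.bank_of(src1) or self.map_new(src1)
        b2 = self.bank_of(src2)
        if b2 is None:
            b2 = self.map_new(src2, avoid=b1)
        elif b2 == b1:
            # Conflict: copy src2 into the other bank.
            del self.banks[b1][src2]
            b2 = "B" if b1 == "A" else "A"
            self._alloc(src2, b2)
            self.copies += 1
        return b1, b2


rf = DualBankFile(size=2)
first = rf.prepare("v1", "v2")   # fresh operands land in different banks
rf.prepare("v1", "v3")
rf.free_dead("v2")               # evicting a dead register frees a slot in B
rf.prepare("v3", "v4")
conflict = rf.prepare("v1", "v4")  # both were in bank A, so v4 is copied to B
```

In this model, freeing `v2` before the conflicting instruction is what leaves room in bank B for the copy; without that eviction the bank would be full, which is the kind of pressure imbalance the dynamic scheme aims to avoid.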
We show that both of these problems can be solved with simple
hardware designs that exploit dynamic contexts. The performance gains are
substantial, and because the designs are simple (and off the
critical path), such schemes may be attractive in practice.
Identifier | oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/11491 |
Date | 22 May 2006 |
Creators | Collins, Ryan |
Publisher | Georgia Institute of Technology |
Source Sets | Georgia Tech Electronic Thesis and Dissertation Archive |
Language | en_US |
Detected Language | English |
Type | Thesis |
Format | 333867 bytes, application/pdf |