Packet processing is the enabling technology of networked information systems
such as the Internet and is usually performed with fixed-function custom-made
ASIC chips. As communication protocols evolve rapidly, there is increasing
interest in adapting features of the processing over time and, since software
is the preferred way of expressing complex computation, we are interested in
finding a platform to execute packet processing software with the best
possible throughput. Because FPGAs are widely used in network equipment and
they can implement processors, we are motivated to investigate executing
software directly on the FPGAs. Off-the-shelf soft processors on FPGA fabric
are currently geared towards performing embedded sequential tasks and, in
contrast, network processing is most often inherently parallel between packet
flows, if not between each individual packet.
Our goal is to allow multiple threads of execution in an FPGA to reach a
higher aggregate throughput than commercially available shared-memory soft
multi-processors via improvements to the underlying soft processor
architecture. We study a number of processor pipeline organizations to
identify which ones can scale to a larger number of execution threads and find
that tuning multithreaded pipelines can provide compact cores with high
throughput. We then perform a design space exploration of multicore soft
systems, compare single-threaded and multithreaded designs to identify
scalability limits and develop processor architectures allowing threads to
execute with as little architectural stalls as possible: in particular with
instruction replay and static hazard detection mechanisms. To further reduce
the wait times, we allow threads to speculatively execute by leveraging
transactional memory. Our multithreaded multiprocessor along with our
compilation and simulation framework makes the FPGA easy to use for an average
programmer who can write an application as a single thread of computation with
coarse-grained synchronization around shared data structures. Comparing with
multithreaded processors using lock-based synchronization, we measure up to
57\% additional throughput with the use of transactional-memory-based
synchronization. Given our applications, gigabit interfaces and 125 MHz system
clock rate, our results suggest that soft processors can process packets in
software at high throughput and low latency, while capitalizing on the FPGAs
already available in network equipment.
Identifer | oai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/27612 |
Date | 16 June 2011 |
Creators | Martin, Labrecque |
Contributors | Gregory, Steffan |
Source Sets | University of Toronto |
Language | en_ca |
Detected Language | English |
Type | Thesis |
Page generated in 0.0021 seconds