Global ETD Search

1	Dynamic Register Allocation for Network Processors Collins, Ryan 22 May 2006 (has links) Network processors are custom high performance embedded processors deployed for a variety of tasks that must operate at high line (Gbits/sec) speeds to prevent packet loss. With the increase in complexity of application domains and larger code store on modern network processors, the network processor programming goes beyond simply exploiting parallelism in packet processing. Unlike the traditional homogeneous threading model, modern network processor programming must support heterogenous threads that execute simultaneously on a microengine. In order to support such demands, we first propose hardware management of registers across multiple threads. In their PLDI 2004 paper, Zhuang and Pande for the first time proposed a compiler based scheme to support register allocation across threads; in this work, we extend their static allocation method to support aggressive register allocation taking dynamic context into account. We also remove the load/stores due to aliased memory accesses converting them into register moves exploiting dead registers. This results in tremendous savings in latency and higher throughput mainly due to the removal of high latency accesses as well as idle cycles. The dynamic register allocator is designed to be light-weight and low latency by undertaking many tradeoffs. In the second part of this work, our goal is to design an automatic register allocation scheme that makes compiler transperant to dual bank register file design for network processors. By design network processors mandate that the operands of an instruction must be allocated to registers belonging to two different banks. The key goal in this work is to take into account dynamic contexts to balance the register pressure across the banks. Key decisions made involve, how and where to map incoming virtual register on a physical register in the bank, how to evict dead ones, and how to minimally undertake bank to bank copies and swaps. It is shown that it is viable to solve both of these problems by simple hardware designs that avail of dynamic contexts. The performance gains are substantial and due to simplicity of the designs (which are also off critical paths) such schemes may be attractive in practice. Compilers Register allocation Network processors
2	Concurrent Implementation of Packet Processing Algorithms on Network Processors Groves, Mark January 2006 (has links) Network Processor Units (NPUs) are a compromise between software-based and hardwired packet processing solutions. While slower than hardwired solutions, NPUs have the flexibility of software-based solutions, allowing them to adapt faster to changes in network protocols. <br /><br /> Network processors have multiple processing engines so that multiple packets can be processed simultaneously within the NPU. In addition, each of these processing engines is multi-threaded, with special hardware support built in to alleviate some of the cost of concurrency. This hardware design allows the NPU to handle multiple packets concurrently, so that while one thread is waiting for a memory access to complete, another thread can be processing a different packet. By handling several packets simultaneously, an NPU can achieve similar processing power as traditional packet processing hardware, but with greater flexibility. <br /><br /> The flexibility of network processors is also one of the disadvantages associated with them. Programming a network processor requires an in-depth understanding of the hardware as well as a solid foundation in concurrent design and programming. This thesis explores the challenges of programming a network processor, the Intel IXP2400, using a single-threaded packet scheduling algorithm as a sample case. The algorithm used is a GPS approximation scheduler with constant time execution. The thesis examines the process of implementing the algorithm in a multi-threaded environment, and discusses the scalability and load-balancing aspects of such an algorithm. In addition, optimizations are made to the scheduler implementation to improve the potential concurrency. The synchronization primitives available on the network processor are also examined, as they play a significant part in minimizing the overhead required to synchronize memory accesses by the algorithm. Computer Science Packet scheduling network processors concurrency
3	Concurrent Implementation of Packet Processing Algorithms on Network Processors Groves, Mark January 2006 (has links) Network Processor Units (NPUs) are a compromise between software-based and hardwired packet processing solutions. While slower than hardwired solutions, NPUs have the flexibility of software-based solutions, allowing them to adapt faster to changes in network protocols. <br /><br /> Network processors have multiple processing engines so that multiple packets can be processed simultaneously within the NPU. In addition, each of these processing engines is multi-threaded, with special hardware support built in to alleviate some of the cost of concurrency. This hardware design allows the NPU to handle multiple packets concurrently, so that while one thread is waiting for a memory access to complete, another thread can be processing a different packet. By handling several packets simultaneously, an NPU can achieve similar processing power as traditional packet processing hardware, but with greater flexibility. <br /><br /> The flexibility of network processors is also one of the disadvantages associated with them. Programming a network processor requires an in-depth understanding of the hardware as well as a solid foundation in concurrent design and programming. This thesis explores the challenges of programming a network processor, the Intel IXP2400, using a single-threaded packet scheduling algorithm as a sample case. The algorithm used is a GPS approximation scheduler with constant time execution. The thesis examines the process of implementing the algorithm in a multi-threaded environment, and discusses the scalability and load-balancing aspects of such an algorithm. In addition, optimizations are made to the scheduler implementation to improve the potential concurrency. The synchronization primitives available on the network processor are also examined, as they play a significant part in minimizing the overhead required to synchronize memory accesses by the algorithm. Computer Science Packet scheduling network processors concurrency
4	Network processors and utilizing their features in a multicast design / Diler, Timur. January 2004 (has links) (PDF) Thesis (M.S. in Computer Science and M.S. in Electrical Engineering)--Naval Postgraduate School, March 2004. / Thesis advisor(s): Su Wen, Jon Butler. Includes bibliographical references (p. 53-54). Also available online.
5	[en] A LIBRARY FOR THE CREATION OF NETWORK-PROCESSORS-BASED VIRTUAL MACHINES / [pt] UMA BIBLIOTECA PARA CRIAÇÃO DE MÁQUINAS VIRTUAIS BASEADAS EM PROCESSADORES DE REDE TELVIO MARTINS DE MELLO 27 June 2005 (has links) [pt] O objetivo deste trabalho é estudar, propor e implementar uma ferramenta que permita a experimentação com arquiteturas que sigam o paradigma de Processadores de Rede - Network Processors (NP). Com esse intuito, foi implementada uma biblioteca de objetos genéricos que permite emular os diversos componentes de hardware (tais como memórias, registradores, unidades de controle, unidades lógico-aritméticas, etc.) presentes em arquiteturas especificas para o processamento de protocolos. A conjunção desses componentes permite gerar máquinas virtuais que podem ser exercitadas para testar ou verificar o funcionamento das mais diversas operações nesses ambientes. Além da biblioteca, são apresentados três estudos de casos distintos: o primeiro mostrando um processador criado para teste e os outros dois implementam arquiteturas baseadas no processador MCS85 e no núcleo ARM do Processador IXP, todos com o intuito de validar e mostrar a utilidade prática da ferramenta. / [en] The aim of this work is to study, propose and implement a tool that allows the experimentation with architectures that follow the Network Processors (NP) paradigm. A generic object library was implemented, allowing the emulation of the various hardware components, such as memories, registers, arithmetic-and-logical units, control units etc., that are commonly used within specific architectures for protocol processing. The integrated usage of these components will provide an environment where virtual machines can be created and tested to verify the behavior of many different operations. Besides the library itself, three use cases are presented to validate and show the utility of the tool: the first is an implementation of a processor created just for the sake of testing and the other two are implementations of architectures based on the MCS85 processor and on the ARM kernel of the Intel IXP Network Processor. [pt] MAQUINAS VIRTUAIS [en] VIRTUAL MACHINES [pt] PROCESSADORES DE REDE [en] NETWORK PROCESSORS
6	SPLITS Stream Handlers: Deploying Application-level Services to Attached Network Processor Gavrilovska, Ada 12 July 2004 (has links) Modern distributed applications utilize a rich variety of distributed services. Due to the computation-centric notions of modern machines, application-level implementations of these services are problematic for applications requiring high data transfer rates, for reasons that include the inability of modern architectures to efficiently execute computations with communication. Conversely,network-level implementations of services are limited due to the network's inability to interpret application-level data or execute application-level operations on such data. The emergence of programmable network processors capable of high-rate data transfers, with flexible interfaces for external reconfiguration, has created new possibilities for movement of processing into the network infrastructure. This thesis explores the extent to which programmable network processors can be used in conjunction with standard host nodes, to form enhanced computational host-ANP (Attached Network Processor) platforms that can deliver increased efficiency for variety of applications and services. The main contributions of this research are the creation of SPLITS, a Software architecture for Programmable LIghtweighT Stream handling, and its key abstraction stream handlers. SPLITS enables the dynamic configuration of data paths through the host-ANP nodes, and the dynamic creation, deployment and reconfiguration of application-level processing applied along these paths. With SPLITS, application-specific services can be dynamically mapped to the host, ANP, or both, to best exploit their joint capabilities. The basic abstraction used by SPLITS to represent instances of application-specific activities are stream handlers - parameterizable, lightweight, computation units that operate on data headers as well as application-level content. Experimental results demonstrate performance gains of executing various application-level services on ANPs, and demonstrate the importance of the SPLITS host-ANP nodes to support dynamically reconfigurable services, and to deal with the resource limitations on the ANPs. Active middleware Active networking Streaming applications Network processors
7	A CAM-Based, High-Performance Classifier-Scheduler for a Video Network Processor. Tarigopula, Srivamsi 05 1900 (has links) Classification and scheduling are key functionalities of a network processor. Network processors are equipped with application specific integrated circuits (ASIC), so that as IP (Internet Protocol) packets arrive, they can be processed directly without using the central processing unit. A new network processor is proposed called the video network processor (VNP) for real time broadcasting of video streams for IP television (IPTV). This thesis explores the challenge in designing a combined classification and scheduling module for a VNP. I propose and design the classifier-scheduler module which will classify and schedule data for VNP. The proposed module discriminates between IP packets and video packets. The video packets are further processed for digital rights management (DRM). IP packets which carry regular traffic will traverse without any modification. Basic architecture of VNP and architecture of classifier-scheduler module based on content addressable memory (CAM) and random access memory (RAM) has been proposed. The module has been designed and simulated in Xilinx 9.1i; is built in ISE simulator with a throughput of 1.79 Mbps and a maximum working frequency of 111.89 MHz at a power dissipation of 33.6mW. The code has been translated and mapped for Spartan and Virtex family of devices. Video network processor scheduling classification Network processors. Multicasting (Computer networks)
8	Network processors and utilizing their features in a multicast design Diler, Timur 03 1900 (has links) Approved for public release, distribution is unlimited / In order to address the requirements of the rapidly growing Internet, network processors have emerged as the solution to the customization and performance needs of networking systems. An important component in a network is the router, which receives incoming packets and directs them to specific routes elsewhere in the system. Network processors and the associated software control the routers and switches and allow software designers to deploy new systems such as multicasting forwarder and firewalls quickly.This thesis introduces network processors and their features, focusing on the Intel IXP1200 network processor. A multicast design for the IXP1200 using microACE is proposed. This thesis presents an approach to building a multicasting forwarder using the IXP1200 network processor layer-3 forwarder microACE that carries out unicast routing. The design is based on the Intel Internet exchange architecture and its active computing element (ACE). The layer-3 unicast forwarder microACE is used as a basic starting point for the design. Some software modules, called micoblocks, are modified to create a multicast forwarder that is flexible and efficient. / Lieutenant Junior Grade, Turkish Navy Multicasting (Computer networks) Network processors Intel microprocessors Computer architecture 0 Multicasting ACE Microace Network processors Intel IXA Microengine
9	Performance Modeling And Evaluation Of Network Processors Govind, S 12 1900 (has links) In recent years there has been an exponential growth in Internet traffic resulting in increased network bandwidth requirements which, in turn, has led to stringent processing requirements on network layer devices like routers. Present backbone routers on OC 48 links (2.5Gbps) have to process four million minimum-sized packets per second. Further, the functionality supported in the network devices is also on the increase leading to programmable processors, such as Intel's IXP, Motorola's C5 and IBM's.NP. These processors support multiple processors and multiple threads to exploit packet-level-parallelism inherent in network workloads. This thesis studies the performance of network processors. We develop a Petri Net model for a commercial network processors (Intel IXP 2400,2850) for three different applications viz., IPv4 forwarding, Network Address Translation and IP security protocols. A salient feature of the Petri net model is its ability to model the application, architecture and their interaction in great detail. The model is validated using the intel proprietary tool (SDK 3.51 for IXP architecture) over a range of configurations. Our Performance evaluation results indicate that 1. The IXP processor is able to support a throughput of 2.5 Gbps for all modeled applications. 2. Packet buffer memory (DRAM) is the bottleneck resource in a network proces sor and even multithreading is ineffective beyond a total of 16 threads in case of header processing applications and beyond 32 threads for payload processing applications. Since DRAM is the bottleneck resource we explore the benefits of increasing the DRAM banks and other software schemes like offloading the packet header to SRAM. The second part of the thesis studies the impact of parallel processing in network processor on packet reordering and retransmission. Our results indicate that the concurrent processing of packets in a network processor and buffer allocation schemes in TFIFO leads to a significant packet reordering, (61%), on a 10-hop network (with packet sizes of 64 B) which in turn leads to a 76% retransmission under the TCP fast-restransmission algorithm. We explore different transmit buffer allocation schemes namely, contiguous, strided, local, and global for transmit buffer which reduces the packet retransmission to 24%. Our performance results also indicate that limiting the number of microengines can reduce the extent of packet reordering while providing the same throughput. We propose an alternative scheme, Packetsort, which guarantees complete packet ordering while achieving a throughput of 2.5 Gbps. Further, we observe that Packetsort outperforms, by up to 35%, the in-built schemes in the IXP processor namely, Inter Thread Signaling (ITS) and Asynchronous Insert and Synchronous Remove (AISR). The final part of this thesis investigates the performance of the network processor in a bursty traffic scenario. We model bursty traffic using a Pareto distribution. We consider a parallel and pipelined buffering schemes and their impact on packet drop under bursty traffic. Our results indicate that the pipelined buffering scheme outperforms the parallel scheme. Network Processors Computer Networks Petri Nets Petri Net Model Network Processors - Packet Reordering Network Processors - Perormance Analysis Bursty Traffic Computer Science
10	Utilizing IXP1200 hardware and software for packet filtering Lindholm, Jeffery L. 12 1900 (has links) As network processors have advanced in speed and efficiency they have become more and more complex in both hardware and software configurations. Intel's IXP1200 is one of these new network processors that has been given to different universities worldwide to conduct research on. The goal of this thesis is to take the first step in starting that research by providing a stable system that can provide a reliable platform for further research. This thesis introduces the fundamental hardware of Intel's IXP1200 and what it takes to install both hardware and software using both Windows 2000 and Linux 7.2 as the operating system in support for the IXP1200. This thesis will provide information on the installation of hardware and software configuration for the IXP1200 including Intel's Software Development Kit (SDK). Upon completion this platform can then be used to conduct further research in the development of the IXP1200 network processor. It provides a hardware and software installation checklist and documentations of problems encountered and recommendations for their resolution. Along with providing an example of using preexisting code that has been modified to filter packets of TCP or UDP to different ports. Computer programming Computer input-output equipment Computer software Network processors Microcomputers Programming

Search results