[en] A LIBRARY FOR THE CREATION OF NETWORK-PROCESSORS-BASED VIRTUAL MACHINES / [pt] UMA BIBLIOTECA PARA CRIAÇÃO DE MÁQUINAS VIRTUAIS BASEADAS EM PROCESSADORES DE REDE
TELVIO MARTINS DE MELLO
27 June 2005
[pt] O objetivo deste trabalho é estudar, propor e implementar uma ferramenta que permita a experimentação com arquiteturas que sigam o paradigma de Processadores de Rede - Network Processors (NP). Com esse intuito, foi implementada uma biblioteca de objetos genéricos que permite emular os diversos componentes de hardware (tais como memórias, registradores, unidades de controle, unidades lógico-aritméticas, etc.) presentes em arquiteturas específicas para o processamento de protocolos. A conjunção desses componentes permite gerar máquinas virtuais que podem ser exercitadas para testar ou verificar o funcionamento das mais diversas operações nesses ambientes. Além da biblioteca, são apresentados três estudos de casos distintos: o primeiro mostrando um processador criado para teste e os outros dois implementam arquiteturas baseadas no processador MCS85 e no núcleo ARM do Processador IXP, todos com o intuito de validar e mostrar a utilidade prática da ferramenta. / [en] The aim of this work is to study, propose and implement a tool that allows experimentation with architectures that follow the Network Processor (NP) paradigm. To that end, a library of generic objects was implemented that emulates the various hardware components (such as memories, registers, control units and arithmetic logic units) found in architectures dedicated to protocol processing. Combining these components yields virtual machines that can be exercised to test or verify the behavior of a wide range of operations in such environments. Besides the library itself, three case studies are presented to validate the tool and show its practical utility: the first is a processor created purely for testing, and the other two implement architectures based on the MCS85 processor and on the ARM core of the Intel IXP Network Processor.
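As an illustration of the idea in the abstract above (generic component objects combined into a virtual machine), here is a minimal sketch in Python. It is not the thesis's library: the class names (Memory, Register, ALU, ToyMachine), the 8-bit word size and the four-instruction set are all assumptions made for this example.

```python
class Memory:
    """Byte-addressable memory; the 8-bit cell width is an assumption."""
    def __init__(self, size):
        self.cells = [0] * size

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF


class Register:
    """A single storage element holding one value."""
    def __init__(self):
        self.value = 0


class ALU:
    """Arithmetic unit; only an 8-bit wrapping add is modelled here."""
    @staticmethod
    def add(a, b):
        return (a + b) & 0xFF


class ToyMachine:
    """Hypothetical machine assembled from the components above."""
    def __init__(self, program):
        self.mem = Memory(256)
        self.acc = Register()   # accumulator
        self.pc = Register()    # program counter
        self.alu = ALU()
        self.program = program  # list of (opcode, operand) pairs

    def run(self):
        # This loop plays the role of the control unit: fetch, decode, execute.
        while self.pc.value < len(self.program):
            op, arg = self.program[self.pc.value]
            if op == "LOAD":
                self.acc.value = self.mem.read(arg)
            elif op == "ADDI":
                self.acc.value = self.alu.add(self.acc.value, arg)
            elif op == "STORE":
                self.mem.write(arg, self.acc.value)
            elif op == "HALT":
                break
            else:
                raise ValueError(f"unknown opcode: {op}")
            self.pc.value += 1
```

Running a short program such as ADDI 200, ADDI 100, STORE 7, HALT exercises all three component types; because the ALU wraps at 8 bits, the stored value is 300 modulo 256.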
01 January 2007
Offloading tasks to a network processor is an important way to increase server performance. Hardware offloading of Transmission Control Protocol/Internet Protocol (TCP/IP) intensive tasks is known to improve performance significantly. When the entire application is considered for offloading, the impact can be significant, because offloading removes much of the load from the server. The goal of this thesis is to consider such a system with application-level offloading, rather than hardware offloading, and gauge its performance benefits. I implement this project on an Apache httpd server running Red Hat Linux, on a system with a co-located network processor (IXP2855). The performance of the two implementations is measured using the SPECweb2005 benchmark, which is the accepted industry standard for evaluating Web server performance.
12 July 2004
Modern distributed applications utilize a rich variety of distributed services. Due to the computation-centric notions of modern machines, application-level implementations of these services are problematic for applications requiring high data transfer rates, for reasons that include the inability of modern architectures to efficiently execute computations with communication. Conversely, network-level implementations of services are limited by the network's inability to interpret application-level data or execute application-level operations on such data. The emergence of programmable network processors capable of high-rate data transfers, with flexible interfaces for external reconfiguration, has created new possibilities for moving processing into the network infrastructure. This thesis explores the extent to which programmable network processors can be used in conjunction with standard host nodes to form enhanced computational host-ANP (Attached Network Processor) platforms that can deliver increased efficiency for a variety of applications and services. The main contributions of this research are the creation of SPLITS, a Software architecture for Programmable LIghtweighT Stream handling, and its key abstraction, stream handlers. SPLITS enables the dynamic configuration of data paths through the host-ANP nodes, and the dynamic creation, deployment and reconfiguration of application-level processing applied along these paths. With SPLITS, application-specific services can be dynamically mapped to the host, the ANP, or both, to best exploit their joint capabilities. The basic abstraction used by SPLITS to represent instances of application-specific activities is the stream handler: a parameterizable, lightweight computation unit that operates on data headers as well as application-level content.
Experimental results demonstrate the performance gains from executing various application-level services on ANPs, and the importance of the SPLITS host-ANP nodes for supporting dynamically reconfigurable services and for dealing with the resource limitations of the ANPs.
Approved for public release, distribution is unlimited / To address the requirements of the rapidly growing Internet, network processors have emerged as the solution to the customization and performance needs of networking systems. An important component in a network is the router, which receives incoming packets and directs them to specific routes elsewhere in the system. Network processors and their associated software control routers and switches and allow software designers to deploy new systems, such as multicast forwarders and firewalls, quickly. This thesis introduces network processors and their features, focusing on the Intel IXP1200 network processor. A multicast design for the IXP1200 using microACE is proposed. The thesis presents an approach to building a multicast forwarder from the IXP1200 layer-3 forwarder microACE, which carries out unicast routing. The design is based on the Intel Internet Exchange Architecture and its Active Computing Element (ACE). The layer-3 unicast forwarder microACE is used as the starting point for the design. Some software modules, called microblocks, are modified to create a multicast forwarder that is flexible and efficient. / Lieutenant Junior Grade, Turkish Navy
Iqbal, Muhammad Faisal
06 September 2013
Network processors are multicore processors capable of processing network packets at wire speeds of multiple Gbps. Due to their high performance and programmability, these processors have become the main computing elements in demanding network processing equipment such as enterprise, edge and core routers. With the ever-increasing use of the Internet, the processing demands on these routers have also increased. As a result, the number and complexity of the cores in network processors have grown, and managing these cores efficiently has become very challenging. This dissertation discusses two main issues related to the efficient use of a large number of parallel cores in network processors: (1) How should work be allocated to the processing cores to optimize performance? (2) How can the desired performance requirement be met power-efficiently? This dissertation presents the design of a hash-based scheduler that distributes packets to cores. The scheduler exploits multiple dimensions of locality to improve performance while minimizing out-of-order delivery of packets, and it is designed to work seamlessly when the number of cores allocated to a service changes. The design of a resource allocator is also presented, which allocates different numbers of cores to services as their traffic behavior changes. To improve power efficiency, a traffic-aware power management scheme is presented that exploits variations in traffic rates to save power. The proposals are evaluated in simulation studies using real and synthetic network traces. These experiments show that the proposed packet scheduler can improve performance by as much as 40% by improving locality. It is also observed that traffic variations can be exploited to save significant power by turning off unused cores or by running them at lower frequencies.
Improving the performance of the individual cores through careful scheduling also helps to reduce power consumption, because the same amount of work can then be done with fewer cores. The proposals made in this dissertation show promising improvements over previous work. Hash-based schedulers have very low overhead and are well suited to data rates of 100 Gbps and beyond.
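To make the notion of hash-based packet scheduling concrete, the following Python sketch maps each packet to a core by hashing its flow identifier, so that packets of one flow always reach the same core and keep their order. This is a generic illustration, not the dissertation's scheduler: the five-tuple key, the use of CRC-32 and the names flow_key, pick_core and schedule are assumptions. In particular, the plain modulo mapping used here reassigns many flows when the core count changes, which is exactly the situation the dissertation's design is said to handle.

```python
import zlib


def flow_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Serialize a five-tuple into bytes (the key format is an assumption)."""
    return f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()


def pick_core(key, num_cores):
    """Hash the flow key and reduce it to a core index."""
    return zlib.crc32(key) % num_cores


def schedule(packets, num_cores):
    """Append each packet to the queue of the core chosen for its flow.

    Each packet is a dict with a "flow" five-tuple; arrival order is kept
    within every queue, so packets of one flow are never reordered.
    """
    queues = [[] for _ in range(num_cores)]
    for pkt in packets:
        core = pick_core(flow_key(*pkt["flow"]), num_cores)
        queues[core].append(pkt)
    return queues
```

Because the mapping depends only on the flow key, per-flow state can stay in one core's cache, which is one simple form of the locality the abstract refers to.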
Lindholm, Jeffery L.
As network processors have advanced in speed and efficiency, they have become increasingly complex in both hardware and software configuration. Intel's IXP1200 is one of these new network processors, and it has been given to universities worldwide for research. The goal of this thesis is to take the first step in that research by providing a stable, reliable platform for further work. This thesis introduces the fundamental hardware of Intel's IXP1200 and describes what it takes to install the hardware and software that support it, including Intel's Software Development Kit (SDK), using both Windows 2000 and Linux 7.2 as the operating system. Upon completion, the platform can be used to conduct further research into the development of the IXP1200 network processor. The thesis provides a hardware and software installation checklist, documentation of problems encountered, and recommendations for their resolution. It also provides an example of preexisting code that has been modified to filter TCP or UDP packets to different ports.
Multithreading is a processor technique that can effectively hide the long latencies caused by memory accesses, coprocessor operations and similar events. While this looks promising, it carries an additional hardware cost that varies with, for example, the number of contexts to switch between and the switching technique used, and this cost might limit the possible gain of multithreading. Network processors are traditionally multiprocessor systems that share many common resources, such as memories and coprocessors, so the potential gain of multithreading could be high for these applications. On the other hand, the added hardware is relatively expensive because the rest of the processor is fairly small, so higher performance gains might be achieved by using more processors rather than multithreaded ones. To help resolve this tradeoff, a simulator was built in which a system can be modelled effectively and whose results can hint at the optimal solution in the early design phase of a network processor system. The thesis also provides a theoretical background to multithreading, network processors and related topics.
11 January 2010
Η μικρή ταχύτητα επεξεργασίας των πακέτων που μεταδίδονται στα δίκτυα σε σχέση με την ταχύτητα μετάδοσης τους μέσα σε αυτά, δημιουργεί την ανάγκη για την εφαρμογή καινοτομιών στα συστήματα δικτύωσης, με σκοπό την ελάττωση αυτού του χάσματος και την καλύτερη εκμετάλλευση των μεγάλων ταχυτήτων μετάδοσης δεδομένων. Το πρόβλημα αυτό είναι γνωστό ως «πρόβλημα διατήρησης της ρυθμαπόδοσης». Η ενσωμάτωση επεξεργαστών στα δικτυακά συστήματα έχει βοηθήσει στην αντιμετώπιση του προβλήματος. Μια αρχιτεκτονική που προτείνεται για αυτούς τους επεξεργαστές πρωτοκόλλου, όπως ονομάζονται, εισάγει τη χρήση μιας καινοτόμας δομής καταχωρητών με την ονομασία Tripod. Η ιδέα της είναι η αντικατάσταση του επεξεργαστή που βρίσκεται στο εσωτερικό ενός προσαρμογέα δικτύου, από έναν επεξεργαστικό πυρήνα και τρεις ξεχωριστές πανομοιότυπες ομάδες καταχωρητών. Το όλο σύστημα θα λειτουργεί σε μια λογική διοχέτευσης (pipeline) με τα εξής στάδια: φόρτωσης, επεξεργασίας, εκφόρτωσης. Σκοπός αυτής της διπλωματικής είναι η σχεδίαση ενός υποσυστήματος το οποίο θα διαχειρίζεται αυτές τις ομάδες καταχωρητών και θα επιτρέπει στο σύστημα να λειτουργεί σύμφωνα με τις προδιαγραφές. Πιο συγκεκριμένα, θα υλοποιεί την φόρτωση και εκφόρτωση δεδομένων προς και από τους καταχωρητές, καθώς και την σύνδεση της κατάλληλης ομάδας καταχωρητών με τον επεξεργαστικό πυρήνα. / The low processing speed of packets transmitted in networks, compared with their transmission speed, creates the need for innovations in network systems that narrow this gap and better exploit high data transmission speeds. This problem is known as “the throughput preservation problem”. The incorporation of embedded processors in network systems has helped to address the problem. An architecture proposed for these protocol processors, as they are called, introduces an innovative register structure called Tripod.
The idea is to replace the processor inside a network adapter with a processor core and three separate, identical register files. The system operates as a pipeline with the following stages: loading, processing, unloading. The aim of this diploma thesis is to design a subsystem that manages these register files and lets the system operate according to its specifications. More specifically, it implements the loading and unloading of data to and from the registers, as well as the connection of the appropriate register file to the processor core.
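The three-stage rotation described above can be sketched as a small scheduling function. The Python below is only an illustration of the general idea, assuming that in every pipeline step each of the three register files holds a different stage and that each file advances load, process, unload in turn. The names STAGES, roles_at and file_on_core are hypothetical, and the actual Tripod control logic may differ.

```python
STAGES = ("load", "process", "unload")


def roles_at(cycle):
    """Return {register_file_index: stage} for one pipeline step.

    File i starts loading in step i, so in any step the three files
    occupy three different stages (assumed rotation order).
    """
    return {rf: STAGES[(cycle - rf) % 3] for rf in range(3)}


def file_on_core(cycle):
    """Index of the register file connected to the processor core,
    i.e. the one currently in the 'process' stage."""
    roles = roles_at(cycle)
    return next(rf for rf, stage in roles.items() if stage == "process")
```

With this rotation the core always has exactly one register file attached, while the other two are being filled with the next packet's data and drained of the previous packet's results.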
Kumarapillai Chandrikakutty, Harikrishnan
01 January 2013
Technological advancements have transformed the way people interact with the world. The Internet now forms a critical infrastructure that links different aspects of our lives, such as personal communication, business transactions, social networking, and advertising. To cater to this ever-increasing communication load, there has been a fundamental shift in the network infrastructure. Modern network routers often employ software-programmable network processors instead of ASIC-based technology for higher throughput and adaptability to changing resource requirements. This programmability makes networking infrastructure vulnerable to a new class of network attacks that compromise the software on network processors. This issue has created the need for security systems that can monitor the behavior of network processors at run time. This thesis describes an FPGA-based security monitoring system for multi-core network processors. The implemented security monitor improves upon previous hardware monitoring schemes. We demonstrate a state-machine-based, hardware-programmable monitor that can track program execution flow at run time. Applications are analyzed offline, and a hash of the instructions is generated to form a state machine sequence. If the state machine deviates from expected behavior, an error flag is raised, forcing a network processor reset. For testing purposes, the monitoring logic and the multi-core network processor system are implemented in FPGA logic. In this research, we modify the network processor memory architecture to improve security monitor functionality. The efficiency of this approach is validated using a diverse set of network benchmarks. Experiments are performed on the prototype system using known network attacks to test the performance of the monitoring subsystem.
Experimental results demonstrate that our security monitor approach detects and recovers from network attacks efficiently, with minimum overhead, while maintaining line-rate packet forwarding. Additionally, our monitor is capable of defending against attacks on processors with a Harvard architecture, the dominant contemporary network processor organization. We demonstrate that our monitor architecture causes no network slowdown in the absence of an attack and can drop packets without otherwise affecting regular network traffic when an attack occurs.
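The offline-hash-then-compare idea in this abstract can be illustrated with a short Python sketch. It handles only a straight-line instruction sequence and is not the thesis's monitor: the 4-bit XOR-fold hash, the 32-bit instruction words and the names instr_hash, build_expected and monitor are assumptions made for the example, and a real monitor would also need to follow branches.

```python
def instr_hash(word, bits=4):
    """Fold an instruction word into a small hash by XORing its bit groups.

    The hash function and its 4-bit width are assumptions for this sketch.
    """
    mask = (1 << bits) - 1
    h = 0
    while word:
        h ^= word & mask
        word >>= bits
    return h


def build_expected(program):
    """Offline step: derive the expected hash sequence from the program."""
    return [instr_hash(w) for w in program]


def monitor(expected, executed):
    """Run-time step: compare executed instructions against the sequence.

    Returns the index of the first deviation (where an error flag would
    be raised), or None if execution matches the expected sequence.
    """
    for i, word in enumerate(executed):
        if i >= len(expected) or instr_hash(word) != expected[i]:
            return i
    return None
```

A small hash keeps the monitor's storage low, at the cost that some modified instructions collide with the original hash and go unnoticed; the sketch makes no claim about how the thesis balances that tradeoff.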
01 January 2005
In this work, we present off-chip communication architectures for line cards that increase the throughput of the memory system currently in use. In recent years there has been a significant increase in memory bandwidth demand on line cards as a result of higher line rates, more deep packet inspection operations and the continuing expansion of lookup tables. As line-rate data and NPU processing power increase, memory access time becomes the main system bottleneck during data store and retrieve operations. The growing demand for memory bandwidth is at odds with indirect interconnect methodologies. Moreover, solutions to the memory bandwidth bottleneck are limited by physical constraints such as area and NPU I/O pins. Therefore, indirect interconnects are replaced with direct, packet-based networks such as meshes, tori or k-ary n-cubes. We investigate multiple k-ary n-cube based interconnects and propose two variations of the 2-ary 3-cube interconnect, called the 3D-bus and the 3D-mesh. All of the k-ary n-cube interconnects include multiple, highly efficient techniques to route, switch, and control packet flows in order to minimize congestion spots and packet loss. We explore the tradeoffs between implementation constraints and performance. We also developed an event-driven interconnect simulation framework to evaluate the performance of packet-based off-chip k-ary n-cube interconnect architectures for line cards. The simulator uses state-of-the-art software design techniques to provide the user with a flexible yet robust tool that can emulate multiple interconnect architectures under non-uniform traffic patterns. Moreover, the simulator offers the user full control over network parameters, performance-enhancing features and simulation time frames, which brings the platform as close as possible to the physical and functional properties of a real line card.
Using our network simulator, we identify the processor-memory configuration, out of multiple configurations, that achieves the best performance. Moreover, we explore how network enhancement techniques such as virtual channels and sub-channeling improve network latency and throughput. Our performance results show that k-ary n-cube topologies, and especially our modified version of the 2-ary 3-cube interconnect, the 3D-mesh, significantly outperform existing line card interconnects and are able to sustain higher traffic loads. The flow control mechanism proved to substantially reduce hot spots, balance load across areas of high traffic, and achieve a low transmission failure rate. Moreover, it can scale to accommodate more memories and/or processors and thereby increase the line card's processing power.
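For readers unfamiliar with the topology named above, the following Python sketch enumerates the nodes of a k-ary n-cube and the direct neighbors of a node, using the standard definition with wraparound links in every dimension. The function names nodes and neighbors are chosen for this example, and the sketch does not model the 3D-bus or 3D-mesh variations, whose details are not given in the abstract.

```python
from itertools import product


def nodes(k, n):
    """All k**n nodes of a k-ary n-cube, as n-tuples of coordinates."""
    return list(product(range(k), repeat=n))


def neighbors(node, k):
    """Nodes one hop away: +1 or -1 (mod k) in exactly one dimension.

    For k = 2 the two directions reach the same node, so a set is used
    to avoid counting that neighbor twice.
    """
    result = set()
    for dim, coord in enumerate(node):
        for step in (1, -1):
            nb = list(node)
            nb[dim] = (coord + step) % k
            if tuple(nb) != node:
                result.add(tuple(nb))
    return result
```

A 2-ary 3-cube therefore has eight nodes with three neighbors each (the corners of a cube), while a 4-ary 2-cube is a 4 x 4 torus in which every node has four neighbors.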