Global ETD Search

1	Power and Memory Efficient Hashing Schemes for Some Network Applications Yu, Heeyeol May 2009 Hash tables (HTs) are used to implement various lookup schemes, and they need to be efficient in terms of speed, space utilization, and power consumption. For IP lookup, hashing schemes are attractive because of their deterministic O(1) lookup performance and low power consumption, in contrast to TCAM- and trie-based approaches. As the size of IP lookup tables grows exponentially, scalable lookup performance is highly desirable. For next-generation high-speed routers, this is a vital requirement, since IP lookup remains in the critical data path and demands predictable throughput. However, recently proposed hash schemes, such as the Bloomier filter HT and the Fast HT (FHT), suffer from a number of flaws, including setup failures, update overheads, duplicate keys, and pointer overheads. In this dissertation, four novel hashing schemes and their architectures are proposed to address these concerns by using pipelined Bloom filters and a Fingerprint filter, which are designed for memory-efficient approximate matching. For IP lookup, two new hash schemes, a Hierarchically Indexed Hash Table (HIHT) and a Fingerprint-based Hash Table (FPHT), are introduced to assure a perfect match without pointer overhead. Further, two hash mechanisms are proposed to provide memory- and power-efficient lookup for packet processing applications. Of the four proposed schemes, the HIHT and the FPHT are evaluated for their performance and compared with TCAM- and trie-based IP lookup schemes. Various sizes of IP lookup tables are considered to demonstrate scalability in terms of speed, memory use, and power consumption. While an FPHT uses less memory than an HIHT, an FPHT-based IP lookup scheme reduces power consumption by a factor of 51 compared to a TCAM-based scheme and requires 1.8 times the memory of a trie-based scheme.
In this dissertation, a multi-tiered packet classifier is also proposed that uses up to 3.2 times less power than the existing parallel packet classifier. Hashing schemes intrinsically lack high throughput, unlike partitioned Ternary Content Addressable Memory (TCAM)-based schemes, which are capable of parallel lookups despite their large power consumption. A hybrid CAM (HCAM) architecture is introduced. Simulation results indicate that HCAM achieves the same throughput as contemporary schemes while using 2.8 times less memory and 3.6 times less power.
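As a rough illustration of the memory-efficient approximate matching that entry 1 attributes to Bloom filters, the following is a minimal sketch of a plain Bloom filter. It is not the pipelined design or the Fingerprint filter from the dissertation; the class name `BloomFilter`, the method names, and the SHA-256 double-hashing choice are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Assumption: derive the k bit positions by double hashing
        # two 64-bit halves taken from a single SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the stride odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        # Set every bit position derived from the key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

A lookup structure built on such a filter still needs an exact check behind it, because a positive answer may be a false positive; that is the gap the abstract's perfect-match schemes are said to close.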
2	Stack Protection Mechanisms In Packet Processing Systems Wu, Peng 01 January 2013 As the functionality that computer networks provide becomes more complex, a traditional router implemented with application-specific integrated circuits (ASICs) cannot satisfy the flexibility requirements. A programmable packet forwarding system based on a general-purpose processor can provide that flexibility, but it also raises a new security issue: because such a system is programmable, it relies on software, and that software's potential vulnerabilities, especially to remote exploits, become a network security concern. In this thesis, we present a software stack overflow vulnerability in the Click modular router and show how a disastrous denial-of-service attack on the Click modular router can be triggered by a single packet. In our work, the Click modular router runs on the Linux operating system on general-purpose hardware. We show that even a software router running under a modern operating system's protection is vulnerable to an elaborate attack. We also examine the stack protection mechanisms available in modern operating systems on general-purpose hardware and propose a possible stack protection mechanism for embedded operating systems. Stack Protection Buffer Overflow Packet Processing Systems Digital Communications and Networking
3	Performance Optimization of Virtualized Packet Processing Function for 5G RAN / Prestandaoptimering av virtualiserad packet processing-funktion för 5G RAN Östermark, Filip January 2017 The advent of fifth-generation mobile networks (5G) presents many new challenges in satisfying the requirements of the upcoming standards. The 5G Radio Access Network (RAN) has several functions that must be highly optimized to keep up with increasing performance requirements. One such function is the Packet Processing Function (PPF), which must process network packets with high throughput and low latency. A major factor in the pursuit of higher throughput and lower latency is the adaptability of 5G technology. For this reason, Ericsson has developed a prototype 5G RAN PPF as a Virtualized Network Function (VNF) using an extended version of the Data Plane Development Kit's Eventdev framework, which can be run on a general-purpose computer. This thesis project optimizes the throughput and latency of a 5G RAN PPF prototype by using a set of benchmarking and code profiling tools to find bottlenecks within the packet processing path, and then mitigates the effects of these bottlenecks by changing the configuration of the PPF. Experiments were performed using IxNetwork to generate 2 flows of GTP-u/UDP/IPv4 packets for the PPF to process. IxNetwork was also used to measure the throughput and latency of the PPF. The results show that, in the evaluated test case, the maximum throughput of the PPF prototype could be increased by 40.52%, with an average cut-through latency of 97.59% relative to the default configuration, by reassigning the CPU cores, performing the packet processing work in fewer pipeline stages, and patching the RSS function of the packet reception (Rx) driver. 5G RAN virtualization packet processing NFV optimization Communication Systems
4	Overlay Architectures for FPGA-Based Software Packet Processing Martin, Labrecque 16 June 2011 Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function custom-made ASIC chips. As communication protocols evolve rapidly, there is increasing interest in adapting features of the processing over time and, since software is the preferred way of expressing complex computation, we are interested in finding a platform to execute packet processing software with the best possible throughput. Because FPGAs are widely used in network equipment and can implement processors, we are motivated to investigate executing software directly on the FPGAs. Off-the-shelf soft processors on FPGA fabric are currently geared towards performing embedded sequential tasks; in contrast, network processing is most often inherently parallel between packet flows, if not between individual packets. Our goal is to allow multiple threads of execution in an FPGA to reach a higher aggregate throughput than commercially available shared-memory soft multi-processors via improvements to the underlying soft processor architecture. We study a number of processor pipeline organizations to identify which ones can scale to a larger number of execution threads and find that tuning multithreaded pipelines can provide compact cores with high throughput. We then perform a design space exploration of multicore soft systems, compare single-threaded and multithreaded designs to identify scalability limits, and develop processor architectures that allow threads to execute with as few architectural stalls as possible, in particular with instruction replay and static hazard detection mechanisms. To further reduce wait times, we allow threads to execute speculatively by leveraging transactional memory.
Our multithreaded multiprocessor, along with our compilation and simulation framework, makes the FPGA easy to use for an average programmer, who can write an application as a single thread of computation with coarse-grained synchronization around shared data structures. Compared with multithreaded processors using lock-based synchronization, we measure up to 57% additional throughput with the use of transactional-memory-based synchronization. Given our applications, gigabit interfaces, and 125 MHz system clock rate, our results suggest that soft processors can process packets in software at high throughput and low latency, while capitalizing on the FPGAs already available in network equipment. computer architecture soft processors FPGA packet processing network processor multithreaded transactional memory 0544
6	High-performance software packet processing Fu, Qiaobin 30 January 2021 In today's Internet, it is highly desirable to have fast and scalable software packet processing solutions for network applications that run on commodity hardware. The advent of cloud computing drives the continued rapid growth of Internet traffic. Moreover, the development of emerging networking techniques, such as Network Function Virtualization, significantly shapes the need for implementing network functions in software. Finally, with the advancement of modern platforms as well as software frameworks for packet processing, network applications have the potential to process 100+ Gbps of network traffic on a single commodity server. Representative frameworks include the Click modular router, the RouteBricks scalable routing architecture, and BUFFALO, the software-based Ethernet switch. Beneath this general-purpose routing and switching functionality lies a broad set of network applications, many of which are handled with custom methods to provide cost-effectiveness and flexibility. This thesis considers two long-standing networking applications, IP lookup and distributed denial-of-service (DDoS) mitigation, and proposes efficient software-based methods drawing from this new perspective. In this thesis, we first introduce several optimization techniques that accelerate network applications by taking advantage of modern CPU features. Then, we explore the IP lookup problem: finding the longest matching prefix of an IP address in a set of prefixes. An ideal IP lookup algorithm should achieve constant, yet small, IP lookup time and on-chip memory usage. However, no prior IP lookup algorithm meets both requirements at the same time. We propose SAIL, a splitting approach to IP lookup, and a suite of algorithms for IP lookup based on the SAIL framework.
We conducted extensive experiments to evaluate our algorithms, and the experimental results show that our SAIL algorithms are much faster than well-known IP lookup algorithms. Next, we switch our focus to DDoS, an attempt to disrupt the legitimate traffic of a victim by sending a flood of Internet traffic from different sources. Our solution is Gatekeeper, the first open-source and deployable DDoS mitigation system. We present a series of optimization techniques, including the use of modern platforms, group prefetching, coroutines, and hashing, to accelerate Gatekeeper. Experimental results show that these optimization techniques significantly improve its performance over alternative baseline solutions. / 2022-01-30T00:00:00Z Computer science Coroutines DoS mitigation Gatekeeper IP lookup SAIL Software packet processing
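To make the IP lookup problem stated in entry 6 concrete (find the longest matching prefix of an address in a set of prefixes), here is a deliberately naive sketch. It is a linear scan, not SAIL or any algorithm from the thesis, and the function name `longest_prefix_match`, the table layout, and the next-hop labels are hypothetical.

```python
import ipaddress


def longest_prefix_match(table, address):
    """Return the next hop of the longest prefix containing address, or None.

    table maps prefix strings such as "10.0.0.0/8" to next-hop labels.
    This is an O(n) scan for illustration only; real routers use
    tries, hashing, or TCAMs to answer the same question quickly.
    """
    addr = ipaddress.ip_address(address)
    best_net = None
    best_hop = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching prefix with the greatest prefix length.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


routes = {
    "0.0.0.0/0": "default",
    "10.0.0.0/8": "hop-A",
    "10.1.0.0/16": "hop-B",
}
print(longest_prefix_match(routes, "10.1.2.3"))
```

The address 10.1.2.3 matches all three prefixes here, and the /16 entry wins because it is the most specific.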
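7	Parallel Memory System Architectures for Packet Processing in Network Virtualization / ネットワーク仮想化におけるパケット処理のための並列メモリシステムアーキテクチャ Korikawa, Tomohiro 23 March 2021 Kyoto University / New-system course doctorate / Doctor of Informatics / Kō No. 23326 / Jōhaku No. 762 / 新制||情||130 (University Library) / Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Eiji Oki, Professor Masahiro Morikura, Professor Yasuo Okabe / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM Memory system architecture Packet processing Network virtualization Queueing analysis 3D-stacked memory 007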
8	RISC-V Based Application-Specific Instruction Set Processor for Packet Processing in Mobile Networks Södergren, Oskar January 2021 This thesis explores the use of an ASIP for handling O-RAN control data. A model application was constructed, optimized, and profiled on a simple RV32-IMC core. The compiled code was analyzed, and the instructions “byte swap”, “pack”, “bitwise extract/deposit”, and “bit field place” were implemented. Synthesis of the core and profiling of the model application were done with and without each added instruction. Byte swap had the largest impact on performance (14% improvement per section, and 100% per section extension), followed by bitwise extract/deposit (10% improvement per section but no impact on section extensions). Pack and bit field place had no impact on performance. All instructions had a negligible impact on core size, except for bitwise extract/deposit, which increased size by 16%. Further studies, with respect to both overall architecture and further evaluation of instructions to implement, would be necessary to design an ideal ASIP for the application. eCPRI ORAN ASIP fronthaul RISC-V bit manipulation 5G packet processing Computer Engineering Datorteknik
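For readers unfamiliar with the bit-manipulation operations named in entry 8, the sketch below models two of them in software. The abstract does not define the instructions' exact semantics, so this assumes a 32-bit byte swap and a mask-driven parallel bit extract/deposit; the function names `byte_swap32`, `bit_extract`, and `bit_deposit` are hypothetical and are not taken from the thesis or from a RISC-V specification.

```python
def byte_swap32(x: int) -> int:
    """Reverse the byte order of a 32-bit word (endianness conversion)."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def bit_extract(value: int, mask: int) -> int:
    """Gather the bits of value selected by mask into contiguous low bits."""
    result = 0
    out = 0
    while mask:
        low = mask & -mask       # isolate the lowest set bit of the mask
        if value & low:
            result |= 1 << out
        out += 1
        mask &= mask - 1         # clear that bit and continue
    return result


def bit_deposit(value: int, mask: int) -> int:
    """Scatter the low bits of value to the bit positions set in mask."""
    result = 0
    src = 0
    while mask:
        low = mask & -mask
        if (value >> src) & 1:
            result |= low
        src += 1
        mask &= mask - 1
    return result


print(hex(byte_swap32(0x12345678)))      # 0x78563412
print(bin(bit_extract(0b10110100, 0b11110000)))  # 0b1011
```

Packet headers are big-endian and pack several fields into one word, which is presumably why operations of this kind are candidates for dedicated instructions in a header-parsing workload.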
9	Software Datapaths for Multi-Tenant Packet Processing / Plans de données logiciels pour les traitements réseaux en environnements partagés Chaignon, Paul 07 May 2019 Multi-tenant networks enable applications from multiple, isolated tenants to communicate over a shared set of underlying hardware resources. The isolation provided by these networks is enforced at the edge: end hosts demultiplex packets to the appropriate virtual machine, copy data across memory isolation boundaries, and encapsulate packets in tunnels to isolate traffic over the datacenter's physical network. These tasks must be carried out with as few hardware resources as possible, since those resources are primarily intended for the virtual machines. Over the last few years, the growing demand for high-performance network interfaces has pressured cloud providers to build more efficient multi-tenant networks. While many turn to specialized, hard-to-upgrade hardware devices to achieve high performance, in this thesis we argue that significant performance improvements are attainable in end-host multi-tenant networks using commodity hardware. We advocate for a consolidation of network functions on the host and an offload of specific tenant network functions to the host.
To that end, we design Oko, an extensible software switch that eases the consolidation of network functions. Oko includes an extended flow caching algorithm to support its runtime extension with limited overhead. Extensions are isolated from the software switch to prevent failures on the path of packets and to allow new network functions to be deployed quickly and safely. By avoiding costly redirections to separate processes and virtual machines, Oko halves the running cost of network functions on average. We then design a framework that enables tenants to offload network functions to the host. Executing tenant network functions on the host promises large performance improvements, but raises evident isolation concerns. We extend the technique used in Oko to provide memory isolation and devise a mechanism to fairly share the CPU among offloaded network functions with limited interruptions. Programmable network Cloud Packet processing NFV SDN 004.678 2
10	Zpracování paketů pomocí zero copy / Zero Copy Packet Processing Plotěný, Ondřej January 2019 The aim of this master's thesis is the design and implementation of a network probe for flow monitoring on a 10GbE interface. The text surveys the GNU/Linux tools used in high-speed networks and the principles by which they operate. It then presents the design and implementation of a probe that uses a zero-copy mechanism to monitor traffic on a 10GbE interface. The application uses the eXpress Data Path (XDP) and its AF_XDP socket to capture traffic on the interface. The NETX platform used at FIT VUT was chosen as the test platform.
