Global ETD Search

1	Diffuser: Packet Spraying While Maintaining Order: Distributed Event Scheduler for Maintaining Packet Order while Packet Spraying in DPDK
Purushotham Srinivas, Vignesh. January 2023.
The demand for high-speed networking applications has made Network Processors (NPs) and Central Processing Units (CPUs) increasingly parallel and complex, with numerous on-chip processing cores. The underlying packet scheduler can fully exploit this parallelism only by efficiently utilizing all the available cores. Classically, packets have been directed to the processing cores at flow granularity, which makes the load distribution susceptible to traffic locality. A good load balance among the processors improves the application's throughput and packet-loss characteristics. Packet-level schedulers therefore dispatch flows to the processing cores at packet granularity to improve the load balance. However, packet-level scheduling combined with a high degree of parallelism causes processed packets to depart out of order. Optimizing load balance and packet order simultaneously is challenging.
In this degree project, we micro-benchmark the event scheduler of DPDK (Data Plane Development Kit) and identify many performance and scalability bottlenecks. We find that the event scheduler spends around 40% of the cycles on each participating core on event scheduling. Additionally, we find that DSW (Distributed Software Scheduler) cannot saturate all the workers with traffic because, in our test setup, a single NIC (Network Interface Card) queue is polled for packets. We then propose Diffuser, an event scheduler for DPDK that combines the functional properties of flow-level and packet-level schedulers. Diffuser aims to achieve optimal load balance while minimizing out-of-order packet transmission.
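The trade-off between flow-level and packet-level dispatch can be made concrete with a toy simulation. This is not from the thesis: `simulate`, the worker and flow counts, the exponential service times, and the use of `flow_id % n_workers` as a stand-in for RSS-style flow hashing are all illustrative assumptions. Each worker serves its queue in FIFO order, and a packet counts as reordered if a later packet of the same flow departs before it.

```python
import random
from collections import defaultdict


def simulate(policy, n_workers=4, n_flows=3, n_packets=3000, seed=1):
    """Toy dispatcher returning packets per worker and the number of reordered packets.

    policy == "flow":   each flow is pinned to worker flow_id % n_workers
                        (a stand-in for RSS-style flow hashing).
    policy == "packet": packets are sprayed round-robin over all workers.
    """
    rng = random.Random(seed)
    busy_until = [0.0] * n_workers   # time at which each worker becomes idle
    load = [0] * n_workers           # packets processed by each worker
    next_seq = defaultdict(int)      # next sequence number per flow
    departures = []                  # (departure_time, flow_id, seq)

    for i in range(n_packets):
        arrival = float(i)           # one packet arrives per time unit
        flow = rng.randrange(n_flows)
        seq = next_seq[flow]
        next_seq[flow] += 1

        w = flow % n_workers if policy == "flow" else i % n_workers
        service = rng.expovariate(1 / 3.0)                      # mean service time: 3 units
        busy_until[w] = max(arrival, busy_until[w]) + service   # FIFO queue per worker
        load[w] += 1
        departures.append((busy_until[w], flow, seq))

    # A packet is out of order if a later packet of the same flow left before it.
    highest_seen = defaultdict(lambda: -1)
    reordered = 0
    for _, flow, seq in sorted(departures):
        if seq < highest_seen[flow]:
            reordered += 1
        highest_seen[flow] = max(highest_seen[flow], seq)
    return {"load": load, "reordered": reordered}
```

With the defaults, `simulate("flow")` leaves worker 3 idle (three flows, four workers) but never reorders a flow, while `simulate("packet")` gives each worker exactly 750 packets at the cost of some out-of-order departures. Both runs draw the same flows and service times, so the only difference is the dispatch policy.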
Diffuser uses stochastic flow assignment together with a load-imbalance feedback mechanism that adaptively controls the rate of flow migrations to optimize the scheduler's load distribution. Diffuser reduces packet reordering by at least 65% with ten flows of 100-byte packets at 25 Mpps (million packets per second), and by at least 50% with a single flow. While Diffuser improves reordering performance, it slightly reduces throughput and increases latency due to flow migrations and reduced cache locality.
Keywords: Packet scheduling; Scheduling; Out of order; Data plane development kit; Parallel processing; Network processor
Subject: Computer and Information Sciences
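The abstract names the ingredients of Diffuser (stochastic flow assignment plus a load-imbalance feedback loop that throttles flow migrations) but not the exact rules. The sketch below is a hypothetical illustration of that idea, not Diffuser's algorithm: `FeedbackBalancer`, `gain`, the per-epoch accounting, and the "power of two choices" placement of new flows are all assumptions. The migration probability grows with the relative imbalance, so migrations (and the reordering they can cause) fade out once load is even.

```python
import random
from collections import Counter


class FeedbackBalancer:
    """Hypothetical flow-to-worker mapper with feedback-controlled migration.

    New flows are placed stochastically ("power of two choices"). At the end of
    each epoch, the relative load imbalance sets the probability of migrating
    one flow from the busiest worker to the least busy one.
    """

    def __init__(self, n_workers, gain=0.5, seed=0):
        self.n = n_workers
        self.gain = gain                    # how strongly imbalance drives migration
        self.rng = random.Random(seed)
        self.table = {}                     # flow_id -> worker index
        self.epoch_load = [0] * n_workers   # packets per worker in this epoch
        self.flow_load = Counter()          # packets per flow in this epoch
        self.migrations = 0

    def dispatch(self, flow):
        """Return the worker for one packet of `flow`, placing new flows randomly."""
        if flow not in self.table:
            a, b = self.rng.randrange(self.n), self.rng.randrange(self.n)
            self.table[flow] = a if self.epoch_load[a] <= self.epoch_load[b] else b
        worker = self.table[flow]
        self.epoch_load[worker] += 1
        self.flow_load[flow] += 1
        return worker

    def end_epoch(self):
        """Close the epoch, possibly migrate one flow, and return the migration probability."""
        load = self.epoch_load
        mean = sum(load) / self.n
        imbalance = (max(load) - min(load)) / mean if mean else 0.0
        p_migrate = min(1.0, self.gain * imbalance)
        if self.rng.random() < p_migrate:
            src, dst = load.index(max(load)), load.index(min(load))
            gap = load[src] - load[dst]
            # Moving a flow lighter than the gap narrows it; pick the flow that
            # brings src and dst closest to equal.
            candidates = [f for f, w in self.table.items()
                          if w == src and 0 < self.flow_load[f] < gap]
            if candidates:
                flow = min(candidates, key=lambda f: abs(gap - 2 * self.flow_load[f]))
                self.table[flow] = dst
                self.migrations += 1
        self.epoch_load = [0] * self.n
        self.flow_load = Counter()
        return p_migrate
```

A real scheduler would also have to drain a migrating flow's in-flight packets, or otherwise bound the reordering a migration causes; the sketch only models the decision of when and what to migrate.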
2	Low-power Implementation of Neural Network Extension for RISC-V CPU
Lo Presti Costantino, Dario. January 2023.
Deep learning and neural networks have been studied and developed for many years, but research in this field is still greatly needed because industry needs are changing rapidly. A new challenge in this field is edge inference: deploying deep learning on small, simple, and cheap devices such as low-power microcontrollers. At the same time, in hardware design, the industry is moving towards RISC-V, an open-source instruction set architecture (ISA) that is developing so quickly that it may soon become the standard. The target device of this thesis is a batteryless, ultra-low-power microcontroller based on energy harvesting and the RISC-V ISA. The challenge addressed in this project is to make a simple neural network run on this chip, that is, to determine the chip's capabilities and limits for such an application and to reduce its power and energy consumption as far as possible. To this end, TensorFlow Lite Micro was chosen as the reference deep learning framework, and a simple existing application was first studied and tested on the SparkFun Edge board and then successfully ported to the more restrictive RISC-V ONiO.zero core. Optimizations were applied only to the convolutional layer of the neural network, both in software, by implementing the Im2col algorithm, and in hardware, by designing and implementing a new RISC-V instruction and a corresponding hardware unit that performs four 8-bit multiply-and-accumulate operations in parallel. The new design drastically reduces both inference time (by a factor of 3.7) and the number of executed instructions (by a factor of 4.8), which means lower overall power consumption.
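The two optimizations described above combine naturally: Im2col turns each convolution window into a flat row, so every output value becomes a dot product, and a four-lane 8-bit multiply-and-accumulate unit can consume that dot product four elements at a time. The abstract does not give the instruction's encoding or exact semantics; this Python emulation assumes a dot-product-style operation that multiplies four signed 8-bit lanes pairwise and adds the products to an accumulator. `pack4`, `mac4`, `im2col`, and `conv2d_im2col` are hypothetical names, and the sketch handles only a single channel with stride 1 and no padding.

```python
def pack4(vals):
    """Pack four signed 8-bit values into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for lane, v in enumerate(vals):
        if not -128 <= v <= 127:
            raise ValueError("value does not fit in int8")
        word |= (v & 0xFF) << (8 * lane)
    return word


def mac4(acc, a_word, b_word):
    """Emulate the assumed 4-way int8 MAC: acc plus the sum of lane-wise products."""
    for lane in range(4):
        a = (a_word >> (8 * lane)) & 0xFF
        b = (b_word >> (8 * lane)) & 0xFF
        a = a - 256 if a >= 128 else a   # sign-extend each 8-bit lane
        b = b - 256 if b >= 128 else b
        acc += a * b                     # Python ints never overflow; hardware uses a fixed-width accumulator
    return acc


def im2col(image, k):
    """Unroll every k x k patch (stride 1, no padding) of a 2-D image into one row."""
    h, w = len(image), len(image[0])
    return [[image[r + i][c + j] for i in range(k) for j in range(k)]
            for r in range(h - k + 1) for c in range(w - k + 1)]


def conv2d_im2col(image, kernel):
    """Single-channel convolution (cross-correlation) as im2col rows times packed MACs."""
    k = len(kernel)
    weights = [v for row in kernel for v in row]
    pad = (-len(weights)) % 4            # zero-pad to a whole number of 4-lane words
    weights += [0] * pad
    w_words = [pack4(weights[i:i + 4]) for i in range(0, len(weights), 4)]
    out_w = len(image[0]) - k + 1
    flat = []
    for patch in im2col(image, k):
        patch = patch + [0] * pad
        acc = 0
        for n, w_word in enumerate(w_words):
            acc = mac4(acc, pack4(patch[4 * n:4 * n + 4]), w_word)
        flat.append(acc)
    return [flat[i:i + out_w] for i in range(0, len(flat), out_w)]
```

For a 3x3 kernel, the nine weights are zero-padded to twelve, so each output value takes three packed MACs instead of nine scalar multiply-adds. The kernel is not flipped, matching the cross-correlation that deep learning frameworks call convolution.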
Applications of this kind on such chips can open the door to a whole new market: thousands of small, cheap, self-sufficient chips running deep learning applications to solve simple everyday problems, even without a network connection and without privacy concerns.
Keywords: Artificial intelligence; Deep learning; Neural networks; Edge computing; Convolutional neural networks; Low-power electronics; RISC-V; AI accelerators; Parallel processing
Subject: Electrical Engineering and Electronics
