Global ETD Search

Return to search

Toward Highly-efficient GPU-centric Networking / Mot Högeffektiva GPU-centrerade Nätverk

Graphics Processing Units (GPUs) are emerging as the most popular accelerator for many applications, powering the core of Machine Learning applications and many computing-intensive workloads. GPUs have typically been consideredas accelerators, with Central Processing Units (CPUs) in charge of the mainapplication logic, data movement, and network connectivity. In these architectures,input and output data of network-based GPU-accelerated application typically traverse the CPU, and the Operating System network stack multiple times, getting copied across the system main memory. These increase application latency and require expensive CPU cycles, reducing the power efficiency of systems, and increasing the overall response times. These inefficiencies become of higher importance in latency-bounded deployments, or with high throughput, where copy times could easily inflate the response time of modern GPUs. The main contribution of this dissertation is towards a GPU-centric network architecture, allowing GPUs to initiate network transfers without the intervention of CPUs. We focus on commodity hardware, using NVIDIA GPUs and Remote Direct Memory Access over Converged Ethernet (RoCE) to realize this architecture, removing the need of highly homogeneous clusters and ad-hoc designed network architecture, as it is required by many other similar approaches. By porting some rdma-core posting routines to GPU runtime, we can saturate a 100-Gbps link without any CPU cycle, reducing the overall system response time, while increasing the power efficiency and improving the application throughput.The second contribution concerns the analysis of Clockwork, a State-of-The-Art inference serving system, showing the limitations imposed by controller-centric, CPU-mediated architectures. We then propose an alternative architecture to this system based on an RDMA transport, and we study some performance gains that such a system would introduce. An integral component of an inference system is to account and track user flows,and distribute them across multiple worker nodes. Our third contribution aims to understand the challenges of Connection Tracking applications running at 100Gbps, in the context of a Stateful Load Balancer running on commodity hardware. / <p>QC 20240315</p>

Low-Latency Internet Services

Packet Processing

Network Functions Virtualization

Middle Boxes

Commodity Hardware

Multi-Hundred-Gigabit-Per-Second

Low-Level Optimization

Graphics Processing Units

Inference Serving

Remote Direct Memory Access

Internettjänster med Låg Fördröjning

Paketbearbetning

Virtualisering av Nätverksfunktioner

Mellanutrustning

Tillgänglig Datorhårdvara

Flera-Hundra- Gigabit-Per-Sekund

Lågnivå-Optimering

Grafikprocessor

Inferensserving

Remote Direct Memory Access

Communication Systems

Kommunikationssystem

Computer Systems

Datorsystem

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-344316
Date	January 2024
Creators	Girondi, Massimo
Publisher	KTH, Programvaruteknik och datorsystem, SCS
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Licentiate thesis, monograph, info:eu-repo/semantics/masterThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	TRITA-EECS-AVL ; 2024:30

Page generated in 0.0023 seconds

Toward Highly-efficient GPU-centric Networking / Mot Högeffektiva GPU-centrerade Nätverk

Description

Links & Downloads

Tags

Additional Fields