1 |
Realizing Low-Latency Internet Services via Low-Level Optimization of NFV Service Chains: Every nanosecond counts! Farshin, Alireza. January 2019 (has links)
By virtue of recent technological developments in cloud computing, more applications are deployed in the cloud. Among these modern cloud-based applications, some require bounded and predictable low-latency responses. However, the current cloud infrastructure is unsuitable because it cannot satisfy these requirements, due to many limitations in both hardware and software.

This licentiate thesis describes attempts to reduce the latency of Internet services by carefully studying the currently available infrastructure, optimizing it, and improving its performance. The focus is on optimizing the performance of network functions deployed on commodity hardware, known as network function virtualization (NFV). The performance of NFV is one of the major sources of latency for Internet services.

The first contribution is related to optimizing the software. This project began by investigating the possibility of superoptimizing virtualized network functions (VNFs). It started with a literature review of available superoptimization techniques; then one of the state-of-the-art superoptimization tools was selected to analyze the crucial metrics affecting application performance. The result of our analysis demonstrated that better cache metrics could potentially improve the performance of all applications.

The second contribution of this thesis builds on the results of the first part by taking a step toward optimizing the cache performance of time-critical NFV service chains. By doing so, we reduced the tail latencies of such systems running at 100 Gbps. This is an important achievement, as it increases the probability of realizing bounded and predictable latency for Internet services.

/ Thanks to recent technological developments in cloud computing, more and more applications are deployed in cloud solutions. Several of these modern cloud-based applications require short response times that are predictable and lie within given bounds. The current cloud infrastructure is, however, insufficient, since it cannot meet these requirements due to various kinds of limitations in both hardware and software.

This licentiate thesis describes attempts to reduce the latency of Internet services by carefully studying the currently available infrastructure, optimizing it, and improving its performance. The focus is on optimizing the performance of network functions realized with commodity hardware, known as network function virtualization (NFV). The performance of NFV is one of the most important sources of latency in Internet services.

The first contribution relates to optimizing the software. This project began by investigating the possibility of "superoptimizing" virtualized network functions (VNFs). It started with a literature review of available superoptimization techniques, after which one of the state-of-the-art superoptimization tools was selected to analyze the important metrics that affect application performance. The result of our analysis showed that better cache metrics could potentially improve the performance of all applications.

The second contribution of this thesis uses the results of the first part by taking a step toward optimizing the cache performance of time-critical chains of NFV services. In doing so, we reduced the long tail latencies of such systems running at 100 Gbps. This is an important achievement, since it increases the probability of achieving bounded and predictable latency for Internet services.
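The headline result of the second contribution, reducing tail latency of NFV service chains at 100 Gbps, rests on looking beyond the average and at the high percentiles of the per-packet latency distribution. Below is a minimal, self-contained C++ sketch of that kind of tail-latency measurement, under stated assumptions: process_packet and the packet count are purely illustrative stand-ins, not the thesis's actual workload or methodology.

```cpp
// Minimal sketch: time a per-"packet" operation, then report median and tail percentiles.
// The workload is synthetic; in an NFV setting the timed region would be the
// service chain's per-packet processing path.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a per-packet network function.
static std::uint64_t process_packet(std::uint64_t seed) {
    std::uint64_t x = seed;
    for (int i = 0; i < 64; ++i)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;  // cheap busy work
    return x;
}

int main() {
    constexpr int kPackets = 1'000'000;
    std::vector<double> latency_ns(kPackets);
    std::uint64_t sink = 0;

    for (int i = 0; i < kPackets; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        sink ^= process_packet(static_cast<std::uint64_t>(i));
        auto t1 = std::chrono::steady_clock::now();
        latency_ns[i] = std::chrono::duration<double, std::nano>(t1 - t0).count();
    }

    // Sort once, then read off percentiles by index.
    std::sort(latency_ns.begin(), latency_ns.end());
    auto pct = [&](double p) {
        return latency_ns[static_cast<std::size_t>(p * (kPackets - 1))];
    };

    std::printf("p50   = %8.1f ns\n", pct(0.50));
    std::printf("p99   = %8.1f ns\n", pct(0.99));
    std::printf("p99.9 = %8.1f ns\n", pct(0.999));
    return sink == 42 ? 1 : 0;  // consume 'sink' so the loop is not optimized away
}
```

Sorting all samples and indexing at the 99th and 99.9th percentiles is the simplest way to read off the tail; measurement tools at line rate typically use histograms instead to avoid storing one sample per packet.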
/ QC 20190415 / Time-Critical Clouds / ULTRA
2 |
Toward Highly-efficient GPU-centric Networking / Mot Högeffektiva GPU-centrerade Nätverk. Girondi, Massimo. January 2024 (has links)
Graphics Processing Units (GPUs) are emerging as the most popular accelerators for many applications, powering the core of machine learning applications and many computing-intensive workloads. GPUs have typically been considered as accelerators, with Central Processing Units (CPUs) in charge of the main application logic, data movement, and network connectivity. In these architectures, the input and output data of network-based GPU-accelerated applications typically traverse the CPU and the operating system network stack multiple times, getting copied across the system's main memory. These traversals increase application latency and require expensive CPU cycles, reducing the power efficiency of the system and increasing overall response times. These inefficiencies become more important in latency-bounded deployments or at high throughput, where copy times can easily inflate the response time of modern GPUs.

The main contribution of this dissertation is a step toward a GPU-centric network architecture, allowing GPUs to initiate network transfers without the intervention of CPUs. We focus on commodity hardware, using NVIDIA GPUs and Remote Direct Memory Access over Converged Ethernet (RoCE) to realize this architecture, removing the need for highly homogeneous clusters and ad-hoc network designs required by many other similar approaches. By porting some rdma-core posting routines to the GPU runtime, we can saturate a 100-Gbps link without spending any CPU cycles, reducing the overall system response time while increasing power efficiency and improving application throughput.

The second contribution concerns the analysis of Clockwork, a state-of-the-art inference serving system, showing the limitations imposed by controller-centric, CPU-mediated architectures. We then propose an alternative architecture for this system based on an RDMA transport, and we study the performance gains that such a system would introduce.

An integral task of an inference serving system is to account for and track user flows and distribute them across multiple worker nodes. Our third contribution aims to understand the challenges of connection tracking applications running at 100 Gbps, in the context of a stateful load balancer running on commodity hardware.

/ QC 20240315
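The third contribution's setting, a stateful load balancer doing connection tracking at 100 Gbps, amounts at its core to a per-packet lookup-or-insert in a flow table keyed by the 5-tuple. The C++ sketch below illustrates that core data structure only; the class and field names are hypothetical, and round-robin backend assignment is just one possible policy, not necessarily the one studied in the dissertation.

```cpp
// Minimal sketch of the per-flow state a stateful load balancer must track:
// new flows are pinned to a backend, later packets of the same flow are
// forwarded to the same backend.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <tuple>
#include <unordered_map>

// 5-tuple identifying a flow (names are illustrative).
struct FlowKey {
    std::uint32_t src_ip, dst_ip;
    std::uint16_t src_port, dst_port;
    std::uint8_t  proto;
    bool operator==(const FlowKey& o) const {
        return std::tie(src_ip, dst_ip, src_port, dst_port, proto) ==
               std::tie(o.src_ip, o.dst_ip, o.src_port, o.dst_port, o.proto);
    }
};

struct FlowKeyHash {
    std::size_t operator()(const FlowKey& k) const {
        std::uint64_t h = k.src_ip;
        h = h * 0x9E3779B97F4A7C15ULL ^ k.dst_ip;
        h = h * 0x9E3779B97F4A7C15ULL ^ ((std::uint64_t(k.src_port) << 16) | k.dst_port);
        return static_cast<std::size_t>(h * 0x9E3779B97F4A7C15ULL ^ k.proto);
    }
};

class ConnTrackBalancer {
public:
    explicit ConnTrackBalancer(std::size_t backends) : backends_(backends) {}

    // Return the backend for this packet, creating state for unseen flows.
    std::size_t forward(const FlowKey& key) {
        auto it = table_.find(key);
        if (it != table_.end()) return it->second;   // existing flow: keep the same backend
        std::size_t b = next_++ % backends_;         // new flow: round-robin assignment
        table_.emplace(key, b);
        return b;
    }

private:
    std::size_t backends_;
    std::size_t next_ = 0;
    std::unordered_map<FlowKey, std::size_t, FlowKeyHash> table_;
};

int main() {
    ConnTrackBalancer lb(4);
    FlowKey a{0x0A000001, 0x0A000002, 40000, 80, 6};
    FlowKey b{0x0A000003, 0x0A000002, 40001, 80, 6};
    std::printf("flow a -> backend %zu\n", lb.forward(a));
    std::printf("flow b -> backend %zu\n", lb.forward(b));
    std::printf("flow a -> backend %zu (same as before)\n", lb.forward(a));
    return 0;
}
```

At 100 Gbps with small packets this lookup happens tens of millions of times per second, so a production implementation would replace std::unordered_map with a cache-friendly (and often lock-free) hash table; the sketch only shows what state must be kept per flow, which is where the memory-behaviour challenges the dissertation examines come from.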