Spelling suggestions: "subject:"datacenter network"" "subject:"datacenters network""
1 |
Solving Practical Problems in Datacenter NetworksWu, Xin January 2013 (has links)
<p>The soaring demands for always-on and fast-response online services have driven modern datacenter networks to undergo tremendous growth. These networks often rely on scale-out designs with large numbers of commodity switches to reach immense capacity while keeping capital expenses under check. Today, datacenter network operators spend tremendous time and efforts on two key challenges: 1) how to efficiently utilize the bandwidth connecting host pairs and 2) how to promptly handle network failures with minimal disruptions to the hosted services.</p><p>To resolve the first challenge, we propose solutions in both network layer and transport layer. In the network layer solution, We advocate to design practical datacenter architectures for easy operation, i.e., an architecture should be reliable, capable of improving bisection bandwidth, scalable and debugging-friendly. By strictly following these four guidelines, We propose DARD, a Distributed Adaptive Routing architecture for Datacenter networks. DARD allows each end host to reallocate traffic from overloaded paths to underloaded paths without central coordination. We use congestion game theory to show that DARD converges to a Nash equilibrium in finite steps and its gap to the optimal flow allocation is bounded in the order of 1/logL, with L being the number of links. We use a testbed implementation and simulations to show that DARD can achieve a close-to-optimal flow allocation with small control overhead in practice.</p><p>In the transport layer solution, We propose Explicit Multipath Congestion Control Protocol (MPXCP), which achieves four desirable properties: fast convergence, efficiency, being fair to flows with different RTTs and negligible queue size. Intensive ns-2 simulation shows that MPXCP can quickly converge to efficiency and fairness without building up queues despite different delay-bandwidth products.</p><p>To resolve the second challenge, recent research efforts have focused on automatic failure localization. Yet, resolving failures still requires significant human interventions, resulting in prolonged failure recovery time. Unlike previous work, we propose NetPilot, a system aims to quickly mitigate rather than resolve failures. NetPilot mitigates failures in much the same way operators do -- by deactivating or restarting suspected offending components. NetPilot circumvents the need for knowing the exact root cause of a failure by taking an intelligent trial-and-error approach. The core of NetPilot is comprised of an Impact Estimator that helps guard against overly disruptive mitigation actions and a failure-specific mitigation planner that minimizes the number of trials. We demonstrate that NetPilot can effectively mitigate several types of critical failures commonly encountered in production datacenter networks.</p> / Dissertation
|
2 |
BUILDING FAST, SCALABLE, LOW-COST, AND SAFE RDMA SYSTEMS IN DATACENTERSShin-yeh Tsai (7027667) 16 October 2019 (has links)
<div>Remote Direct Memory Access, or RDMA, is a technology that allows one computer server to direct access the memory of another server without involving its CPU. Compared with traditional network technologies, RDMA offers several benefits including low latency, high throughput, and low CPU utilization. These features are especially attractive to datacenters, and because of this, datacenters have started to adopt RDMA in production scale in recent years.</div><div>However, RDMA was designed for confined, single-tenant, High-Performance-Computing (HPC) environments. Many of its design choices do not fit datacenters well, and it cannot be readily used by datacenter applications. To use RDMA, current datacenter applications have to build customized software stacks and fine-tune their performance. In addition, RDMA offers limited scalability and does not have good support for resource sharing or protection across different applications.</div><div>This dissertation sets out to seek solutions that can solve issues of RDMA in a systematic way and makes it more suitable for a wide range of datacenter applications.</div><div>Our first task is to make RDMA more scalable, easier to use, and have better support for safe resource sharing in datacenters. For this purpose, we propose to add an indirection layer on top of native RDMA to virtualize its low-level abstraction into a high-level one. This indirection layer safely manages RDMA resources for different datacenter applications and also provide a means for better scalability.</div><div>After making RDMA more suitable for datacenter environments, our next task is to build applications that can exploit all the benefits from (our improved) RDMA. We designed a set of systems that store data in remote persistent memory and let client machines access these data through pure one-sided RDMA communication. These systems lower monetary and energy cost compared to traditional datacenter data stores (because no processor is needed at remote persistent memory), while achieving good performance and reliability.</div><div>Our final task focuses on a completely different and so far largely overlooked one — security implications of RDMA. We discovered several key vulnerabilities in the one-sided communication pattern and in RDMA hardware. We exploited one of them to create a novel set of remote side-channel attacks, which we are able to launch on a widely used RDMA system with real RDMA hardware.</div><div>This dissertation is one of the initial efforts in making RDMA more suitable for datacenter environments from scalability, usability, cost, and security aspects. We hope that the systems we built as well as the lessons we learned can be helpful to future networking and systems researchers and practitioners.</div>
|
Page generated in 0.0677 seconds