
Optically-Enabled High Performance Reconfigurable Interconnection Networks

The influx of new data-intensive applications, such as machine learning and artificial intelligence, in high performance computing (HPC) and data centers (DC) has driven the design of efficient interconnection networks that can supply the bandwidth required by growing traffic demand. While the exponential growth in traffic demand is expected to continue into the future, the free scaling of CMOS-based electrical interconnection networks will eventually taper off as Moore’s Law slows down. These trends suggest that building all-electrical interconnects to meet the increased demand for low latency, high throughput networking will become increasingly impractical going forward. Integrating optical interconnects capable of supporting high bandwidth links and dynamic network topology reconfiguration offers a potential solution to scaling current networks. However, the insertion of photonic interconnection networks opens up a massive design space in terms of network topology and control plane that is currently under-explored. The work in this dissertation centers on studying and addressing control plane challenges to aid in the eventual adoption of optically-enabled reconfigurable networks.

We begin by exploring Flexspander, a novel reconfigurable network topology that combines the flexible construction of random expander networks with topological reconfigurability using optical circuit switching (OCS). By incorporating random expander graph construction, as opposed to other more symmetric reconfigurable topologies, Flexspander can be built with a broader range of electrical packet switch (EPS) radices, while retaining high throughput and low latency when coupled with multi-path routing.
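To make the idea of random expander construction concrete, the following is a minimal sketch of one common way to build a random regular graph among EPSs (the pairing, or configuration, model with rejection). It is an illustration under stated assumptions, not the Flexspander construction itself: the function name `random_regular_topology` and its parameters are hypothetical, and the abstract does not specify how Flexspander wires switches or how the OCS layer is used.

```python
import random


def random_regular_topology(num_switches, degree, seed=0, max_tries=10000):
    """Return a set of undirected links (a, b) with a < b forming a simple
    random `degree`-regular graph over switches 0..num_switches-1.

    Hypothetical helper for illustration only; uses the pairing model and
    retries whenever a self-loop or duplicate link appears.
    """
    if degree >= num_switches or (num_switches * degree) % 2 != 0:
        raise ValueError("need degree < num_switches and an even port total")
    rng = random.Random(seed)
    for _ in range(max_tries):
        # One "stub" per inter-switch port on every switch.
        stubs = [s for s in range(num_switches) for _ in range(degree)]
        rng.shuffle(stubs)
        links = set()
        simple = True
        # Pair consecutive stubs; reject the attempt on loops or duplicates.
        for a, b in zip(stubs[0::2], stubs[1::2]):
            link = (min(a, b), max(a, b))
            if a == b or link in links:
                simple = False
                break
            links.add(link)
        if simple:
            return links
    raise RuntimeError("no simple graph found; increase max_tries")


if __name__ == "__main__":
    topo = random_regular_topology(num_switches=10, degree=3, seed=1)
    print(sorted(topo))
```

Because the only constraints are that the degree is smaller than the switch count and that the total port count is even, a construction of this kind accommodates many switch radices, which is the flexibility the paragraph above attributes to random expanders.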

In addition, we propose a topology-routing co-optimization scheme to improve network robustness under traffic uncertainties. Our proposed scheme employs a two-step strategy. First, we optimize the topology and routing strategy by maximizing throughput and minimizing average packet hop count for the expected traffic patterns, which are derived from historical traffic patterns. Second, we apply a desensitization step on top of the topology and routing solution to reduce performance degradation due to traffic variations. We demonstrate the effectiveness of our approach using production traces from Facebook's Altoona data center, and show that even with infrequent reconfigurations, our solution can attain performance within 15% of an offline optimal oracle.
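As a rough illustration of one of the metrics named above, the sketch below computes a traffic-weighted average hop count for a given topology and traffic matrix. It assumes shortest-path routing on an unweighted graph, which is a simplification chosen here for brevity; the abstract does not state the routing model or the exact objective used in the dissertation, and the names `average_hop_count`, `links`, and `traffic` are hypothetical.

```python
from collections import deque


def average_hop_count(links, traffic):
    """Traffic-weighted mean shortest-path hop count.

    links:   iterable of undirected (a, b) switch pairs (assumed input format)
    traffic: dict mapping (src, dst) to a non-negative demand
    Assumes shortest-path routing; raises if a demand pair is disconnected.
    """
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    def hops_from(src):
        # Breadth-first search gives hop distances on an unweighted graph.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return dist

    total_demand = 0.0
    weighted_hops = 0.0
    cache = {}
    for (src, dst), demand in traffic.items():
        if src not in cache:
            cache[src] = hops_from(src)
        if dst not in cache[src]:
            raise ValueError(f"no path from {src} to {dst}")
        weighted_hops += demand * cache[src][dst]
        total_demand += demand
    if total_demand == 0:
        raise ValueError("traffic matrix has no demand")
    return weighted_hops / total_demand


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (0, 3)]
    print(average_hop_count(ring, {(0, 1): 1.0, (0, 2): 1.0}))
```

A metric like this could be evaluated on candidate topologies against a historical traffic matrix, with lower values indicating that heavy flows traverse fewer switches.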

Next, we study the problem of routing scheme design in reconfigurable networks, which has received less attention than routing design for static networks. We first perform theoretical analyses to identify the key properties that an effective routing protocol for reconfigurable networks should possess. Using findings from these theoretical analyses, we propose a lightweight but effective routing scheme that yields high performance for practical HPC and DC workloads when employed with reconfigurable networks.

Finally, we explore two fundamental design problems in optical reconfigurable network design. First, we investigate how different OCS placements in the physical network topology lead to different tradeoffs in terms of power consumption/cost, network performance, and scalability. Second, we investigate how network performance is affected by different reconfiguration periods to understand how the frequency of topology reconfiguration affects application performance.

Taken together, the work in this dissertation tackles several key challenges related to efficient control planes for reconfigurable network designs, with the goal of facilitating the eventual adoption of optically-enabled reconfigurable networks in high performance systems.

Identifier oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/d8-bez0-0f86
Date January 2022
Creators Teh, Min Yee
Source Sets Columbia University
Language English
Detected Language English
Type Theses
