
Punching Holes in the Cloud: Direct Communication between Serverless Functions Using NAT Traversal

A growing use for serverless computing is large parallel data processing applications that take advantage of its on-demand scalability. Because individual serverless compute nodes, which are called functions, run in isolated containers, a major challenge with this paradigm is transferring temporary computation data between functions. Previous works have performed inter-function communication using object storage, which is slow, or in-memory databases, which are expensive. We evaluate the use of direct network connections between functions to overcome these limitations. Although function containers block incoming connections, we are able to bypass this restriction using standard NAT traversal techniques. By using an external server, we implement TCP hole punching to establish direct TCP connections between functions. In addition, we develop a communications framework to manage NAT traversal and data flow for applications using direct network connections. We evaluate this framework with a reduce-by-key application compared to an equivalent version that uses object storage for communication. For a job with 100+ functions, our TCP implementation runs 4.7 times faster at almost half the cost.

Master of Science

Serverless computing is a branch of cloud computing where users can remotely run small programs, called "functions," and pay only based on how long they run. A growing use for serverless computing is running large data processing applications that use many of these serverless functions at once, taking advantage of the fact that serverless programs can be started quickly and on demand. Because serverless functions run on networks isolated from each other and can only make outbound connections to the public internet, a major challenge with this paradigm is transferring temporary computation data between functions. Previous works have used separate types of cloud storage services in combination with serverless computing to allow functions to exchange data. However, hard-drive-based storage is slow and memory-based storage is expensive. We evaluate the use of direct network connections between functions to overcome these limitations. Although functions cannot receive incoming network connections, we are able to bypass this restriction by using a standard networking technique called Network Address Translation (NAT) traversal. We use an external server as an initial relay to set up a network connection between two functions such that, once the connection is established, the functions can communicate directly with each other without using the server anymore. In addition, we develop a communications framework to manage NAT traversal and data flow for applications using direct network connections. We evaluate this framework with an application for combining matching data entries and compare it to an equivalent version that uses storage based on hard drives for communication. For a job with over 100 functions, our implementation using direct network connections runs 4.7 times faster at almost half the cost.
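
The hole-punching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's framework: the `Rendezvous` class, the `punch` function, and all parameter names are hypothetical, and the sketch assumes both NATs keep the same public port mapping for a given local port. The rendezvous object only records the public address at which the external server observed each function; `punch` then repeatedly attempts an outbound connection to the peer from that same local port, so that the two outbound attempts can meet as a direct TCP connection.

```python
import socket
import time
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]


class Rendezvous:
    """In-memory bookkeeping for a hypothetical external rendezvous server."""

    def __init__(self) -> None:
        self._seen: Dict[str, Endpoint] = {}

    def register(self, name: str, observed: Endpoint) -> None:
        # 'observed' is the public (NAT-translated) address the server saw.
        self._seen[name] = observed

    def lookup(self, peer: str) -> Optional[Endpoint]:
        # Returns None until the peer has registered.
        return self._seen.get(peer)


def punch(local_port: int, peer: Endpoint, attempts: int = 20,
          delay: float = 0.25) -> socket.socket:
    """Attempt a direct TCP connection to `peer` from a fixed local port."""
    last_error: Optional[OSError] = None
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Allow reuse of the local port already used to reach the server.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.settimeout(delay)
        try:
            sock.bind(("", local_port))
            sock.connect(peer)  # outbound SYN opens the local NAT mapping
            sock.settimeout(None)
            return sock
        except OSError as exc:
            # Refused or timed out: the peer's SYN may not have arrived yet.
            last_error = exc
            sock.close()
            time.sleep(delay)
    raise ConnectionError(f"hole punching to {peer} failed") from last_error
```

In this sketch each function would register with the server, look up its peer's observed address, and then call `punch` at roughly the same time as the peer. Whether the attempt succeeds depends on the NAT behavior of the hosting platform, which the sketch does not model.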

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/103627
Date: 04 June 2021
Creators: Moyer, Daniel William
Contributors: Computer Science; Nikolopoulos, Dimitrios S.; Back, Godmar V.; Butt, Ali
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Detected Language: English
Type: Thesis
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
