Cloudy Forecast: How Predictable is Communication Latency in the Cloud?

JSYS 2023 Aug Papers Submission6 Authors

30 Jul 2023 (modified: 16 Aug 2023)
Keywords: Cloud latency, network, round-trip latency
TL;DR: This paper presents an empirical evaluation and observational study of communication latency and its variability in the public cloud, and draws several lessons for engineers designing latency-sensitive systems for the cloud.
Abstract: Many systems and services rely on timing assumptions for performance and availability in critical aspects of their operation, such as timeouts for failure detectors or optimizations to concurrency control mechanisms. Many of these assumptions depend on the ability of different components to communicate on time: a delay in communication may trigger the failure detector or cause the system to enter a less-optimized execution mode. Unfortunately, these timing assumptions are often set with little regard to the actual communication guarantees of the underlying infrastructure, in particular the variability of communication delays between components. This concern applies especially to systems deployed in the public cloud, since the cloud is a utility shared by many users and organizations, making it prone to higher performance variance due to the noisy neighbor syndrome. In this work, we present StormCloud, a simple tool that measures the variability of communication delays between nodes to help engineers set proper values for their timing assumptions. We also provide our observational analysis of running StormCloud in three major cloud providers and share the lessons we learned.
Area: Networking
Type: Systemization of Knowledge (SoK)
Revision: No
Submission Number: 6
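
To make the abstract's point about grounding timeouts in measured variability concrete, the following is a minimal sketch of how round-trip samples might be turned into a timeout value. It is not StormCloud's method or interface: the function names `summarize_rtts` and `suggest_timeout_ms`, the nearest-rank percentile, and the `safety_factor` default are all assumptions made purely for illustration.

```python
def summarize_rtts(samples_ms):
    """Summarize round-trip time samples (milliseconds).

    Hypothetical helper, not part of StormCloud. Uses the nearest-rank
    percentile: the smallest sample with at least p% of samples at or below it.
    """
    if not samples_ms:
        raise ValueError("need at least one RTT sample")
    ordered = sorted(samples_ms)
    n = len(ordered)

    def percentile(p):
        # Integer ceiling of p * n / 100, clamped to at least rank 1.
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    return {"p50": percentile(50), "p99": percentile(99), "max": ordered[-1]}


def suggest_timeout_ms(samples_ms, safety_factor=2.0):
    """Suggest a timeout as a multiple of the observed 99th percentile.

    The safety factor is an illustrative assumption, not a recommendation
    from the paper; an appropriate value depends on the workload.
    """
    return summarize_rtts(samples_ms)["p99"] * safety_factor
```

Basing the timeout on a tail percentile rather than the mean reflects the abstract's concern: rare but large delays, not typical ones, are what trigger a failure detector spuriously.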