HyperDrive: Direct Network Telemetry Storage via Programmable Switches

Published: 01 Jan 2025, Last Modified: 01 Aug 2025IEEE Trans. Cloud Comput. 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: In cloud datacenter operations, telemetry and logs are indispensable, enabling essential services such as network diagnostics, auditing, and knowledge discovery. The escalating scale of data centers, coupled with increased bandwidth and finer-grained telemetry, results in an overwhelming volume of data. This proliferation poses significant storage challenges for telemetry systems. In this article, we introduce HyperDrive, an innovative system designed to efficiently store large volumes of telemetry and logs in data centers using programmable switches. This in-network approach effectively mitigates bandwidth bottlenecks commonly associated with traditional endpoint-based methods. To our knowledge, we are the first to use a programmable switch to directly control storage, bypassing the CPU to achieve the best performance. With merely 21% of a switch’s resources, our HyperDrive implementation showcases remarkable scalability and efficiency. Through rigorous evaluation, it has demonstrated linear scaling capabilities, efficiently managing 12 SSDs on a single server with minimal host overhead. In an eight-server testbed, HyperDrive achieved an impressive throughput of approximately 730 Gbps, underscoring its potential to transform data center telemetry and logging practices.
Loading