CloudPin-Fast: Efficient and Effective Root Cause Localization for Shared Bandwidth Package Traffic Anomalies in Public Cloud Networks

Published: 01 Jan 2024, Last Modified: 11 Apr 2025 | IEEE Trans. Serv. Comput. 2024 | CC BY-SA 4.0
Abstract: As cloud services become increasingly widespread, many public cloud tenants opt for Shared Bandwidth Package (sBwp) services for inbound/outbound communication. The sBwp service lets tenants purchase bandwidth shared across multiple virtual machines (VMs) instead of buying it for each VM individually, which is a convenient and cost-effective way to manage traffic. However, the sBwp service makes it harder for operators to identify the root cause of abnormal sBwp traffic, especially in large-scale, globally distributed public clouds with millions of users. Developing a localization system for public clouds faces several challenges, including dynamic scalability, efficient acquisition of hyper-scale data, and complex application scenarios. To address these challenges, we propose a two-stage localization method called CloudPin-Fast. First, CloudPin-Fast employs a cold-start mode to meet dynamic requirements. Second, CloudPin-Fast implements a pre-filter to reduce the transmission and processing of hyper-scale data. Finally, in the second stage, CloudPin-Fast uses an anomaly localization algorithm based on multi-dimensional statistics fusion to cover complex scenarios. Evaluation on four production datasets shows superior efficiency and effectiveness. We also share lessons learned from deploying CloudPin-Fast for over a year at a world-renowned public cloud vendor.