DUST: Resource-Aware Telemetry Offloading with A Distributed Hardware-Agnostic Approach

Published: 01 Jan 2024, Last Modified: 18 Jul 2025, IPDPS (Workshops) 2024, CC BY-SA 4.0
Abstract: In-device network monitoring has emerged as a promising alternative to centralized telemetry for gaining insights into the status and behavior of network devices. Despite its advantages of providing in-depth device telemetry and predicting failures in advance, it can impose substantial computational and storage burdens, potentially hindering networking devices' core switching and bridging functions. In light of this challenge, the DUST system is introduced to dynamically distribute and offload in-device monitoring tasks by harnessing the available computational resources across network nodes. It is designed to be hardware-agnostic, making it deployable on switches, servers, DPUs, SmartNICs, and other relevant devices. Our initial experiments on a real data center testbed indicate that DUST can reduce CPU utilization by up to 50% and memory usage by up to 15% for in-device monitoring workloads. We present a comprehensive system architecture that encompasses various nodes and discuss the flow of packets and message communications. To tackle one of the primary challenges posed by DUST, namely the optimal relocation of computations under network performance constraints and controllable routing decisions, we formulate the problem as an Integer Linear Program (ILP) and propose a heuristic algorithm to reduce the computational complexity. We evaluate the effectiveness and scalability of our algorithms across various network sizes and use cases.