Building a Digital Twin for network optimization using Graph Neural Networks

Published: 01 Jan 2022, Last Modified: 06 Feb 2025, Venue: Comput. Networks 2022, License: CC BY-SA 4.0
Abstract: Network modeling is a critical component of Quality of Service (QoS) optimization. Current networks implement Service Level Agreements (SLA) by careful configuration of both routing and queue scheduling policies. However, existing modeling techniques are not able to produce accurate estimates of relevant SLA metrics, such as delay or jitter, in networks with complex QoS-aware queueing policies (e.g., strict priority, Weighted Fair Queueing, Deficit Round Robin). Recently, Graph Neural Networks (GNNs) have become a powerful tool to model networks since they are specifically designed to work with graph-structured data. In this paper, we propose a GNN-based network model able to understand the complex relationship between (i) the queueing policy (scheduling algorithm and queue sizes), (ii) the network topology, (iii) the routing configuration, and (iv) the input traffic matrix. We call our model TwinNet, a Digital Twin that can accurately estimate relevant SLA metrics for network optimization. TwinNet can generalize to its input parameters, operating successfully in topologies, routing, and queueing configurations never seen during training. We evaluate TwinNet over a wide variety of scenarios with synthetic traffic and validate it with real traffic traces.
Our results show that TwinNet can provide accurate estimates of end-to-end path delays in 106 unseen real-world topologies under different queueing configurations, with a Mean Absolute Percentage Error (MAPE) of 3.8%, and a MAPE of 6.3% when evaluated on a real testbed. We also showcase the potential of the proposed model for SLA-driven network optimization and what-if analysis.
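
For readers unfamiliar with the metric, the following is a minimal sketch of how a Mean Absolute Percentage Error over per-path delays can be computed, using the standard definition of MAPE (the mean of |true - predicted| / |true|, expressed in percent). The function name `mape` and the sample delay values are hypothetical and chosen only for illustration; they are not taken from the paper or its evaluation code.

```python
from typing import Sequence


def mape(true_delays: Sequence[float], predicted_delays: Sequence[float]) -> float:
    """Return the Mean Absolute Percentage Error, in percent."""
    if len(true_delays) != len(predicted_delays):
        raise ValueError("inputs must have the same length")
    if len(true_delays) == 0:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for y, y_hat in zip(true_delays, predicted_delays):
        if y == 0:
            # The relative error is undefined when the true value is zero.
            raise ValueError("true values must be non-zero")
        total += abs(y - y_hat) / abs(y)
    return 100.0 * total / len(true_delays)


# Hypothetical per-path delays in seconds (illustrative values, not paper data).
measured = [0.10, 0.20, 0.40]
estimated = [0.11, 0.19, 0.40]
print(f"MAPE = {mape(measured, estimated):.2f}%")
```

Because each error is normalized by the true delay, the metric weights short and long paths equally in relative terms, which makes a single figure comparable across topologies with very different delay scales.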