EMPRN: Reinforcement Learning-based ECN Tuning Using Message Passing Graph Recurrent Networks for Datacenters

Kasra Zaeri, Ahmed M. Abdelmoniem

Published: 2024, Last Modified: 06 Feb 2025, ICC 2024, CC BY-SA 4.0

Abstract: Congestion control (CC) based on explicit congestion notification (ECN) is a common method for reducing latency and increasing link utilization in data center networks (DCNs). The performance of ECN-based CC algorithms depends significantly on proper ECN tuning. Because buffers build up quickly and traffic in high-speed DCNs has dynamic spatio-temporal patterns, fast online ECN tuning can reduce latency and packet loss. However, most existing approaches do not capture the spatial dependencies between the egress ports of switches. In this paper, we propose EMPRN, a novel in-network CC algorithm based on multi-agent reinforcement learning (MARL). We design a graph recurrent neural network for online ECN tuning. EMPRN is implemented in a distributed manner on switches and can be adapted to most ECN-based CC protocols. We use a message-passing neural network (MPNN) architecture to capture the spatial dependencies between egress ports, and we integrate the MPNN with a gated recurrent unit (GRU) network to learn both spatial and temporal dependencies. Our simulation results show that, compared to a state-of-the-art reinforcement learning-based approach for online ECN tuning, EMPRN reduces flow completion time (FCT) and average queue length by up to 21% and 87.9%, respectively.
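The abstract combines two ingredients: message passing between egress ports (spatial dependencies) and a GRU (temporal dependencies). The following is a minimal pure-Python sketch of that combination under stated assumptions. It is not EMPRN's actual architecture. The class `MessagePassingGRU`, the per-port features (normalized queue length and link utilization), sum aggregation of neighbor messages, the made-up port graph, and the sigmoid mapping from a hidden state to an ECN marking threshold are all hypothetical illustrative choices. The weights are random rather than learned through MARL, and all ports are simulated in one process, whereas EMPRN runs distributed on switches.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def vadd(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [ai + bi for ai, bi in zip(a, b)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MessagePassingGRU:
    """Hypothetical MPNN + GRU over egress ports, with untrained random weights."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        in_dim = feat_dim + hidden_dim  # GRU input = [local features | aggregated message]
        self.W_msg = rand_matrix(hidden_dim, hidden_dim, rng)
        # Update gate (z), reset gate (r), candidate (h): W acts on the input, U on the state
        self.W_z, self.U_z = rand_matrix(hidden_dim, in_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)
        self.W_r, self.U_r = rand_matrix(hidden_dim, in_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)
        self.W_h, self.U_h = rand_matrix(hidden_dim, in_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]

    def gru(self, x, h):
        """One standard GRU cell update (temporal dependency)."""
        z = [sigmoid(a) for a in vadd(matvec(self.W_z, x), matvec(self.U_z, h))]
        r = [sigmoid(a) for a in vadd(matvec(self.W_r, x), matvec(self.U_r, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in vadd(matvec(self.W_h, x), matvec(self.U_h, rh))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def step(self, features, hidden, neighbors):
        """Aggregate messages from neighboring ports, then update every port's state."""
        new_hidden = {}
        for port, x in features.items():
            msg = [0.0] * len(hidden[port])
            for nb in neighbors[port]:
                # Sum of transformed neighbor states from the previous interval (spatial dependency)
                msg = vadd(msg, matvec(self.W_msg, hidden[nb]))
            new_hidden[port] = self.gru(x + msg, hidden[port])
        return new_hidden

    def ecn_threshold(self, h, buffer_kb):
        """Map a port's hidden state to a marking threshold in (0, buffer_kb)."""
        return buffer_kb * sigmoid(sum(w * hi for w, hi in zip(self.w_out, h)))


# Usage: a made-up graph of three egress ports
neighbors = {"p0": ["p1", "p2"], "p1": ["p0"], "p2": ["p0"]}
model = MessagePassingGRU(feat_dim=2, hidden_dim=4)
hidden = {p: [0.0] * 4 for p in neighbors}
# Per-port features per interval: [normalized queue length, link utilization] (assumed)
trace = [
    {"p0": [0.9, 0.95], "p1": [0.1, 0.30], "p2": [0.2, 0.40]},
    {"p0": [0.7, 0.90], "p1": [0.3, 0.50], "p2": [0.1, 0.20]},
]
for features in trace:
    hidden = model.step(features, hidden, neighbors)
thresholds = {p: model.ecn_threshold(h, buffer_kb=200.0) for p, h in hidden.items()}
print(thresholds)
```

Each port's new state is computed only from the previous interval's states, so the order in which ports are updated does not matter. In a trained system the weights would come from the reinforcement learning process; here the printed thresholds only demonstrate the data flow from port features to a per-port ECN setting.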