Paraleon: Automatic and Adaptive Tuning for DCQCN Parameters in RDMA Networks

Published: 01 Jan 2024, Last Modified: 02 Aug 2025, Venue: ICNP 2024, License: CC BY-SA 4.0
Abstract: RDMA is a kernel-bypass and transport-offload technology that provides high throughput and low latency in datacenter networks, and DCQCN is the default and most widely used congestion control algorithm in large-scale RDMA networks. DCQCN involves more than 10 parameters at RNICs and switches. Their settings significantly affect network performance, and choosing them currently relies heavily on exhaustive manual tuning. Although some automatic methods have been proposed to tune a subset of DCQCN parameters, none of them comprehensively addresses all parameters at both RNICs and switches, which results in compromised network performance. In this paper, we propose Paraleon, an automatic and adaptive system that tunes DCQCN parameters comprehensively. We design a millisecond-level, sketch-based monitoring mechanism for accurate network-wide measurement, which collects runtime metrics as feedback to guide the tuning process. We also analyze the complex impacts of these parameters on network performance and leverage an improved heuristic search algorithm for timely performance optimization with better efficiency and convergence. We implement Paraleon and conduct extensive experiments in both NS3 simulations and a real-world testbed. The results show that Paraleon achieves 3.8% to 61.4% higher performance than existing tuning schemes.
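The abstract does not specify which heuristic search Paraleon improves on. To illustrate the general shape of a feedback-driven tuning loop, the sketch below swaps in plain simulated annealing, a generic technique and not Paraleon's algorithm. It searches over a small, hypothetical subset of DCQCN parameters: the switch ECN thresholds `k_min`/`k_max`, the marking probability `p_max`, and an RNIC additive-increase step `rate_ai`. The `measure` callback stands in for the runtime metrics a monitoring layer would feed back. `toy_measure` is a made-up objective, not real network data, and the step sizes and units are assumptions.

```python
import math
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DcqcnParams:
    # Hypothetical subset of DCQCN knobs; units and ranges are assumptions.
    k_min: int     # switch ECN marking lower threshold (KB)
    k_max: int     # switch ECN marking upper threshold (KB)
    p_max: float   # maximum ECN marking probability
    rate_ai: int   # RNIC additive-increase rate step (Mbps)


def neighbor(p: DcqcnParams, rng: random.Random) -> DcqcnParams:
    """Perturb one parameter while keeping k_min < k_max and 0 < p_max <= 1."""
    choice = rng.randrange(4)
    if choice == 0:
        k_min = max(1, p.k_min + rng.choice([-10, 10]))
        return replace(p, k_min=min(k_min, p.k_max - 1))
    if choice == 1:
        k_max = p.k_max + rng.choice([-20, 20])
        return replace(p, k_max=max(k_max, p.k_min + 1))
    if choice == 2:
        p_max = min(1.0, max(0.01, p.p_max + rng.choice([-0.05, 0.05])))
        return replace(p, p_max=p_max)
    return replace(p, rate_ai=max(1, p.rate_ai + rng.choice([-5, 5])))


def anneal(start, measure, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Simulated annealing: maximize measure(params), returning the best seen."""
    rng = random.Random(seed)
    cur, cur_score = start, measure(start)
    best, best_score = cur, cur_score
    t = t0
    for _ in range(steps):
        cand = neighbor(cur, rng)
        score = measure(cand)  # in a real system: feedback from monitoring
        # Always accept improvements; accept worse moves with probability
        # exp(delta / t), which shrinks as the temperature cools.
        if score >= cur_score or rng.random() < math.exp((score - cur_score) / t):
            cur, cur_score = cand, score
            if score > best_score:
                best, best_score = cand, score
        t = max(t * cooling, 1e-6)
    return best, best_score


def toy_measure(p: DcqcnParams) -> float:
    """Made-up score peaking at k_min=100, k_max=400, p_max=0.2, rate_ai=40."""
    return -(abs(p.k_min - 100) / 100 + abs(p.k_max - 400) / 400
             + abs(p.p_max - 0.2) / 0.2 + abs(p.rate_ai - 40) / 40)


if __name__ == "__main__":
    start = DcqcnParams(k_min=5, k_max=200, p_max=0.01, rate_ai=5)
    best, score = anneal(start, toy_measure, steps=2000)
    print(best, round(score, 3))
```

Accepting occasional worse moves lets the search escape local optima, which matters when switch-side and RNIC-side parameters interact. A real tuner would also need to bound how often it reconfigures live devices, a concern this toy loop ignores.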