On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments

Muxing Wang; Pengkun Yang; Lili Su

Published: 28 Aug 2025, Last Modified: 28 Aug 2025. Accepted by TMLR. License: CC BY 4.0

Abstract: Large-scale multi-agent systems are often deployed across wide geographic areas, where agents interact with heterogeneous environments. There is an emerging interest in understanding the role of heterogeneity in the performance of the federated versions of classic reinforcement learning algorithms. In this paper, we study synchronous federated Q-learning, which aims to learn an optimal Q-function by having $K$ agents average their local Q-estimates every $E$ iterations. We provide a fine-grained characterization of the error evolution, which decays to zero as the number of iterations $T$ increases. When $K(E-1)$ is below a certain threshold, there is a linear speed-up with respect to $K$, similar to the homogeneous environment setting. The slow convergence caused by $E>1$ turns out to be fundamental rather than an artifact of our analysis. We prove that, for a wide range of stepsizes, the $\ell_{\infty}$ norm of the error cannot decay faster than $\Theta_R (\frac{E}{(1-\gamma)T})$, where $\Theta_R$ only hides numerical constants and the specific choice of reward values. In addition, our experiments demonstrate that the convergence exhibits an interesting two-phase phenomenon. For any given stepsize, there is a sharp phase transition of the convergence: the error decays rapidly in the beginning, yet later bounces back up and stabilizes.
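The averaging scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `federated_q_learning`, the array layout of `P` and `R`, the shared reward table, the constant stepsize, and the zero initialization are all assumptions made for the example. Each of the $K$ agents runs synchronous Q-learning on its own transition kernel, and the local estimates are averaged every $E$ iterations.

```python
import numpy as np


def federated_q_learning(P, R, gamma, T, E, stepsize, seed=0):
    """Hypothetical sketch of synchronous federated Q-learning.

    P: array of shape (K, S, A, S), one transition kernel per agent (assumed layout).
    R: array of shape (S, A), a reward table shared by all agents (assumption).
    Returns the average of the K local Q-estimates after T iterations.
    """
    rng = np.random.default_rng(seed)
    K, S, A, _ = P.shape
    Q = np.zeros((K, S, A))  # zero initialization is an assumption
    for t in range(1, T + 1):
        for k in range(K):
            # Greedy value of every state under agent k's current estimate.
            V = Q[k].max(axis=1)
            # Synchronous setting: draw one next state for every (s, a) pair
            # from agent k's own environment.
            next_states = np.array(
                [[rng.choice(S, p=P[k, s, a]) for a in range(A)] for s in range(S)]
            )
            # Standard Q-learning update with a constant stepsize (assumption).
            Q[k] = (1 - stepsize) * Q[k] + stepsize * (R + gamma * V[next_states])
        if t % E == 0:
            # Communication round: every agent adopts the average estimate.
            Q[:] = Q.mean(axis=0)
    return Q.mean(axis=0)
```

Setting `E = 1` averages after every iteration, while larger `E` reduces communication at the cost of letting the local estimates drift apart between rounds, which is the regime the abstract's lower bound addresses.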

Submission Length: Regular submission (no more than 12 pages of main content)

Previous TMLR Submission Url: https://openreview.net/forum?id=boB70uLhFC

Changes Since Last Submission: Dear Editor, We appreciate your efforts in evaluating our work. Below is a bulleted list of the improvements we have made to the manuscript, which we hope meet the expectations of the reviewing team. The detailed response, in which we address each reviewer's concerns, can be found in the supplementary material. We have
+ unified the asymptotic notation
+ improved the figures for clarity
+ modified the Remarks regarding linear speedup
+ added an explanation of the two-phase phenomenon
+ added a recent relevant paper to the related work section
+ added a Remark discussing the communication cost
+ added the choice of stepsizes in Corollary 2
+ replaced r by z in the proof sketch of Theorem 2

Assigned Action Editor: ~Ahmet_Alacaoglu2

Submission Number: 4485
