On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments

Published: 03 Feb 2026, Last Modified: 03 Feb 2026 · AISTATS 2026 Poster · CC BY 4.0
TL;DR: We give global convergence rates for vanilla and entropy-regularized federated softmax policy gradient and analyze the impact of heterogeneity in federated reinforcement learning.
Abstract: We provide global convergence rates for vanilla and entropy-regularized federated softmax stochastic policy gradient ($\texttt{FedPG}$) with local training. We show that $\texttt{FedPG}$ converges to a near-optimal policy in terms of the average agent value, with a gap controlled by the level of heterogeneity. Remarkably, we obtain the first convergence rates for entropy-regularized policy gradient *with explicit constants*, leveraging a projection-like operator. Our results build upon a new analysis of federated averaging for non-convex objectives, based on the observation that the Łojasiewicz-type inequalities from the single-agent setting (Mei et al., 2020) do not hold for the federated objective. This uncovers a fundamental difference between single-agent and federated reinforcement learning: while single-agent optimal policies can be deterministic, federated objectives may inherently require stochastic policies.
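The paper analyzes $\texttt{FedPG}$ on general MDPs; to make the setup concrete, here is a minimal sketch under strong simplifying assumptions I am adding for illustration (a one-state bandit per agent, exact gradients, no stochasticity). Each agent runs several local entropy-regularized softmax policy gradient steps on its own heterogeneous reward vector, and a server averages the parameters, FedAvg-style. All function names and hyperparameters below are hypothetical, not from the paper.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def local_pg_grad(theta, r, tau):
    # Exact gradient of <pi, r> + tau * H(pi) for a one-state softmax policy:
    # grad_i = pi_i * (adj_i - <pi, adj>), with adj = r - tau * log(pi).
    pi = softmax(theta)
    adj = r - tau * np.log(pi)       # entropy-adjusted reward
    return pi * (adj - pi @ adj)

def fedpg(rewards, rounds=200, local_steps=5, lr=0.5, tau=0.01):
    # FedAvg-style loop: local PG steps per agent, then parameter averaging.
    theta = np.zeros(len(rewards[0]))
    for _ in range(rounds):
        local_thetas = []
        for r in rewards:            # one heterogeneous bandit per agent
            th = theta.copy()
            for _ in range(local_steps):
                th += lr * local_pg_grad(th, r, tau)
            local_thetas.append(th)
        theta = np.mean(local_thetas, axis=0)
    return softmax(theta)

# Two agents whose optimal arms conflict: each agent alone would prefer a
# deterministic policy, but the *average* objective is maximized by a
# stochastic policy split between arms 0 and 1 -- the phenomenon the
# abstract highlights.
rewards = [np.array([1.0, 0.0, 0.2]), np.array([0.0, 1.0, 0.2])]
pi = fedpg(rewards)
```

In this toy run the averaged policy concentrates roughly half its mass on each agent's preferred arm rather than committing to either, illustrating why federated objectives may inherently require stochastic policies even though single-agent softmax optima can be deterministic.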
Submission Number: 2252