Abstract: Large-scale multi-agent systems are often deployed across wide geographic areas, where agents interact with heterogeneous environments. There is emerging interest in understanding the role of heterogeneity in the performance of federated versions of classic reinforcement learning algorithms. In this paper, we study synchronous federated Q-learning, which aims to learn an optimal Q-function by having $K$ agents average their local Q-estimates every $E$ iterations. We observe an interesting phenomenon in the convergence speed with respect to $K$ and $E$. As in the homogeneous environment setting, there is a linear speed-up in $K$ in reducing the errors that arise from sampling randomness. Yet, in sharp contrast to the homogeneous setting, $E>1$ leads to significant performance degradation. Specifically, we provide a fine-grained characterization of the error evolution in the presence of environmental heterogeneity, showing that the errors decay to zero as the number of iterations $T$ increases. The slow convergence under $E>1$ turns out to be fundamental rather than an artifact of our analysis. We prove that, for a wide range of stepsizes, the $\ell_{\infty}$ norm of the error cannot decay faster than $\Theta\big(\frac{E}{(1-\gamma)T}\big)$. In addition, our experiments demonstrate that the convergence exhibits an interesting two-phase phenomenon. For any given stepsize, there is a sharp phase transition in the convergence: the error decays rapidly at the beginning, yet later bounces up and stabilizes.
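To make the protocol concrete, below is a minimal sketch of synchronous federated Q-learning with periodic averaging, assuming each agent has access to a generative model (one sampled transition per state-action pair per iteration). The toy two-state environments, the constant stepsize `eta`, and the helper `make_env` are illustrative placeholders, not the paper's experimental setup.

```python
import numpy as np

def federated_q_learning(envs, num_states, num_actions, gamma=0.9,
                         eta=0.1, E=5, T=1000, seed=0):
    """Sketch of synchronous federated Q-learning: K agents run local
    synchronous Q-learning updates and average their Q-tables every E
    iterations. envs[k](s, a, rng) must return (reward, next_state)
    drawn from agent k's local (possibly heterogeneous) MDP."""
    rng = np.random.default_rng(seed)
    K = len(envs)
    # One Q-table per agent, all initialized to zero.
    Q = [np.zeros((num_states, num_actions)) for _ in range(K)]

    for t in range(1, T + 1):
        for k in range(K):
            # Synchronous (generative-model) update: every (s, a) pair
            # gets one sampled transition from agent k's environment.
            target = np.empty_like(Q[k])
            for s in range(num_states):
                for a in range(num_actions):
                    r, s_next = envs[k](s, a, rng)
                    target[s, a] = r + gamma * Q[k][s_next].max()
            Q[k] = (1 - eta) * Q[k] + eta * target
        if t % E == 0:
            # Periodic averaging: all agents adopt the mean Q-table.
            Q_avg = np.mean(Q, axis=0)
            Q = [Q_avg.copy() for _ in range(K)]
    return np.mean(Q, axis=0)

# Example: K = 3 heterogeneous two-state, two-action MDPs that differ
# only in their transition probabilities (illustrative values).
def make_env(p_stay):
    def sample(s, a, rng):
        reward = 1.0 if (s == 0 and a == 1) else 0.0
        s_next = s if rng.random() < p_stay else 1 - s
        return reward, s_next
    return sample

envs = [make_env(p) for p in (0.2, 0.5, 0.8)]
Q_hat = federated_q_learning(envs, num_states=2, num_actions=2, E=5, T=500)
```

Averaging only every $E$ iterations, rather than after every local update, is precisely the mechanism whose interaction with environmental heterogeneity the paper analyzes.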
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=jPMJYlJc4j
Changes Since Last Submission: Dear Editor,
We appreciate your efforts in evaluating our work. Below are our responses detailing the improvements we’ve made to the manuscript, which we hope meet the expectations of the reviewing team.
---
To clarify the distinctions between our work and prior research, particularly Woo et al. (2023), we added a comparative table in Section 2 that highlights our focus on heterogeneous environments. Our work quantifies environmental heterogeneity with a parameter $\kappa$. When $\kappa = 0$, our results align with those of Woo et al. (2023), while for $\kappa > 0$, we present a novel convergence bound and sample complexity. Although our analysis follows a similar approach to Woo et al. (2023), our methodology required adaptations to handle heterogeneous settings without a common optimal action per state. Additionally, we derived a novel lower bound establishing the optimal convergence rate for general Markov Decision Processes.
To further clarify the complexity analysis, we introduced Corollary 2, which states precise sample complexity results. For homogeneous settings ($\kappa = 0$), our sample complexity matches that of Woo et al. (2023). In heterogeneous settings ($\kappa > 0$), the sample complexity depends on the degree of environmental heterogeneity, as specified in Corollary 2. We have also added discussions at the end of the related work section and after Lemma 4, where we address the technical challenges in adapting the methodology of Woo et al. (2023).
---
We also revised Theorem 2 to provide explicit numerical constants, addressing feedback on accuracy; the lower bound now states its constants explicitly.
---
We have also enriched the experiments, including 1) a larger number of agents, 2) the convergence performance of time-decaying stepsizes, 3) the effect of $E$ under time-decaying stepsizes, and 4) the final error versus $T$, showing the performance improvement from larger $T$ when the stepsize decays with $T$.
---
Finally, we carefully reviewed the proofs to address any clarity issues and potential misinterpretations. This included:
- Streamlining Theorem 1 by combining terms and coefficients, and introducing Corollary 2 to emphasize the sample complexity in different settings, as mentioned above.
- Adding intermediate steps, particularly in applying Hoeffding's inequality after Equation (22), where we clarified which random variables are independent and what their ranges are (the generic form of the inequality is restated after this list for reference).
These revisions do not alter our main results but aim to improve the clarity and readability of the proofs.
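For reference, the generic form of Hoeffding's inequality invoked there, for independent random variables $X_1,\dots,X_n$ with $X_i \in [a_i, b_i]$ almost surely, is
$$\Pr\left(\Big|\sum_{i=1}^{n}\big(X_i - \mathbb{E}[X_i]\big)\Big| \ge t\right) \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^{n}(b_i-a_i)^2}\right),$$
and the revised text after Equation (22) identifies the particular $X_i$, $a_i$, and $b_i$ used in our proof.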
---
The detailed response can be found in the supplementary material where we addressed all the concerns of each reviewer.
Assigned Action Editor: ~Ahmet_Alacaoglu2
Submission Number: 3645