Finite-Time Analysis of Federated Temporal Difference Learning with Linear Function Approximation under Environment and Computation Heterogeneity

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: reinforcement learning, temporal difference learning, linear function approximation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Temporal difference (TD) learning is a popular method for reinforcement learning (RL). In this paper, we study federated TD learning with linear function approximation, where multiple agents collaboratively perform policy evaluation via TD learning while interacting with heterogeneous environments and using heterogeneous computation configurations. We devise a Heterogeneous Federated TD (HFTD) algorithm which iteratively aggregates agents' local stochastic gradients for TD learning. The HFTD algorithm involves two major novel elements: 1) it aims to find the optimal value function model for the mixture environment averaged over agents' heterogeneous environments, using local stochastic gradients of agents' mean squared projected Bellman errors (MSPBEs) for their respective environments; 2) it allows agents to perform different numbers of local iterations for TD learning. We analyze the finite-time convergence performance of the HFTD algorithm for the settings of I.I.D. sampling and Markovian sampling by characterizing bounds on the convergence error. Our results show that the HFTD algorithm can asymptotically converge to the optimal model, which, to the best of our knowledge, is the first such result in existing works on federated RL. The HFTD algorithm also achieves sample complexity of $O\left( {\frac{1}{\varepsilon }\log \frac{1}{\varepsilon }} \right)$ and linear convergence speedup, which match the results of existing TD algorithms.
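To make the algorithmic idea concrete, below is a minimal sketch of one HFTD-style communication round for the I.I.D.-sampling setting, not the authors' exact algorithm: each agent runs its own number of semi-gradient TD(0) updates with linear function approximation in its own environment, and the server averages the resulting models. All names (`local_td_steps`, `hftd_round`), the toy environments, and the step sizes are illustrative assumptions.

```python
import numpy as np

def local_td_steps(theta, features, P, rewards, gamma, alpha, num_steps, rng):
    # One agent's local semi-gradient TD(0) updates with linear function
    # approximation: V(s) ~ features[s] @ theta. States are sampled i.i.d.,
    # and next states follow the agent's own (heterogeneous) transition matrix P.
    n = P.shape[0]
    th = theta.copy()
    for _ in range(num_steps):
        s = rng.integers(n)                # i.i.d. state sampling
        s_next = rng.choice(n, p=P[s])     # environment transition
        td_error = rewards[s] + gamma * features[s_next] @ th - features[s] @ th
        th += alpha * td_error * features[s]
    return th

def hftd_round(theta, agents, gamma, alpha, rng):
    # One communication round: each agent runs its own (possibly different)
    # number of local TD(0) iterations; the server averages the local models,
    # targeting the mixture environment averaged over agents.
    local_models = [local_td_steps(theta, f, P, r, gamma, alpha, k, rng)
                    for (f, P, r, k) in agents]
    return np.mean(local_models, axis=0)

# Toy demo: two agents with slightly different 3-state Markov chains and
# different local-step counts (20 vs. 40), illustrating computation heterogeneity.
rng = np.random.default_rng(0)
feats = np.eye(3)  # tabular features as a special case of linear FA
P1 = np.array([[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.0, 0.1]])
P2 = np.array([[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.8, 0.0, 0.2]])
r = np.array([1.0, 0.0, 0.5])
agents = [(feats, P1, r, 20), (feats, P2, r, 40)]
theta = np.zeros(3)
for _ in range(200):
    theta = hftd_round(theta, agents, gamma=0.9, alpha=0.1, rng=rng)
```

Averaging model iterates (rather than raw gradients) after unequal numbers of local steps is one simple way to realize the heterogeneous-computation setting; the paper's analysis covers both I.I.D. and Markovian sampling.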
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4588