Keywords: Federated Learning, Reinforcement Learning, Representation Learning
Abstract: We study cooperative temporal-difference (TD) learning with heterogeneous agents, where each agent interacts with its own environment yet seeks to accelerate learning by collaborating with the other agents.
We focus on the setting where there exists a shared linear representation and the agents' optimal weights collectively lie in an $r$-dimensional linear subspace. Intuitively, the smaller $r$ is, the greater the potential benefit of collaboration.
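Concretely, one way to formalize this assumption (the notation here is illustrative rather than the paper's own) is that each agent $i$'s optimal TD solution $\theta_i^\star \in \mathbb{R}^d$ factors through a common low-dimensional column space,
$$\theta_i^\star = B^\star w_i^\star, \qquad B^\star \in \mathbb{R}^{d \times r}, \quad w_i^\star \in \mathbb{R}^r, \quad r \ll d,$$
so that all of the optimal weights lie in the $r$-dimensional subspace spanned by the columns of $B^\star$.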
However, heterogeneity in the agents' state transition kernels can lead to misaligned learning signals across agents, which may significantly hinder convergence and impair the generalization of the learned policies.
In this paper, inspired by the recent success of personalized federated learning (PFL), we study the convergence of federated single-timescale TD learning in which agents iteratively estimate a common subspace and local heads. We show that this decomposition filters out conflicting signals while amplifying the shared structure, effectively balancing the two competing considerations above.
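As a minimal sketch of one federated round under this subspace/head decomposition (this is not the paper's implementation; the feature map, step sizes, and aggregation rule below are illustrative assumptions), each agent takes a single-timescale TD(0) step on its local head and sends a subspace signal to be averaged:

```python
import numpy as np

# Illustrative sketch: each agent i approximates its value function as
# V_i(s) ~ phi(s) @ B @ w_i, where B (d x r) is a shared subspace and
# w_i (r,) is a local head.  All names and step sizes are hypothetical.

def local_td_step(B, w, phi_s, phi_s_next, reward, gamma, alpha, beta):
    """One single-timescale TD(0) step on one Markovian transition.

    Updates the local head w and returns a gradient-style signal for the
    shared subspace B (to be averaged by the server).
    """
    feat, feat_next = phi_s @ B, phi_s_next @ B            # r-dim features
    td_error = reward + gamma * feat_next @ w - feat @ w   # scalar TD error
    w_new = w + alpha * td_error * feat                    # local head update
    B_signal = beta * td_error * np.outer(phi_s, w)        # subspace signal
    return w_new, B_signal

def federated_round(B, heads, agent_samples, gamma, alpha, beta):
    """Agents update their heads locally; the server averages the subspace
    signals and re-orthonormalizes the shared subspace."""
    B_signals = []
    for i, (phi_s, phi_s_next, reward) in enumerate(agent_samples):
        heads[i], sig = local_td_step(B, heads[i], phi_s, phi_s_next,
                                      reward, gamma, alpha, beta)
        B_signals.append(sig)
    B = B + np.mean(B_signals, axis=0)                     # aggregate signals
    Q, _ = np.linalg.qr(B)                                 # keep B orthonormal
    return Q, heads
```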
In the analysis, Markovian sampling of the TD error makes it difficult to obtain a direct contraction for the principal angle distance between the optimal and estimated subspaces; the contraction is available only indirectly, through the local weights error. To address this challenge, we observe that whenever the principal angle distance is positive, the local weights error is positive as well. Following this line of thought, we further show that the local weights error can be lower-bounded by the principal angle distance times a constant that depends on the diversity of the optimal local weights.
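One plausible way to state this key step (again with illustrative notation; the exact constant and norm are not specified here) is
$$\frac{1}{N}\sum_{i=1}^{N}\big\|B_t w_{i,t} - B^\star w_i^\star\big\|_2^2 \;\gtrsim\; \sigma_{\min}^2\big(W^\star\big)\,\operatorname{dist}^2\big(B_t, B^\star\big),$$
where $\operatorname{dist}(\cdot,\cdot)$ denotes the principal angle distance and $\sigma_{\min}(W^\star)$, the smallest singular value of the matrix of optimal local heads $W^\star = [w_1^\star, \dots, w_N^\star]$, quantifies their diversity.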
Primary Area: reinforcement learning
Submission Number: 20956