Adaptive Federated Q-Learning with Importance Averaging: Near-Optimal Sample Complexity and $K$-Independent Communication
Keywords: Federated Reinforcement Learning
Abstract: We revisit federated tabular Q-learning with $K$ decentralized agents that interact with a common MDP under heterogeneous behavior policies and periodically synchronize with a server. We analyze a simple, practical scheme: local asynchronous Q-learning with \emph{importance averaging} at synchronization and an \emph{adaptive doubling} communication schedule. Counting \emph{total} environment steps across all agents, we show that the sample complexity matches that of a centralized learner up to logarithmic factors and depends on the minimum entry of the \emph{average} stationary occupancy, not on that of the worst single agent. The number of synchronization rounds is $\tilde{\mathcal{O}}\big((1-\gamma)^{-1}\log(1/\varepsilon)\big)$, independent of $K$. The proof tracks where each $(1-\gamma)$ factor originates and integrates standard tools (martingale concentration, empirical occupancy concentration for uniformly ergodic chains, and a product-chain mixing reduction), each stated and used in self-contained form with citations to prior literature.
Supplementary Material: zip
Submission Number: 317