Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RLDownload PDF

Published: 09 Nov 2021, Last Modified: 05 May 2023NeurIPS 2021 PosterReaders: Everyone
Keywords: off-line multi-agent RL, mean-field approximation, pessimistic value iteration, sub-optimality bound
Abstract: Mean-Field Multi-Agent Reinforcement Learning (MF-MARL) is attractive in the applications involving a large population of homogeneous agents, as it exploits the permutation invariance of agents and avoids the curse of many agents. Most existing results only focus on online settings, in which agents can interact with the environment during training. In some applications such as social welfare optimization, however, the interaction during training can be prohibitive or even unethical in the societal systems. To bridge such a gap, we propose a SAFARI (peSsimistic meAn-Field vAlue iteRatIon) algorithm for off-line MF-MARL, which only requires a handful of pre-collected experience data. Theoretically, under a weak coverage assumption that the experience dataset contains enough information about the optimal policy, we prove that for an episodic mean-field MDP with a horizon $H$ and $N$ training trajectories, SAFARI attains a sub-optimality gap of $\mathcal{O}(H^2d_{\rm eff} /\sqrt{N})$, where $d_{\rm eff}$ is the effective dimension of the function class for parameterizing the value function, but independent on the number of agents. Numerical experiments are provided.
Supplementary Material: pdf
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
15 Replies