Robust Multi-Agent Reinforcement Learning with State Uncertainty
Abstract: In real-world multi-agent reinforcement learning (MARL) applications, agents may not have perfect state information (e.g., due to inaccurate measurement or malicious attacks), which challenges the robustness of agents' policies. Although robustness is becoming increasingly important for MARL deployment, little prior work has studied state uncertainties in MARL, in either problem formulation or algorithm design. Motivated by this robustness issue and the lack of corresponding studies, we study the problem of MARL with state uncertainty in this work. We provide the first theoretical and empirical analysis of this challenging problem. We first model the problem as a Markov Game with state perturbation adversaries (MG-SPA) by introducing a set of state perturbation adversaries into a Markov Game. We then introduce robust equilibrium (RE) as the solution concept of an MG-SPA. We conduct a fundamental analysis of the MG-SPA, e.g., giving conditions under which such a robust equilibrium exists. We then propose a robust multi-agent Q-learning (RMAQ) algorithm to find such an equilibrium, with convergence guarantees. To handle high-dimensional state-action spaces, we design a robust multi-agent actor-critic (RMAAC) algorithm based on an analytical expression of the policy gradient derived in the paper. Our experiments show that the proposed RMAQ algorithm converges to the optimal value function, and that our RMAAC algorithm outperforms several MARL and robust MARL methods in multiple multi-agent environments when state uncertainty is present. The source code is publicly available at https://github.com/sihongho/robust_marl_with_state_uncertainty.
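To make the MG-SPA setup concrete, here is a minimal, hypothetical sketch (not the paper's implementation) of the core interaction it describes: each agent is paired with a state-perturbation adversary whose perturbation is constrained to a bounded neighborhood of the true state, and the agent's policy acts on the perturbed observation. The function and variable names (`perturb_state`, `mg_spa_step`, the l-infinity budget `epsilon`) are illustrative assumptions.

```python
import numpy as np

def perturb_state(true_state, adversary, epsilon):
    """Apply a state-perturbation adversary: the state the agent observes
    is constrained to an epsilon-ball (l-infinity) around the true state."""
    delta = adversary(true_state)
    # Project the adversary's proposed perturbation onto the budget.
    delta = np.clip(delta, -epsilon, epsilon)
    return true_state + delta

def mg_spa_step(true_state, agents, adversaries, epsilon):
    """One joint decision in a Markov Game with State Perturbation
    Adversaries: agent i selects its action from the state perturbed
    by its paired adversary i, not from the true state."""
    actions = []
    for policy, adversary in zip(agents, adversaries):
        observed = perturb_state(true_state, adversary, epsilon)
        actions.append(policy(observed))
    return actions

# Toy example: two agents whose "policies" threshold the first state
# coordinate; adversary 0 tries to push that coordinate negative,
# adversary 1 does nothing.
state = np.array([0.05, -0.2])
agents = [lambda s: int(s[0] > 0.0)] * 2
adversaries = [
    lambda s: np.array([-1.0, 0.0]),  # clipped to -epsilon on coord 0
    lambda s: np.array([0.0, 0.0]),   # no perturbation
]
actions = mg_spa_step(state, agents, adversaries, epsilon=0.1)
# Agent 0 sees 0.05 - 0.1 = -0.05 and acts 0; agent 1 sees 0.05 and acts 1.
```

The key point the sketch illustrates is that the true state transition still depends on the actions chosen under perturbed observations, which is what distinguishes the MG-SPA from a standard Markov Game.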
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: 1. Added discussions on the proposed robust equilibrium solution concept. 2. Added more explanations to better connect the history-dependent policies section to the whole paper, and to make the differences between Markov policies and history-dependent policies clearer. 3. Added a discussion of Assumption 4.4 and a proof sketch of Theorem 4.7. 4. Added a remark describing how the proposed MG-SPA framework adapts to heterogeneous agents and adversaries. 5. Added a remark discussing a reduced case of MG-SPA in which there is only one agent. 6. Added a remark discussing how MG-SPA benefits from Dec-POMDP and POSG, and the differences between them. 7. Generalized the policy gradient theorem for MG-SPA, i.e., Theorem 5.3 is modified to cover stochastic policy gradients. 8. Added more explanations of the experiment settings; more experimental results are moved from the appendix to the main text. 9. Added a new section called Discussion on future and potential work in robust MARL with state uncertainty. 10. Fixed some typos according to reviewers' comments.
Assigned Action Editor: ~Marc_Lanctot1
Submission Number: 901