ERMAS: Learning Policies Robust to Reality Gaps in Multi-Agent Simulations

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Robustness, Multi-Agent Learning, Sim2Real, Reinforcement Learning
Abstract: Policies for real-world multi-agent problems, such as optimal taxation, can be learned in multi-agent simulations with AI agents that emulate humans. However, simulations can suffer from reality gaps as humans often act suboptimally or optimize for different objectives (i.e., bounded rationality). We introduce $\epsilon$-Robust Multi-Agent Simulation (ERMAS), a robust optimization framework to learn AI policies that are robust to such multi-agent reality gaps. The objective of ERMAS theoretically guarantees robustness to the $\epsilon$-Nash equilibria of other agents – that is, robustness to behavioral deviations with a regret of at most $\epsilon$. ERMAS efficiently solves a first-order approximation of the robustness objective using meta-learning methods. We show that ERMAS yields robust policies for repeated bimatrix games and optimal adaptive taxation in economic simulations, even when baseline notions of robustness are uninformative or intractable. In particular, we show ERMAS can learn tax policies that are robust to changes in agent risk aversion, improving policy objectives (social welfare) by up to 15% in complex spatiotemporal simulations using the AI Economist (Zheng et al., 2020).
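Objective (sketch): The robustness guarantee described in the abstract can be written schematically as a max-min program over agent behaviors whose regret is at most $\epsilon$; the notation below (planner parameters $\theta$, planner return $J_p$, agent returns $J_i$) is illustrative and may differ from the paper's exact formulation:
$$\max_{\theta} \; \min_{\pi \in \Pi_\epsilon(\theta)} J_p(\theta, \pi), \qquad \Pi_\epsilon(\theta) = \Big\{ \pi : \max_{\pi_i'} J_i(\pi_i', \pi_{-i}; \theta) - J_i(\pi_i, \pi_{-i}; \theta) \le \epsilon \;\; \forall i \Big\},$$
i.e., the planner maximizes its worst-case return over the set of $\epsilon$-Nash agent joint policies. Per the abstract, ERMAS tackles the inner minimization approximately through a first-order relaxation solved with meta-learning methods.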
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: ERMAS efficiently trains RL agents that are robust to reality gaps in multi-agent simulations, such as complex economic simulations.
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=73wfEpjZpS