Keywords: Robustness, Multi-Agent Learning, Sim2Real, Reinforcement Learning
Abstract: Policies for real-world multi-agent problems, such as optimal taxation, can be learned in multi-agent simulations with AI agents that emulate humans. However, simulations can suffer from reality gaps as humans often act suboptimally or optimize for different objectives (i.e., bounded rationality). We introduce $\epsilon$-Robust Multi-Agent Simulation (ERMAS), a robust optimization framework to learn AI policies that are robust to such multi-agent reality gaps. The objective of ERMAS theoretically guarantees robustness to the $\epsilon$-Nash equilibria of other agents – that is, robustness to behavioral deviations with a regret of at most $\epsilon$. ERMAS efficiently solves a first-order approximation of the robustness objective using meta-learning methods. We show that ERMAS yields robust policies for repeated bimatrix games and optimal adaptive taxation in economic simulations, even when baseline notions of robustness are uninformative or intractable. In particular, we show ERMAS can learn tax policies that are robust to changes in agent risk aversion, improving policy objectives (social welfare) by up to 15% in complex spatiotemporal simulations using the AI Economist (Zheng et al., 2020).
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: ERMAS efficiently trains RL agents that are robust to reality gaps in multi-agent simulations, such as complex economic simulations.
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=73wfEpjZpS
15 Replies
Loading