- Abstract: Many real-world systems, such as traffic networks and ride-sharing networks, can be tackled using multi-agent reinforcement learning. In such settings, self-interested agents must learn how to interact with each other in a shared stochastic environment. However, current multi-agent reinforcement learning methods generally lead agents to take joint actions over time that are welfare-inefficient and globally suboptimal. To address this, we propose a new method in which a meta-agent modifies the agents' rewards, leading to convergence to policies that produce globally efficient outcomes in Markov games. Our method does not require agents to have a priori knowledge of their environment: both the meta-agent and the agents learn from interacting with it. Our theoretical results show that, using our method, multi-agent reinforcement learning algorithms always produce efficient outcomes. We apply our method to solve a challenging problem in economic systems involving thousands of agents.
- Keywords: multi-agent systems, Markov games, stochastic games, multi-agent reinforcement learning, incentive design
- TL;DR: A method to induce efficient and desirable outcomes in multi-agent systems
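The reward-modification idea in the abstract can be illustrated in a toy setting. The sketch below is not the paper's algorithm: the stage game (a prisoner's dilemma), the form of the meta-agent's modification (a fixed additive bonus for cooperating), and the learning rule (independent epsilon-greedy Q-learning) are all illustrative assumptions. It shows how, without modification, self-interested learners face a dominant inefficient action, while a suitable reward modification makes the welfare-maximizing joint action dominant, so the learners converge to it.

```python
import numpy as np

# Illustrative stage game (prisoner's dilemma): action 0 = cooperate,
# 1 = defect. R[a0, a1] = (reward to agent 0, reward to agent 1).
# Mutual cooperation (0, 0) maximizes total welfare, but defection is
# dominant, so unmodified self-interested learners end up at (1, 1).
R = np.array([[[3.0, 3.0], [0.0, 4.0]],
              [[4.0, 0.0], [1.0, 1.0]]])

def meta_reward(a0, a1, bonus=4.0):
    """Meta-agent's reward modification (assumed form): an additive
    subsidy for cooperating, large enough that cooperation becomes
    each agent's dominant action."""
    r0, r1 = R[a0, a1]
    return r0 + bonus * (a0 == 0), r1 + bonus * (a1 == 0)

def train(modify, episodes=5000, eps=0.1, lr=0.1, seed=0):
    """Two independent epsilon-greedy Q-learners on the repeated
    stage game (single state, so no bootstrapping term)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((2, 2))  # Q[agent, action]
    for _ in range(episodes):
        acts = [int(rng.integers(2)) if rng.random() < eps
                else int(np.argmax(Q[i])) for i in range(2)]
        rewards = meta_reward(*acts) if modify else R[acts[0], acts[1]]
        for i in range(2):
            Q[i, acts[i]] += lr * (rewards[i] - Q[i, acts[i]])
    return [int(np.argmax(Q[i])) for i in range(2)]

# With the meta-agent's modification, both learners' greedy policies
# converge to the welfare-maximizing joint action (cooperate, cooperate).
print(train(modify=True))
```

The bonus size is the key design choice in this sketch: it must exceed each agent's maximum gain from defecting (here 4 - 3 = 1 against a cooperator and 1 - 0 = 1 against a defector) so that cooperation strictly dominates under the modified rewards.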