Explicit Exploration for High-Welfare Equilibria in Game-Theoretic Multiagent Reinforcement Learning
Abstract: Iterative extension of empirical game models through deep reinforcement learning (RL) has proved an effective approach for finding equilibria in complex games. When multiple equilibria exist, we may also be interested in finding solutions with particular characteristics. We address this issue of equilibrium selection in the context of Policy Space Response Oracles (PSRO), a flexible game-solving framework based on deep RL, by skewing the strategy exploration process towards higher-welfare solutions. At each iteration, we create an exploration policy that imitates high-welfare-yielding behavior and train a response to the current solution, regularized to be similar to the exploration policy. With no additional simulation expense, our approach, named Ex$^2$PSRO, tends to find higher-welfare equilibria than vanilla PSRO in two benchmarks: a sequential bargaining game and a social dilemma game. Further experiments demonstrate Ex$^2$PSRO's composability with other PSRO variants and illuminate the relationship between exploration policy choice and algorithmic performance.
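The abstract outlines the core loop: build an empirical game from the current strategy set, solve it, form an exploration policy that imitates high-welfare behavior, and train a response regularized toward that policy. Below is a minimal, runnable Python sketch of that loop on a toy continuous game; the toy game and all helpers (`simulate`, `solve_equilibrium`, `welfare`), as well as the grid-search "best response", are illustrative stand-ins for the paper's deep-RL components, not the authors' implementation.

```python
"""Illustrative sketch of one possible Ex^2 PSRO-style loop.

Everything here is an assumption-laden stand-in: the toy game, the
uniform meta-solver, and the grid-search response are hypothetical
simplifications of the deep-RL machinery the paper actually uses.
"""
import itertools
import random

# Toy symmetric two-player game: a "policy" is a number in [0, 1];
# joint payoff rises with mutual cooperation (a crude social dilemma).
def simulate(policy_i, policy_j):
    coop = policy_i * policy_j
    return (coop + 0.1 * policy_i, coop + 0.1 * policy_j)

def solve_equilibrium(payoffs, n):
    # Placeholder meta-solver: uniform mixture over current strategies.
    return [1.0 / n] * n

def welfare(payoff_pair):
    return sum(payoff_pair)

def ex2_psro(iterations=5, reg_weight=0.5, seed=0):
    rng = random.Random(seed)
    strategies = [rng.random()]  # shared strategy set (symmetric game)
    sigma = [1.0]
    for _ in range(iterations):
        n = len(strategies)
        # Empirical game: simulate all strategy profiles.
        payoffs = {(i, j): simulate(strategies[i], strategies[j])
                   for i, j in itertools.product(range(n), repeat=2)}
        sigma = solve_equilibrium(payoffs, n)

        # Exploration policy: imitate the highest-welfare behavior
        # observed so far in the empirical game.
        best_i, _ = max(payoffs, key=lambda ij: welfare(payoffs[ij]))
        explore = strategies[best_i]

        # Response to the current solution, regularized toward the
        # exploration policy (grid search standing in for RL training).
        def objective(candidate):
            exp_payoff = sum(sigma[j] * simulate(candidate, strategies[j])[0]
                             for j in range(n))
            return exp_payoff - reg_weight * abs(candidate - explore)

        strategies.append(max((k / 100 for k in range(101)), key=objective))
    return strategies, sigma

if __name__ == "__main__":
    strats, eq = ex2_psro()
    print("strategy set:", [round(s, 2) for s in strats])
```

The regularization term is what skews exploration: with `reg_weight = 0`, the loop reduces to a vanilla PSRO-style best response, while larger weights pull new strategies toward the imitated high-welfare behavior.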
Lay Summary: Discovering strong strategies in large systems with many self-interested parties is a difficult but important problem in machine learning, especially when modeling systems we see in the world today. Some systems may contain many strong strategies, and we may be interested not only in discovering these strategies but also in designing algorithms that favor those with appealing properties. In particular, we created an algorithm that favors strong strategies that increase the summed benefit of all involved parties. This will help us design strategies and systems that account for the welfare of the general populace.
Primary Area: Reinforcement Learning->Multi-agent
Keywords: Empirical game theoretic analysis, equilibrium selection, game solving, strategy exploration
Submission Number: 4994