Stackelberg Policy Gradient: Evaluating the Performance of Leaders and Followers

Published: 25 Apr 2022, Last Modified: 05 May 2023
Venue: ICLR 2022 Workshop on Gamification and Multiagent Solutions
Readers: Everyone
Keywords: Reinforcement Learning, Multi-agent learning, Markov games, Stackelberg
TL;DR: Evaluating and comparing agents trained using the Stackelberg policy gradient vs simultaneous policy gradient.
Abstract: The hierarchical order of play is an important concept in reinforcement learning because it helps explain the decisions that strategic agents make in a shared environment. In this paper, we compare the learning dynamics of Stackelberg and simultaneous reinforcement learning agents. Agents are trained using their respective policy gradients and are then tested against each other in a tournament. We compare agent performance in zero-sum and non-zero-sum Markov games. We show that, under the same parameters, the Stackelberg leader performs better in training. In the tournament setting, however, Stackelberg leaders and followers perform similarly to the simultaneous player under the same parameters. Analytically, hierarchical training can potentially provide stronger guarantees for policy gradient.