Learning in Stochastic Stackelberg Games

Published: 01 Jan 2024, Last Modified: 15 May 2025, Venue: ACC 2024, License: CC BY-SA 4.0
Abstract: We present a learning algorithm with which players converge to their stationary policies in a general-sum stochastic sequential Stackelberg game. The algorithm is a two-timescale implicit policy gradient algorithm that provably converges to stationary points of the two players' optimization problems. Our analysis moves beyond the assumptions of zero-sum or static Stackelberg games that the existing literature requires for learning algorithms to converge.
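To give a feel for the two-timescale implicit gradient idea named in the abstract, here is a minimal sketch on a toy problem. It is not the paper's algorithm: it assumes a static, deterministic, scalar Stackelberg game with quadratic costs, whereas the paper treats stochastic sequential games with policies. The cost functions, the parameter `a`, the step sizes `alpha` and `beta`, and the function name are all hypothetical choices made for illustration.

```python
def two_timescale_stackelberg(a=1.0, alpha=0.01, beta=0.5, steps=2000):
    """Toy two-timescale implicit gradient descent (illustrative assumption).

    Follower cost: g(x, y) = 0.5 * (y - a * x) ** 2
    Leader cost:   f(x, y) = 0.5 * (x - 1) ** 2 + 0.5 * y ** 2
    The follower's best response is y*(x) = a * x.
    """
    x, y = 0.0, 0.0
    for _ in range(steps):
        # Fast timescale: follower takes a gradient step on its own cost.
        grad_g_y = y - a * x
        y_next = y - beta * grad_g_y

        # Implicit function theorem: dy*/dx = -(d2g/dy2)^(-1) * d2g/dydx.
        # Here d2g/dy2 = 1 and d2g/dydx = -a, so dy*/dx = a.
        dy_dx = a

        # Slow timescale: leader follows the implicit (total) gradient,
        # evaluated at the follower's current iterate rather than at y*(x).
        grad_f_x = x - 1.0
        grad_f_y = y
        x_next = x - alpha * (grad_f_x + dy_dx * grad_f_y)

        x, y = x_next, y_next
    return x, y


if __name__ == "__main__":
    leader, follower = two_timescale_stackelberg()
    print(leader, follower)
```

In this toy game the leader's objective along the best-response curve is 0.5 (x - 1)^2 + 0.5 a^2 x^2, whose stationary point is x = 1 / (1 + a^2) with y = a x, so the iterates can be checked against a closed form. The follower's step size is much larger than the leader's, which is the sense in which the follower runs on the faster timescale.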