Stackelberg Games with Side Information

Published: 31 Oct 2023, Last Modified: 31 Oct 2023MASEC@NeurIPS'23 PosterEveryoneRevisionsBibTeX
Keywords: stackelberg game, stackelberg security game, contextual bandit, side information
TL;DR: We study the problem of learning in Stackelberg (security) games with side information.
Abstract: We study an online learning setting in which a leader interacts with a sequence of followers over the course of $T$ rounds. At each round, the leader commits to a mixed strategy over actions, after which the follower best-responds. Such settings are referred to in the literature as Stackelberg games. Stackelberg games have received much interest from the community, in part due to their applicability to real-world security settings such as wildlife preservation and airport security. However despite this recent interest, current models of Stackelberg games fail to take into consideration the fact that the players' optimal strategies often depend on external factors such as weather patterns, airport traffic, etc. We address this gap by allowing for player payoffs to depend on an external context, in addition to the actions taken by each player. We formalize this setting as a repeated Stackelberg game with side information and show that under this setting, it is impossible to achieve sublinear regret if both the sequence of contexts and the sequence of followers is chosen adversarially. Motivated by this impossibility result, we consider two natural relaxations: (1) stochastically chosen contexts with adversarially chosen followers and (2) stochastically chosen followers with adversarially chosen contexts. In each of these settings, we provide algorithms which obtain $\tilde{\mathcal{O}}(\sqrt{T})$ regret.
Submission Number: 1
Loading