Can Reinforcement Learning Efficiently Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?
Abstract: We study multi-player general-sum Markov games with one of the players designated as the leader and the rest regarded as the followers. In particular, we focus on the class of games where the followers are myopic, i.e., the followers aim to maximize the instantaneous rewards. For such a game, our goal is to find the Stackelberg-Nash equilibrium (SNE), which is a policy pair $(\pi^*, \nu^*)$ such that (i) $\pi^*$ is the optimal policy for the leader when the followers always play their best response, and (ii) $\nu^*$ is the best response policy of the followers, which is a Nash equilibrium of the followers' game induced by $\pi^*$. We develop sample efficient reinforcement learning (RL) algorithms for solving SNE under both the online and offline settings. Respectively, our algorithms are optimistic and pessimistic variants of least-squares value iteration and are readily able to incorporate function approximation tools for handling large state spaces. Furthermore, for the case with linear function approximation, we prove that our algorithms achieve sublinear regret and suboptimality under online and offline setups respectively. To our best knowledge, we establish the first provably efficient RL algorithms for solving SNE in general-sum Markov games with myopic followers.
1 Reply
Loading