Keywords: No-Regret Learning, Reinforcement Learning, Stackelberg Equilibrium
TL;DR: A novel and provably convergent framework based on asymmetric learning dynamics for learning Stackelberg equilibria
Abstract: The Stackelberg equilibrium, a cornerstone of hierarchical game theory, models scenarios with a committed leader and a rational follower. While this equilibrium is central to economics and security, finding it through learning in dynamic, unknown environments remains a significant challenge. Traditional multi-agent learning often focuses on symmetric dynamics (e.g., self-play), which typically converge to Nash equilibria rather than Stackelberg equilibria. We propose a novel and provably convergent framework based on \textit{asymmetric learning dynamics}. In our model, the leader employs a reinforcement learning (RL) algorithm suited to non-stationary environments to learn an optimal commitment, while the follower uses a no-regret online learning algorithm that guarantees rational, best-response behavior in the limit. We provide a rigorous theoretical analysis demonstrating that this asymmetric interaction forces the time-averaged payoffs of both agents to converge to the Stackelberg equilibrium values. Our framework corrects several flawed approaches in prior analyses and is validated through a comprehensive set of experiments on canonical matrix and Markov games.
Primary Area: reinforcement learning
Submission Number: 16672
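The abstract describes the asymmetric dynamics only at a high level (leader: RL over commitments in a non-stationary environment; follower: no-regret learning). Below is a minimal illustrative sketch of such dynamics in a 2x2 matrix Stackelberg game, not the paper's actual algorithm: the payoff matrices, the discretized commitment grid, the epsilon-greedy bandit for the leader, the Hedge learner for the follower, and the assumption that the follower observes the committed mixed strategy are all choices made here for illustration.

```python
# Sketch of asymmetric learning dynamics in a 2x2 matrix Stackelberg game.
# Everything below (payoffs, grid, learning rates) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

# Payoff matrices: entry [i, j] is the payoff when the leader plays action i
# and the follower plays action j.
A = np.array([[2.0, 4.0], [1.0, 3.0]])   # leader payoffs (assumed example)
B = np.array([[1.0, 0.0], [0.0, 2.0]])   # follower payoffs (assumed example)

# Leader: epsilon-greedy bandit over a discretized grid of mixed commitments,
# where p = P(leader plays action 0).
grid = np.linspace(0.0, 1.0, 21)
counts = np.zeros_like(grid)
values = np.zeros_like(grid)
eps = 0.1

# Follower: Hedge (multiplicative weights), a standard no-regret algorithm.
w = np.ones(2)
eta = 0.05

T = 50_000
leader_payoff_sum = 0.0
for t in range(T):
    # Leader selects a commitment index (explore with prob. eps, else exploit).
    k = rng.integers(len(grid)) if rng.random() < eps else int(np.argmax(values))
    p = np.array([grid[k], 1.0 - grid[k]])

    # Sample actions: leader from its commitment, follower from its Hedge weights.
    q = w / w.sum()
    i = rng.choice(2, p=p)
    j = rng.choice(2, p=q)

    # Incremental-mean bandit update of the leader's estimate for commitment k.
    counts[k] += 1
    values[k] += (A[i, j] - values[k]) / counts[k]
    leader_payoff_sum += A[i, j]

    # Hedge update: the follower is assumed to observe the committed mixed
    # strategy p and uses the expected payoff of each of its actions.
    follower_payoffs = p @ B
    w *= np.exp(eta * follower_payoffs)
    w /= w.sum()   # normalize to keep the weights bounded

print("time-averaged leader payoff:", leader_payoff_sum / T)
print("best commitment found, P(action 0):", grid[int(np.argmax(values))])
```

In this sketch the follower's Hedge updates drive it toward a best response to the leader's commitment, while the leader's bandit search over commitments adapts to the follower's (slowly changing) behavior; the time-averaged payoffs are the quantities the paper's analysis shows converge to the Stackelberg values.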