Keywords: No-Regret Learning, Reinforcement Learning, Stackelberg Equilibrium
TL;DR: A novel and provably convergent framework based on asymmetric learning dynamics for learning Stackelberg equilibria
Abstract: The Stackelberg equilibrium, a cornerstone of hierarchical game theory, models scenarios with a committed leader and a rational follower. While this equilibrium is central to economics and security, finding it through learning in dynamic, unknown environments remains a significant challenge. Traditional multi-agent learning often focuses on symmetric dynamics (e.g., self-play), which typically converge to Nash equilibria rather than Stackelberg equilibria. We propose a novel and provably convergent framework based on \textit{asymmetric learning dynamics}. In our model, the leader employs a reinforcement learning (RL) algorithm suited to non-stationary environments to learn an optimal commitment, while the follower uses a no-regret online learning algorithm that guarantees rational, best-response behavior in the limit. We provide a rigorous theoretical analysis demonstrating that this asymmetric interaction forces the time-averaged payoffs of both agents to converge to the Stackelberg equilibrium values. Our framework corrects several flawed approaches in prior analyses and is validated through a comprehensive set of experiments on canonical matrix and Markov games.
Primary Area: reinforcement learning
Submission Number: 16672
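The abstract describes the asymmetric dynamics only at a high level (leader: RL over commitments in a non-stationary environment; follower: no-regret learning). Below is a minimal illustrative sketch of such dynamics in a 2x2 matrix Stackelberg game, not the paper's actual algorithm: the payoff matrices, the discretized commitment grid, the epsilon-greedy bandit for the leader, the Hedge learner for the follower, and the assumption that the follower observes the committed mixed strategy are all choices made here for illustration.

```python
# Sketch of asymmetric learning dynamics in a 2x2 matrix Stackelberg game.
# Everything below (payoffs, grid, learning rates) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

# Payoff matrices: entry [i, j] is the payoff when the leader plays action i
# and the follower plays action j.
A = np.array([[2.0, 4.0], [1.0, 3.0]])   # leader payoffs (assumed example)
B = np.array([[1.0, 0.0], [0.0, 2.0]])   # follower payoffs (assumed example)

# Leader: epsilon-greedy bandit over a discretized grid of mixed commitments,
# where p = P(leader plays action 0).
grid = np.linspace(0.0, 1.0, 21)
counts = np.zeros_like(grid)
values = np.zeros_like(grid)
eps = 0.1

# Follower: Hedge (multiplicative weights), a standard no-regret algorithm.
w = np.ones(2)
eta = 0.05

T = 50_000
leader_payoff_sum = 0.0
for t in range(T):
    # Leader selects a commitment index (explore with prob. eps, else exploit).
    k = rng.integers(len(grid)) if rng.random() < eps else int(np.argmax(values))
    p = np.array([grid[k], 1.0 - grid[k]])

    # Sample actions: leader from its commitment, follower from its Hedge weights.
    q = w / w.sum()
    i = rng.choice(2, p=p)
    j = rng.choice(2, p=q)

    # Incremental-mean bandit update of the leader's estimate for commitment k.
    counts[k] += 1
    values[k] += (A[i, j] - values[k]) / counts[k]
    leader_payoff_sum += A[i, j]

    # Hedge update: the follower is assumed to observe the committed mixed
    # strategy p and uses the expected payoff of each of its actions.
    follower_payoffs = p @ B
    w *= np.exp(eta * follower_payoffs)
    w /= w.sum()   # normalize to keep the weights bounded

print("time-averaged leader payoff:", leader_payoff_sum / T)
print("best commitment found, P(action 0):", grid[int(np.argmax(values))])
```

In this sketch the follower's Hedge updates drive it toward a best response to the leader's commitment, while the leader's bandit search over commitments adapts to the follower's (slowly changing) behavior; the time-averaged payoffs are the quantities the paper's analysis shows converge to the Stackelberg values.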