Learning to Incentivize on the Fly: Leader-Follower Games with Policy Recommendation

Weizheng Zhang; Siyu Chen; Zhuoran Yang

20 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Leader-Follower Games, Policy Recommendation, Incentive Compatibility, Multi-agent Reinforcement Learning

TL;DR: This paper presents a provably efficient RL algorithm for leader-follower Stackelberg games under an incentive compatibility (IC) constraint. The algorithm learns online to optimize the leader's rewards while controlling violations of that constraint.

Abstract: In dynamic, strategic interactions, incentive compatibility (IC) is essential for stable and predictable outcomes. This paper studies online learning under a novel IC constraint in leader-follower Stackelberg games with policy recommendation. In these games, the leader announces a policy that she commits to and recommends a policy for the follower to adopt. The leader's optimal strategy is captured by the Stackelberg equilibrium, in which the leader's announced policy maximizes her rewards and the recommended policy is incentive-compatible, that is, it is the follower's optimal response given the leader's commitment. We study the online learning problem from the leader's perspective: she has no prior knowledge of the follower's reward function and must infer it solely from observed follower actions. To address this problem, we develop a theory for such games and propose a provably efficient algorithm that minimizes regret with respect to both the leader's rewards and violations of the IC constraint. The algorithm combines maximum likelihood estimation of the follower's response model with optimistic planning over an estimated IC constraint. Crucially, we establish that the algorithm is robust to misspecification of the follower's behavioral parameters: it degrades gracefully, with a performance loss that scales linearly with the estimation error. To the best of our knowledge, this is the first provably efficient online learning algorithm for incentive-compatible decision-making, highlighting the potential of online learning for addressing challenges in misaligned multi-agent systems.
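
To make the IC constraint concrete, the following minimal sketch works through a one-shot matrix game with known payoffs. It is an illustration only, not the paper's algorithm: the restriction to pure leader actions, the payoff tables, the tie-breaking in the leader's favor, and all function names are assumptions introduced here. A recommendation is incentive-compatible when its IC violation (the follower's gain from deviating to a best response) is zero.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]  # payoff[leader_action][follower_action]


def follower_best_responses(follower_payoff: Matrix, leader_action: int,
                            tol: float = 1e-12) -> List[int]:
    """Follower actions that maximize the follower's payoff under the commitment."""
    row = follower_payoff[leader_action]
    best = max(row)
    return [b for b, value in enumerate(row) if value >= best - tol]


def ic_violation(follower_payoff: Matrix, leader_action: int,
                 recommended: int) -> float:
    """Follower's gain from deviating from the recommendation (0 means IC)."""
    row = follower_payoff[leader_action]
    return max(row) - row[recommended]


def stackelberg_with_recommendation(
        leader_payoff: Matrix,
        follower_payoff: Matrix) -> Optional[Tuple[int, int, float]]:
    """Best (commitment, IC recommendation, leader value) over pure actions.

    Assumption: ties among follower best responses are broken in the
    leader's favor.
    """
    best: Optional[Tuple[int, int, float]] = None
    for a in range(len(leader_payoff)):
        for b in follower_best_responses(follower_payoff, a):
            value = leader_payoff[a][b]
            if best is None or value > best[2]:
                best = (a, b, value)
    return best


# Hypothetical 2x2 payoffs, chosen only for illustration.
LEADER: Matrix = [[2.0, 4.0], [1.0, 3.0]]
FOLLOWER: Matrix = [[1.0, 0.0], [0.0, 2.0]]

if __name__ == "__main__":
    print(stackelberg_with_recommendation(LEADER, FOLLOWER))
    print(ic_violation(FOLLOWER, 0, 1))
```

In this toy game the leader would prefer the pair (commit to action 0, recommend action 1), but that recommendation is not incentive-compatible because the follower gains by deviating. In the online setting described in the abstract, the follower's payoffs are unknown to the leader, so the constraint itself has to be estimated from observed follower actions.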

Primary Area: reinforcement learning

Submission Number: 23162