Stackelberg Learning with Outcome-based Payment

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Multi-agent learning, Learning with Incentives, Outcome-based Payments
Abstract: As businesses begin to deploy agents that act on their behalf, an emerging challenge is how to incentivize other agents with differing interests to work alongside their own. In present-day commerce, payment is a common way for different parties to \emph{economically} align their interests. In this paper, we study how one could analogously learn such payment schemes to align agents in the decentralized multi-agent setting. We model this problem as a Stackelberg Markov game in which the leader can commit to a policy and also designate a set of outcome-based payments. We are interested in the question: when do efficient learning algorithms exist? To this end, we characterize the computational and statistical complexity of planning and learning in general-sum and cooperative games. In general-sum games, we find that planning is computationally intractable. In cooperative games, we show that learning can be statistically hard without payment yet efficient with payment, demonstrating that payment is necessary for efficient learning even when rewards are aligned. Altogether, our work aims to consolidate the theoretical understanding of outcome-based payment algorithms that can economically align decentralized agents.
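To make the Stackelberg-with-payments setup concrete, here is a minimal sketch in a one-shot 2x2 general-sum game (not the paper's Markov-game formulation; the payoff matrix, action names, and payment value are all illustrative assumptions). The leader commits to an action plus an outcome-based payment schedule, and the follower best-responds to its own reward augmented by the promised payments:

```python
# Hypothetical 2x2 general-sum game (illustrative numbers only):
# payoffs[(leader_action, follower_action)] = (leader_reward, follower_reward)
payoffs = {
    ("A", "x"): (4, 1),
    ("A", "y"): (0, 3),
    ("B", "x"): (1, 1),
    ("B", "y"): (2, 2),
}

def follower_best_response(leader_action, payment):
    # The follower maximizes its own reward plus any payment the leader
    # has committed to for each joint outcome.
    return max(("x", "y"),
               key=lambda f: payoffs[(leader_action, f)][1]
                             + payment.get((leader_action, f), 0.0))

def leader_value(leader_action, payment):
    # The leader's net value: its reward at the induced outcome,
    # minus the payment it promised for that outcome.
    f = follower_best_response(leader_action, payment)
    leader_reward, _ = payoffs[(leader_action, f)]
    return leader_reward - payment.get((leader_action, f), 0.0)

# Without payment, committing to "A" induces the follower to play "y"
# (follower reward 3 > 1), leaving the leader with reward 0.
assert follower_best_response("A", {}) == "y"

# By committing to pay 2.5 for the outcome ("A", "x"), the leader makes
# "x" the follower's best response (1 + 2.5 > 3) and nets 4 - 2.5 = 1.5.
payment = {("A", "x"): 2.5}
assert follower_best_response("A", payment) == "x"
print(leader_value("A", payment))  # 1.5
```

The sketch illustrates the economic-alignment mechanism the abstract describes: an outcome-contingent transfer changes the follower's best response and improves the leader's value, even though the parties' base rewards conflict.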
Supplementary Material: zip
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 18778