Abstract: We initiate the study of a repeated principal-agent problem over a finite horizon $T$, where a principal sequentially interacts with $K\geq 2$ types of agents arriving in an *adversarial* order. In each round, the principal strategically chooses one of the $N$ arms to incentivize for an arriving agent of *unknown type*. The agent then chooses an arm based on its own utility and the provided incentive, and the principal receives a corresponding reward. The objective is to minimize regret against the best incentive in hindsight. We show that, without prior knowledge of agent behavior, the problem is intractable: any algorithm incurs linear regret. We analyze two key settings where sublinear regret is achievable. In the first setting, the principal knows the arm each agent type would select greedily for any given incentive. For this setting, we propose an algorithm that achieves a regret bound of $\mathcal{O}(\min(\sqrt{KT\log N},K\sqrt{T}))$ and provide a matching lower bound up to a $\log K$ factor. In the second setting, an agent's response varies smoothly with the incentive and is governed by a Lipschitz constant $L$. For this setting, we give an algorithm with a regret bound of $\tilde{\mathcal{O}}((LN)^{1/3}T^{2/3})$ and establish a matching lower bound up to logarithmic factors. Finally, we extend our algorithmic results for both settings by allowing the principal to incentivize multiple arms simultaneously in each round.
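To make the interaction protocol and the regret benchmark concrete, here is a minimal Python sketch of the repeated game under a toy additive-utility model with greedy agents. The utility and reward arrays, the single-arm payment grid, and the randomly drawn type sequence are illustrative assumptions for this sketch only, not the paper's actual construction or algorithm.

```python
import numpy as np

# Illustrative sketch of the repeated principal-agent interaction, assuming a
# toy additive-utility model with greedy agents. All quantities below are
# hypothetical placeholders, not the paper's construction.

rng = np.random.default_rng(0)
N, K, T = 5, 3, 2000                       # arms, agent types, horizon

agent_utility = rng.random((K, N))         # u[k, i]: type k's utility for arm i (unknown to the principal)
principal_reward = rng.random((K, N))      # r[k, i]: principal's reward when type k plays arm i

def agent_response(k, incentive):
    """Arm a greedy type-k agent picks given the posted incentive vector."""
    return int(np.argmax(agent_utility[k] + incentive))

def principal_payoff(k, incentive):
    """Principal's net payoff in one round: reward minus the incentive paid out."""
    arm = agent_response(k, incentive)
    return principal_reward[k, arm] - incentive[arm]

# Candidate incentives: pay a fixed amount p to a single target arm.
payments = np.linspace(0.0, 1.0, 11)
candidates = []
for arm in range(N):
    for p in payments:
        v = np.zeros(N)
        v[arm] = p
        candidates.append(v)

types = rng.integers(K, size=T)            # stand-in for an adversarial arrival order

# Total payoff of each fixed candidate incentive over the whole sequence
# defines the best-incentive-in-hindsight benchmark.
totals = [sum(principal_payoff(k, v) for k in types) for v in candidates]
best_in_hindsight = max(totals)

# A naive baseline policy (offer nothing every round), just to show how regret is measured.
baseline_total = sum(principal_payoff(k, np.zeros(N)) for k in types)
print(f"regret of the zero-incentive baseline: {best_in_hindsight - baseline_total:.2f}")
```

In the first setting studied in the paper, the principal knows the map from incentives to each type's greedy response (the role played by `agent_response` above) but not the arriving type; in the second, that map is only assumed to vary Lipschitz-smoothly with the incentive.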
Lay Summary: Imagine you’re a seller trying to convince a stream of customers—each with different preferences—to buy one of several products. You don’t know their exact preferences, and they’re arriving in an unpredictable order. In each interaction, you can offer incentives (like discounts) to influence their choice, but you only get to see what they pick and how well it worked. How should you adapt your strategy over time to get the best overall outcome? In our work, we study this problem through the lens of machine learning and economics, modeling it as a repeated principal-agent problem. We show that if you have no information about how customers respond to incentives, the problem is too hard to solve well. But with a little structure—like knowing how different customer types behave in general, or assuming their behavior changes smoothly with incentives—we design algorithms that learn how to offer better incentives over time, with performance guarantees that improve as more interactions happen. Our results also extend to situations where multiple products can be incentivized at once.
Primary Area: Theory->Online Learning and Bandits
Keywords: Principal-agent problem, Incentive design, Regret Minimization, Adversarial, Greedy response, Lipschitz response
Submission Number: 12416