Optimal Robust Subsidy Policies for Irrational Agent in Principal-Agent MDPs

ICLR 2026 Conference Submission16973 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Principal-Agent Problem, Markov Decision Process, Reinforcement Learning
TL;DR: We analyze robust subsidy schemes for principal–agent MDPs with boundedly rational agents.
Abstract: We investigate a principal-agent problem modeled within a Markov Decision Process, where the principal and the agent have their own rewards. The principal can provide subsidies to influence the agent’s action choices, and the agent’s resulting action policy determines the rewards accrued to the principal. Our focus is on designing a robust subsidy scheme that maximizes the principal’s cumulative expected return, even when the agent displays bounded rationality and may deviate from the optimal action policy after receiving subsidies. As a baseline, we first analyze the case of a perfectly rational agent and show that the principal’s optimal subsidy coincides with the policy that maximizes social welfare, the sum of the utilities of both the principal and the agent. We then introduce a bounded-rationality model: the globally $\epsilon$-incentive-compatible agent, who accepts any policy whose expected cumulative utility lies within $\epsilon$ of the personal optimum. In this setting, we prove that the optimal robust subsidy scheme problem simplifies to a one-dimensional concave optimization. This reduction not only yields a clean analytical solution but also highlights a key structural insight: optimal subsidies are concentrated along the social-welfare-maximizing trajectories. We further characterize the loss in social welfare, that is, the degradation under the robust subsidy scheme compared to the maximum achievable, and provide an upper bound on this loss. Finally, we investigate a finer-grained, state-wise $\epsilon$-incentive-compatible model. In this setting, we show that under two natural definitions of state-wise incentive-compatibility, the problem becomes intractable: one definition results in a non-Markovian agent action policy, while the other renders the search for an optimal subsidy scheme NP-hard.
Supplementary Material: pdf
Primary Area: reinforcement learning
Submission Number: 16973