Stochastic Principal-Agent Problems: Computing and Learning Optimal History-Dependent Policies

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: stochastic games, Markov games, Stackelberg games, information design, mechanism design
Abstract: We study a stochastic principal-agent model. A principal and an agent interact in a stochastic environment, each privy to observations about the state that are unavailable to the other. The principal has the power of commitment, both to elicit information from the agent and to signal her own information. The players communicate with each other and then select actions independently. Both players are {\em far-sighted}, aiming to maximize their total payoffs over the entire time horizon. We consider both the computation and learning of the principal's optimal policy. The key challenge lies in enabling {\em history-dependent} policies, which are essential for achieving optimality in this model but difficult to handle: the number of possible histories grows exponentially with the size of the model, rendering explicit representation of history-dependent policies infeasible. To address this challenge, we develop algorithmic techniques based on the concept of the {\em inducible value set}. These techniques yield an efficient algorithm that computes an $\epsilon$-approximate optimal policy in time polynomial in $1/\epsilon$. We also present an efficient learning algorithm for an episodic reinforcement learning setting with unknown transition probabilities. The algorithm achieves sublinear regret $\widetilde{\mathcal{O}}(T^{2/3})$ for both players over $T$ episodes.
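To make the stated challenge concrete, the following is a minimal back-of-the-envelope sketch in Python with made-up model sizes (num_states, num_signals, num_actions, and the payoff bound are hypothetical). It only contrasts orders of magnitude: the number of distinct histories a fully tabular history-dependent policy would have to cover versus the size of an $\epsilon$-grid over a bounded value interval, the kind of discretization that scales polynomially in $1/\epsilon$. It is an illustration of scale, not the paper's algorithm.

```python
# Illustrative sketch only (hypothetical sizes); not the paper's method.

def num_histories(horizon: int, num_states: int, num_signals: int, num_actions: int) -> int:
    """Count distinct joint histories: each step records a state, a message,
    and one action per player, so the count grows exponentially in the horizon."""
    per_step = num_states * num_signals * num_actions ** 2
    return per_step ** horizon

def value_grid_size(horizon: int, payoff_bound: float, eps: float) -> int:
    """Size of an eps-spaced grid over a continuation-value interval [0, horizon * payoff_bound];
    polynomial in 1/eps, independent of the number of histories."""
    return int(horizon * payoff_bound / eps) + 1

if __name__ == "__main__":
    for h in (2, 5, 10):
        hist = num_histories(h, num_states=5, num_signals=3, num_actions=4)
        grid = value_grid_size(h, payoff_bound=1.0, eps=0.01)
        print(f"horizon {h:2d}: {hist:.2e} histories vs. {grid} grid points")
```

Even at horizon 10 with these small sizes, the history count exceeds $10^{23}$, while the value grid stays in the low thousands; this gap is what motivates working with (approximations of) inducible value sets rather than enumerating history-dependent policies explicitly.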
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 28300