Provably Optimal Learning Algorithms for Assistance Games

Nivasini Ananthakrishnan; Mark Bedaywi; Michael I. Jordan; Stuart Russell; Nika Haghtalab

Published: 03 Jun 2026, Last Modified: 03 Jun 2026

Venue: AI4GOOD Workshop 2026 Spotlight

License: CC BY 4.0

Keywords: Online learning theory, Assistance games, Cooperative games

Abstract: This paper studies an online variant of the *assistance games* framework, where an informed agent and an uninformed agent repeatedly interact over $T$ timesteps to optimize a common reward function. While the informed agent (the human) observes a latent state of the world, the uninformed agent (the assistant) observes only the human's actions. We provide the first provably efficient learning algorithms for repeated assistance games. We introduce the notion of *assistance regret*: the gap between the cumulative utility of interactions and that of the optimal joint policies in hindsight, which map latent states to action pairs. We present decentralized algorithms for both the human and the assistant that achieve a $(1 - 1/e)$-approximate assistance regret rate of $\mathcal{O}(T^{3/4})$, with runtime polynomial in the size of the action and state spaces. These algorithms are general; in particular, they accommodate any no-regret algorithm for the assistant. We prove that achieving a regret approximation factor better than $(1 - 1/e)$ is computationally intractable. Furthermore, we demonstrate how these generic no-regret algorithms can be tailored to a pseudo-decentralized setting (using a shared random string) to achieve the optimal rate of $\mathcal{O}(T^{1/2})$.
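
As a reading aid, the verbal definition of approximate assistance regret in the abstract can be sketched in symbols. The notation here is assumed for illustration and is not taken from the paper: $s_t$ is the latent state at round $t$, $a_t^{H}$ and $a_t^{A}$ are the human's and the assistant's actions, $u$ is the common reward, and $\pi$ ranges over joint policies mapping latent states to action pairs. Under the common convention for $\alpha$-approximate regret, which scales the hindsight benchmark by $\alpha$, one plausible form is:

$$
\mathrm{Reg}_{\alpha}(T) \;=\; \alpha \cdot \max_{\pi} \sum_{t=1}^{T} u\bigl(s_t, \pi(s_t)\bigr) \;-\; \sum_{t=1}^{T} u\bigl(s_t, a_t^{H}, a_t^{A}\bigr), \qquad \alpha = 1 - \tfrac{1}{e}.
$$

In this reading, the stated $\mathcal{O}(T^{3/4})$ and $\mathcal{O}(T^{1/2})$ rates bound $\mathrm{Reg}_{\alpha}(T)$. The paper's exact definition may differ, for example by taking expectations over the agents' randomness.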

Submission Number: 458
