Matching multiple experts: on the exploitability of multi-agent imitation learning

ICLR 2026 Conference Submission 21633 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: imitation learning, multi-agent systems, behavioral cloning, Nash imitation gap
TL;DR: We study the exploitability of multi-agent imitation learning and derive efficient bounds on the Nash imitation gap.
Abstract: Multi-agent imitation learning (MA-IL) aims to learn optimal policies from expert demonstrations in multi-agent interactive domains. Despite existing guarantees on the performance of the extracted policy, characterizations of its distance to a Nash equilibrium are missing for offline MA-IL. In this paper, we establish impossibility and hardness results for learning low-exploitability policies in general $n$-player Markov games. We do so by providing examples where even exact measure matching fails, and we highlight the challenges that arise in the practical case of approximation errors. We then show how these challenges can be overcome under strategic dominance assumptions on the expert equilibrium, given a behavioral cloning (BC) error $\epsilon_{\text{BC}}$. Specifically, for dominant-strategy expert equilibria, we obtain a Nash imitation gap of $\mathcal{O}\left(n\epsilon_{\text{BC}}/(1-\gamma)^2\right)$ for discount factor $\gamma$. We generalize this result through a new notion of best-response continuity, and argue that such continuity is implicitly encouraged by standard regularization techniques.
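To make the scaling of the stated bound concrete, consider an illustrative instantiation (the hidden constant $c$ and the values $n = 2$, $\gamma = 0.99$, $\epsilon_{\text{BC}} = 10^{-3}$ are assumptions chosen for exposition, not figures from the paper): the effective-horizon factor is $1/(1-\gamma)^2 = 1/(0.01)^2 = 10^4$, so the gap bound evaluates to
$$\mathrm{NashGap}(\hat{\pi}) \;\le\; \frac{c\, n\, \epsilon_{\text{BC}}}{(1-\gamma)^2} \;=\; \frac{c \cdot 2 \cdot 10^{-3}}{(1-0.99)^2} \;=\; 20\,c,$$
showing that even a small per-player cloning error is amplified quadratically in the effective horizon $1/(1-\gamma)$ and linearly in the number of players $n$.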
Primary Area: reinforcement learning
Submission Number: 21633