From Many Imperfect to One Trusted: Imitation Learning from Heterogeneous Demonstrators with Unknown Expertise

ICLR 2026 Conference Submission 20336 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Imitation Learning, Heterogeneous Demonstrators, Optimality Estimation, Unknown Expertise
Abstract: Imitation learning (IL) typically depends on large-scale demonstrations collected from multiple human or algorithmic demonstrators. Yet, most existing methods assume these demonstrators are either homogeneous or near-optimal---a convenient but unrealistic assumption in many real-world settings. In this work, we tackle a more practical and challenging setting: IL from heterogeneous demonstrators with unknown and widely varying expertise levels. Instead of assuming expert dominance, we model each demonstrator's behavior as a flexible mixture of optimal and suboptimal policies, and propose a novel IL framework that jointly learns (a) a state-action optimality scoring model and (b) the latent expertise level of each demonstrator, using only a handful of human queries. The learned scoring model is then integrated into a policy optimization procedure, where it is fine-tuned with offline demonstrations, on-policy rollouts, and a fine-grained mixup regularizer to produce informative rewards. The agent is trained iteratively to maximize these learned rewards. Experiments on continuous-control benchmarks show that our approach consistently outperforms baseline methods. Even when all demonstrators are highly suboptimal, each exhibiting only 5-15% optimality, our method achieves performance comparable to a baseline trained on purely optimal demonstrations, despite having no access to optimality labels.
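To make the abstract's setup concrete, below is a minimal, hypothetical sketch (not the authors' released code) of one plausible way to jointly fit a per-sample optimality scorer g(s, a) and a per-demonstrator expertise level alpha_k, with a mixup regularizer over state-action pairs. The class and function names, the exact loss terms, and the omission of the human-query step are all assumptions made purely for illustration.

```python
# Hypothetical sketch under assumed details; the paper's actual objective may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OptimalityScorer(nn.Module):
    """Maps a state-action pair to a score in (0, 1): estimated probability of optimality."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return torch.sigmoid(self.net(torch.cat([obs, act], dim=-1)))

def joint_loss(scorer, expertise_logits, obs, act, demo_id, mixup_alpha=0.4):
    """Couples per-sample optimality scores with the latent expertise alpha_k of the
    demonstrator each sample came from, and adds a mixup smoothness regularizer.

    scorer:           OptimalityScorer
    expertise_logits: (K,) learnable logits, alpha_k = sigmoid(expertise_logits[k])
    obs, act:         (B, obs_dim), (B, act_dim) demonstration batch
    demo_id:          (B,) long tensor, index of each sample's demonstrator
    """
    scores = scorer(obs, act).squeeze(-1)               # (B,) per-sample optimality
    alpha_k = torch.sigmoid(expertise_logits)[demo_id]  # (B,) expertise of each source

    # (a) Tie per-sample scores to the latent expertise of their demonstrator:
    # scores are pulled toward alpha_k, and alpha_k toward the average score.
    expertise_loss = (F.binary_cross_entropy(scores, alpha_k.detach())
                      + F.binary_cross_entropy(alpha_k, scores.detach()))

    # (b) Fine-grained mixup regularizer: scores of convex combinations of
    # (s, a) pairs should interpolate the corresponding score targets.
    lam = torch.distributions.Beta(mixup_alpha, mixup_alpha).sample()
    perm = torch.randperm(obs.size(0))
    mixed_obs = lam * obs + (1 - lam) * obs[perm]
    mixed_act = lam * act + (1 - lam) * act[perm]
    mixed_scores = scorer(mixed_obs, mixed_act).squeeze(-1)
    target = lam * scores.detach() + (1 - lam) * scores[perm].detach()
    mixup_loss = F.mse_loss(mixed_scores, target)

    return expertise_loss + mixup_loss
```

In this reading, the scorer later serves as a learned reward that the agent maximizes iteratively; how the handful of human queries anchors the scale of alpha_k is not shown here.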
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 20336