Imitating the Imperfect: Offline-to-Online Robust Imitation Learning from Heterogeneous Demonstrators

Published: 25 May 2026, Last Modified: 27 May 2026, DEMO 2026 Poster, CC BY 4.0
Keywords: Imitation Learning, Heterogeneous Demonstrators, Optimality Estimation, Unknown Expertise, Offline-to-Online
Abstract: Imitation learning (IL) typically relies on large-scale *offline* demonstrations collected from multiple human or algorithmic demonstrators. However, most existing approaches assume these demonstrators are homogeneous or near-optimal experts, a convenient but unrealistic assumption in real-world applications. In practice, demonstrations are often collected from heterogeneous demonstrators with unknown and varying levels of expertise, resulting in highly mixed-quality data. In this work, we study a challenging yet practical setting of imitation learning from such heterogeneous imperfect demonstrators. We propose *Latent Expertise and Optimality Scoring based Imitation Learning* (LEOS-IL), a principled framework that jointly learns (i) a state-action optimality scoring model and (ii) latent expertise levels for each demonstrator from only unlabeled suboptimal *offline* demonstrations. The learned scoring model is then integrated into an *online* policy optimization procedure, where the agent is trained to maximize estimated optimality scores while its online rollouts are iteratively leveraged to refine the scoring model, thereby mitigating distributional shift during online adaptation. We further provide theoretical guarantees for optimal policy recovery and convergence of the proposed method. Experiments on continuous-control benchmarks demonstrate that our approach consistently outperforms existing baselines across highly heterogeneous and low-quality demonstration regimes.
Submission Number: 77