Learning What to Do and What Not To Do: Offline Imitation from Expert and Undesirable Demonstrations

ICLR 2026 Conference Submission21986 Authors

19 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | Readers: Everyone | License: CC BY 4.0

Keywords: Offline Imitation Learning, Imitation Learning, Undesirable Demonstrations

TL;DR: We develop an offline imitation learning approach that learns from expert and undesirable demonstrations

Abstract: Offline imitation learning typically learns from expert and unlabeled demonstrations, yet often overlooks the valuable signal in explicitly undesirable behaviors. In this work, we study offline imitation learning from contrasting behaviors, where the dataset contains both expert and undesirable demonstrations. We propose a novel formulation that optimizes a difference of KL divergences over the state-action visitation distributions of expert and undesirable (or bad) data. Although the resulting objective is a DC (Difference-of-Convex) program, we prove that it becomes *convex* when expert demonstrations outweigh undesirable demonstrations, which yields a practical and stable non-adversarial training objective. The method handles both positive and negative demonstrations in a unified framework. Extensive experiments on standard offline imitation learning benchmarks demonstrate that our approach consistently outperforms state-of-the-art baselines.
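
The convexity claim can be illustrated with a short worked equation. This is only a sketch under assumed notation, not the paper's stated objective: $d$ denotes the learner's state-action visitation distribution, $d^E$ and $d^B$ the expert and undesirable visitation distributions, and $\alpha \ge 0$ a hypothetical weight on the undesirable term. Both the direction of the KL divergences and the weight $\alpha$ are assumptions made for illustration.

$$
\begin{aligned}
J(d) &= D_{\mathrm{KL}}(d \,\|\, d^E) - \alpha\, D_{\mathrm{KL}}(d \,\|\, d^B) \\
     &= (1-\alpha) \sum_{s,a} d(s,a) \log d(s,a) \;-\; \sum_{s,a} d(s,a) \log d^E(s,a) \;+\; \alpha \sum_{s,a} d(s,a) \log d^B(s,a).
\end{aligned}
$$

Under these assumptions, the last two sums are linear in $d$, and the negative entropy $\sum_{s,a} d \log d$ is convex in $d$. So $J$ is a difference of convex functions in general, but it is convex whenever $1 - \alpha \ge 0$, that is, when the expert term carries at least as much weight as the undesirable term.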

Supplementary Material: zip

Primary Area: reinforcement learning

Submission Number: 21986
