SpoilDICE: Safe and Performant Offline Imitation Learning from Dual Demonstrations

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: safe imitation learning, offline imitation learning, stationary distribution matching, safe decision making
TL;DR: We propose SpoilDICE, a safe and performant offline imitation learning method that learns from safety–performance dual demonstrations.
Abstract: Ensuring both high returns and strict safety guarantees remains a fundamental challenge in imitation learning (IL). Existing approaches often rely on manually specified reward or cost functions, which are difficult to design and rarely capture complex safety constraints in real-world settings. We tackle this issue by introducing the problem of imitation learning from safety–performance dual demonstrations, where training data naturally divides into (i) safe demonstrations that respect safety requirements but may be suboptimal, and (ii) performant demonstrations that achieve high returns but may violate safety. To address the problem in the offline setting, we propose SpoilDICE (Safe–Performant Offline Imitation Learning via stationary DIstribution Correction Estimation). SpoilDICE integrates DICE-based distribution matching with support constraints derived from safe demonstrations, enabling agents to exploit high-return behaviors while remaining within safety-compliant regions of the state–action space. We validate our approach in both tabular gridworld and continuous safety-critical domains from the DSRL dataset and Safety Gymnasium benchmark. Empirical results demonstrate that SpoilDICE consistently produces policies that achieve strong performance without sacrificing safety, substantially outperforming prior offline IL baselines.
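To make the support-constraint idea concrete, here is a minimal tabular sketch (a hypothetical illustration, not the paper's actual DICE-based objective): occupancy estimated from performant demonstrations is masked to the state–action support of the safe demonstrations and renormalized into a policy, so high-return behavior is only imitated where the safe data covers it.

```python
import numpy as np

def support_constrained_policy(perf_counts, safe_counts):
    """Hypothetical sketch: restrict performant occupancy to the safe support.

    perf_counts, safe_counts: (S, A) visitation counts from the performant
    and safe demonstration sets, respectively. Returns an (S, A) policy.
    """
    mask = (safe_counts > 0).astype(float)      # safe-compliant (s, a) support
    d = perf_counts * mask                      # drop unsafe state-actions
    row_sums = d.sum(axis=1, keepdims=True)
    # fall back to the (normalized) safe behavior in states where no
    # performant mass survives the mask
    fallback = safe_counts / np.maximum(
        safe_counts.sum(axis=1, keepdims=True), 1e-8)
    policy = np.where(row_sums > 0, d / np.maximum(row_sums, 1e-8), fallback)
    return policy

# toy example: 2 states, 2 actions
perf = np.array([[5., 3.], [0., 4.]])
safe = np.array([[2., 0.], [1., 1.]])
pi = support_constrained_policy(perf, safe)
```

In state 0 the performant data uses action 1, but the safe data never does, so the mask removes it and all probability shifts to action 0. The actual method replaces this hard mask with a DICE-style stationary distribution correction learned offline.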
Primary Area: reinforcement learning
Submission Number: 11146