When Novices Teach Better: Improving Behavioral Cloning with Low-Skill Data

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Behavioral Cloning, Suboptimal Demonstrations
TL;DR: We show that training BC models on low-skill data can improve task performance and propose a measure of resilience to explain why.
Abstract: Behavioral cloning (BC), which trains models to replicate behavior from offline demonstrations, is a common approach in reinforcement learning. Several prior works argue that BC requires expert demonstrations and performs poorly when trained on low-skill or suboptimal data. We challenge this assumption by showing that, in certain regimes, training on low-skill demonstrations can yield models that outperform those trained on high-skill data. Since expert data is often costly and scarce while low-skill data is cheaper and more abundant, this finding has important practical implications. To explain the result, we introduce a measure that quantifies the \emph{resilience} of a policy, i.e., its ability to maintain reward under random perturbations, and show that resilience aligns with the observed performance differences. Building on this insight, we introduce skill-based training curricula that structure the training process according to policy skill levels, and show that they consistently improve BC performance compared to treating all data uniformly or filtering for experts. We validate our findings in a synthetic environment and with human data from Chess and Racing, demonstrating consistency across domains.
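To make the resilience notion concrete, below is a minimal, hypothetical sketch of one way such a measure could be computed: roll out a policy while replacing its action with a random one with some probability, and compare the return to the unperturbed return. The environment, policy, perturbation protocol, and the ratio-based definition here are illustrative assumptions, not the paper's exact formulation.

```python
import random

def rollout_return(env_step, policy, perturb_prob=0.0, horizon=20, seed=0):
    """Run one episode; with probability `perturb_prob` replace the
    policy's action with a random one. Returns the total reward.
    (Hypothetical protocol, for illustration only.)"""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(horizon):
        action = policy(state)
        if rng.random() < perturb_prob:
            action = rng.choice([-1, 1])  # random perturbation
        state, reward = env_step(state, action)
        total += reward
    return total

def resilience(env_step, policy, perturb_prob=0.2, n_episodes=100, horizon=20):
    """Ratio of perturbed to unperturbed average return: values near 1
    mean the policy maintains its reward under random perturbations."""
    base = sum(rollout_return(env_step, policy, 0.0, horizon, s)
               for s in range(n_episodes)) / n_episodes
    pert = sum(rollout_return(env_step, policy, perturb_prob, horizon, s)
               for s in range(n_episodes)) / n_episodes
    return pert / base if base else 0.0

# Toy 1-D environment (assumed for the sketch): reward for staying near the origin.
def env_step(state, action):
    new_state = state + action
    return new_state, 1.0 if abs(new_state) <= 1 else 0.0

greedy = lambda s: -1 if s > 0 else 1  # pushes back toward the origin
print(round(resilience(env_step, greedy), 3))
```

Under this definition, a fragile policy whose reward collapses after a single off-distribution action would score near 0, while a policy that recovers from perturbations scores near 1.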
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 22820