When does Predictive Inverse Dynamics Outperform Behavior Cloning? Exploring the Role of Action and State Uncertainty

ICLR 2026 Conference Submission 21655 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · Readers: Everyone · CC BY 4.0
Keywords: imitation learning, behavior cloning, inverse dynamics models, sample efficiency
TL;DR: We show that predictive inverse dynamics models can be more sample-efficient than BC
Abstract: Offline imitation learning aims to train agents from demonstrations without interacting with the environment, but standard approaches like behavior cloning (BC) often fail when expert demonstrations are limited. Recent work has introduced a class of architectures we call predictive inverse dynamics models (PIDM), which combine a future-state predictor with an inverse dynamics model that infers the actions needed to reach the predicted future states. Although PIDM can be viewed as a form of behavior cloning (in the sense of Bayes-optimality), it often outperforms conventional BC in practice, yet its benefits remain poorly understood. In this work, we analyze PIDM in the offline imitation learning setting and provide a theoretical explanation: under a perfect state predictor, the prediction error of PIDM can be lower than that of conventional BC, even in low-data regimes, and this gap widens when additional data sources can be leveraged. The efficiency gain is characterized by the variance of actions conditioned on future states, highlighting PIDM's ability to reduce uncertainty in states where future context is informative. We further demonstrate how this uncertainty reduction translates into sample-efficiency improvements. We validate these insights empirically under more general conditions in 2D navigation tasks with human demonstrations, where BC requires on average 2.8× more samples than PIDM to reach comparable performance. Finally, we extend our evaluation to a complex 3D environment in a modern video game with high-dimensional visual inputs and stochastic transitions, where BC requires over 66% more samples than PIDM in a realistic setting.
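A minimal sketch of the PIDM decomposition described in the abstract, contrasted with plain BC. This is an illustrative assumption of the architecture (module names, network sizes, and shapes are hypothetical), not the authors' implementation; the only point it demonstrates is that PIDM factors the policy into a future-state predictor followed by an inverse dynamics model, while exposing the same state-to-action interface as BC.

```python
# Hypothetical sketch: BC vs. PIDM policy heads (assumed architecture, not the paper's code).
import torch
import torch.nn as nn


class BCPolicy(nn.Module):
    """Behavior cloning: map the current state directly to an action."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state):
        return self.net(state)


class PIDMPolicy(nn.Module):
    """Predictive inverse dynamics model (PIDM):
    (1) predict a future state from the current state,
    (2) infer the action that reaches the predicted state with an inverse dynamics model."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        # Future-state predictor: s_t -> predicted s_{t+k}
        self.predictor = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )
        # Inverse dynamics model: (s_t, s_{t+k}) -> a_t
        self.idm = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state):
        future = self.predictor(state)
        return self.idm(torch.cat([state, future], dim=-1))


# Usage: both policies share the state -> action interface, so they can be
# trained with the same supervised (BC-style) regression loss on demonstrations.
state_dim, action_dim = 8, 2
bc, pidm = BCPolicy(state_dim, action_dim), PIDMPolicy(state_dim, action_dim)
states = torch.randn(32, state_dim)            # batch of states
print(bc(states).shape, pidm(states).shape)    # torch.Size([32, 2]) for both
```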
Primary Area: reinforcement learning
Submission Number: 21655