When does Predictive Inverse Dynamics Outperform Behavior Cloning? Exploring the Role of Action and State Uncertainty
Keywords: imitation learning, behavior cloning, inverse dynamics models, sample efficiency
TL;DR: We show that predictive inverse dynamics models can be more sample-efficient than BC
Abstract: Offline imitation learning aims to train agents from demonstrations without interacting with the environment, but standard approaches like behavior cloning (BC) often fail when expert demonstrations are limited. Recent work has introduced a class of architectures we call predictive inverse dynamics models (PIDM), which combine a future-state predictor with an inverse dynamics model (IDM) that infers the actions needed to reach the predicted future states. While PIDM often outperforms BC, the reasons behind its benefits remain unclear.
In this paper, we analyze PIDM in the offline imitation learning setting and provide a theoretical explanation: conditioning the IDM on the predicted future state reduces variance, whereas predicting the future state introduces bias. We establish conditions on the state-predictor bias under which PIDM achieves lower prediction error and higher sample efficiency than BC, with the gap widening when additional data sources are available. The efficiency gain is characterized by the variance of actions conditioned on future states, highlighting PIDM's ability to reduce uncertainty in states where future context is informative. We validate these insights empirically under more general conditions in 2D navigation tasks using human demonstrations, where BC requires up to five times (three times on average) more samples than PIDM to reach comparable performance. Finally, we extend our evaluation to a complex 3D environment in a modern video game with high-dimensional visual inputs and stochastic transitions, showing that BC requires over 66% more samples than PIDM in a realistic setting.
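To make the architectural contrast concrete, here is a minimal sketch of how a PIDM-style policy factors action prediction compared with BC. This is not code from the paper: the linear "models" and all names (`W_bc`, `W_pred`, `W_idm`) are hypothetical stand-ins for learned networks, used only to show the two-stage inference structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear maps standing in for learned networks.
STATE_DIM, ACTION_DIM = 4, 2
W_bc = rng.normal(size=(ACTION_DIM, STATE_DIM))            # BC policy
W_pred = rng.normal(size=(STATE_DIM, STATE_DIM))           # future-state predictor
W_idm = rng.normal(size=(ACTION_DIM, 2 * STATE_DIM))       # inverse dynamics model

def bc_policy(state):
    # Behavior cloning: map the current state directly to an action.
    return W_bc @ state

def pidm_policy(state):
    # Stage 1: predict the future state (this is where bias can enter).
    future = W_pred @ state
    # Stage 2: the IDM infers the action that reaches the predicted
    # future state; conditioning on it is what reduces action variance.
    return W_idm @ np.concatenate([state, future])

state = rng.normal(size=STATE_DIM)
print(bc_policy(state).shape, pidm_policy(state).shape)
```

Both policies consume the same state and emit an action of the same dimension; the difference is that PIDM routes inference through an intermediate predicted state, trading predictor bias for lower variance in the action head.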
Primary Area: reinforcement learning
Submission Number: 21655