Closing the Train-Test Gap in World Models for Gradient-Based Planning

ICLR 2026 Conference Submission 16327 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: world models, model predictive control, gradient-based planning, meta-learning
TL;DR: We identify a train-test gap when using gradient-based planning with world models and propose techniques to close it.
Abstract: World models paired with model predictive control (MPC) can be trained offline on large-scale datasets of expert trajectories and enable generalization to a wide range of tasks chosen at inference time. Compared to traditional MPC procedures, which rely either on slow search algorithms or on iteratively solving optimization problems exactly, gradient-based planning offers a computationally efficient alternative. However, the performance of gradient-based planning has thus far lagged behind that of other approaches. In this paper, we propose an improved method for training world models that enable efficient gradient-based planning. We begin with the observation that although a world model is trained on a next-state prediction objective, at test time it is instead used to estimate a sequence of actions. The goal of our work is to close this train-test gap. To that end, we propose train-time data synthesis techniques that enable significantly improved gradient-based planning with existing world models. Moreover, we demonstrate an improvement over the search-based cross-entropy method (CEM) on an object manipulation task while using only 10% of its time budget.
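To make the test-time use described in the abstract concrete, the following is a minimal sketch of gradient-based planning through a frozen world model: an action sequence is optimized by backpropagating a task cost through model rollouts. The `WorldModel` architecture, dimensions, goal-reaching cost, and optimizer settings below are illustrative assumptions, not the authors' method or their proposed data synthesis techniques.

```python
import torch
import torch.nn as nn

# Hypothetical world model: predicts the next state from the current state
# and action. The architecture used in the paper is not specified here.
class WorldModel(nn.Module):
    def __init__(self, state_dim=8, action_dim=2, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def gradient_based_plan(world_model, initial_state, goal_state,
                        action_dim=2, horizon=10, steps=100, lr=0.1):
    """Optimize an action sequence by descending a task cost through
    rollouts of the (frozen) world model -- the train-test gap arises
    because the model was only ever trained for next-state prediction."""
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    optimizer = torch.optim.Adam([actions], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        state = initial_state
        for t in range(horizon):
            state = world_model(state, actions[t])   # roll the model forward
        loss = ((state - goal_state) ** 2).sum()     # illustrative goal cost
        loss.backward()                              # gradients w.r.t. actions
        optimizer.step()

    return actions.detach()


if __name__ == "__main__":
    model = WorldModel()
    for p in model.parameters():     # the world model stays frozen at test time
        p.requires_grad_(False)
    s0, goal = torch.zeros(8), torch.ones(8)
    plan = gradient_based_plan(model, s0, goal)
    print(plan.shape)                # torch.Size([10, 2])
```

In this sketch only the action sequence receives gradients; search-based baselines such as CEM would instead sample and rescore candidate action sequences, which is where the quoted 10% time-budget comparison comes from.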
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 16327