Keywords: World Model, Self-Improvement, Data Curation
TL;DR: We propose a self-improving framework for learning action-conditioned world models via asymmetric forward–inverse consistency.
Abstract: Action-conditioned world models are essential for policy evaluation, optimization, and planning, yet achieving control-relevant accuracy remains challenging. Unlike policy learning, which focuses primarily on optimal actions, world models must remain reliable under a broader range of suboptimal actions, posing a critical robustness challenge. To address this, we propose the Asymmetric Self-Improving Model (ASIM), a framework for learning world models via forward–inverse consistency. Our key insight is that predicting high-dimensional action-conditioned state transitions is often harder than verifying (i) the plausibility of the predicted states and (ii) their reachability under the corresponding actions. Motivated by this asymmetry, we pair a forward world model with a subgoal generator learned from large-scale video corpora and an inverse model that infers actions from only a subset of relevant states. By enforcing cycle consistency among proposed subgoals, inferred actions, and forward rollouts, ASIM provides a verification mechanism for self-improvement in under-explored regimes. Across nine tasks in MiniGrid, RoboMimic, and ManiSkill, our method reduces the data budget by more than half while improving downstream policy performance by over 18%.
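The abstract describes the cycle as: propose a subgoal, infer the action that should reach it, roll the forward world model, and check agreement. The PyTorch sketch below illustrates that loop under stated assumptions; the module names, MLP architectures, dimensions, and squared-error residual are all illustrative choices, not the paper's actual implementation.

```python
# A minimal sketch of the forward-inverse cycle-consistency check described
# in the abstract. All components here (MLP sizes, STATE_DIM, ACTION_DIM,
# the MSE residual) are hypothetical stand-ins, not the authors' code.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 16, 4  # illustrative dimensions


class MLP(nn.Module):
    """Small stand-in network; concatenates its inputs along the last dim."""

    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim)
        )

    def forward(self, *xs):
        return self.net(torch.cat(xs, dim=-1))


forward_model = MLP(STATE_DIM + ACTION_DIM, STATE_DIM)  # f(s, a) -> s'
inverse_model = MLP(2 * STATE_DIM, ACTION_DIM)          # g(s, s_goal) -> a
subgoal_generator = MLP(STATE_DIM, STATE_DIM)           # h(s) -> s_goal


def cycle_consistency_loss(state):
    subgoal = subgoal_generator(state)        # 1. propose a plausible subgoal
    action = inverse_model(state, subgoal)    # 2. infer the reaching action
    predicted = forward_model(state, action)  # 3. forward rollout
    # 4. A consistent cycle means the rollout lands on the subgoal; a large
    #    residual flags the transition as a candidate for self-improvement.
    return ((predicted - subgoal) ** 2).mean()


state = torch.randn(8, STATE_DIM)  # batch of 8 states
loss = cycle_consistency_loss(state)
loss.backward()
```

This reflects the asymmetry the abstract emphasizes: the residual check only verifies that a predicted state is reachable under the inferred action, which is cheaper than generating accurate transitions directly.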
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 41