Online Planning with Offline Pretrained All-in-One World Model

Published: 08 May 2026, Last Modified: 08 May 2026, ICRA 2026 Workshop RL4IL Oral, CC BY 4.0
Keywords: offline reinforcement learning, all-in-one world model, online planning, model predictive control, goal reaching
TL;DR: Use an offline pretrained masked Transformer — which doubles as both a policy and a world model — to do MPC planning at test time, boosting performance with zero extra training. Extends to online finetuning and goal reaching.
Abstract: Recent work in Offline Reinforcement Learning (RL) has shown that an all-in-one world model pretrained offline via a masked auto-encoding objective can effectively capture the relationships between different modalities (e.g., states, actions, rewards) within trajectory datasets. However, this model's full potential has not been exploited during deployment, where the agent must generate an optimal policy rather than merely reconstruct masked tokens. Since the pretrained model subsumes both a Policy Model and a World Model under appropriate mask patterns, we propose leveraging it for online planning via Model Predictive Control (MPC) at test time, using the model's own predictive capability to guide action selection. Empirical results on D4RL and RoboMimic show that our online planning framework significantly improves the decision-making performance of the pretrained model without any additional parameter training. Furthermore, the framework extends naturally to Offline-to-Online (O2O) RL and Goal-Reaching RL, yielding more substantial gains when an online interaction budget is available and better generalization when diverse task targets are specified.
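To make the MPC idea in the abstract concrete, below is a minimal random-shooting sketch of test-time planning: sample candidate action sequences, score each with the model's own reward predictions, and execute only the first action of the best sequence. The `rollout_fn` interface and the toy point-mass dynamics are hypothetical stand-ins for querying the pretrained masked Transformer in world-model mode; they are not the paper's actual implementation.

```python
import numpy as np

def mpc_plan(rollout_fn, state, horizon=5, n_candidates=64, action_dim=1, rng=None):
    """Random-shooting MPC: sample candidate action sequences, score each
    with model-predicted cumulative reward, return the first action of the
    best sequence (receding-horizon control). `rollout_fn` is a hypothetical
    stand-in for the pretrained model queried in world-model mode."""
    rng = np.random.default_rng(rng)
    # Sample candidate action sequences uniformly in [-1, 1].
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    # Score each sequence by its predicted return under the model.
    returns = np.array([rollout_fn(state, seq) for seq in candidates])
    best = int(np.argmax(returns))
    # Execute only the first action; replan at the next step.
    return candidates[best, 0]

# Toy stand-in world model: 1-D point mass, reward = -|position|.
def toy_rollout(state, actions):
    pos, total = float(state), 0.0
    for a in actions:
        pos += 0.1 * float(a[0])
        total += -abs(pos)
    return total

action = mpc_plan(toy_rollout, state=1.0, rng=0)
```

In this toy setup the planner should push the point mass toward the origin, so the selected first action is negative; in the paper's setting the same loop would instead query the masked Transformer with a mask pattern that conditions on past tokens and predicts future states and rewards.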
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 21