Keywords: world models, reinforcement learning
TL;DR: Policy-gradient ESNR predicts downstream policy performance, giving a practical diagnostic for “policy-aware” world models and guiding pretraining, architecture tweaks, and policy choice.
Abstract: World models have received significant attention from the robotics and computer vision communities, both of which have started scaling to networks comprising billions of parameters in the hope of unlocking new robot skills. In this paradigm, models are pre-trained on internet-scale data and then fine-tuned on robot data to learn policies. However, it remains unclear what makes a good world model for downstream policy learning. We show that world model prediction loss is in many instances uncorrelated with policy performance, forcing practitioners to train models to completion for correct evaluation. This results in slow, costly iterations of model training and policy evaluation. In this work, we demonstrate that the expected signal-to-noise ratio (ESNR) of policy gradients provides a reliable training-time metric for downstream policy performance. This provides a handle on the world model's policy awareness, which denotes how well a policy can learn from a model. We show that ESNR can be used to understand (1) when world models are sufficiently pre-trained, (2) how architecture changes affect downstream performance, and (3) which policy learning method is best for a given world model. Crucially, ESNR can be computed on-the-fly with minimal overhead and without a trained policy. We validate our metric on traditional architectures and tasks as well as large pretrained world models, demonstrating the practical utility of ESNR for practitioners who wish to train or finetune such models for robot applications. Visualizations and code are available here: https://policy-aware.github.io/paper-anon.
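The abstract does not spell out how ESNR is computed; a minimal sketch of one common gradient signal-to-noise definition, ||E[g]||² / tr(Cov[g]) estimated from a batch of stochastic policy-gradient samples, might look like the following (the function name `esnr` and the exact formula are assumptions, not the paper's stated method):

```python
import numpy as np

def esnr(grads):
    """Signal-to-noise ratio of stochastic gradient estimates.

    grads: array of shape (n_samples, n_params), each row one
    policy-gradient estimate. Assumed definition (not given in the
    abstract): squared norm of the mean gradient divided by the
    trace of the sample covariance.
    """
    grads = np.asarray(grads, dtype=float)
    mean = grads.mean(axis=0)
    var = grads.var(axis=0, ddof=1).sum()  # trace of the covariance
    return float(mean @ mean) / max(var, 1e-12)

# Toy check: low-noise gradient estimates should score a higher
# ESNR than high-noise estimates of the same true gradient.
rng = np.random.default_rng(0)
true_g = np.array([1.0, -2.0, 0.5])
low_noise = true_g + 0.01 * rng.standard_normal((64, 3))
high_noise = true_g + 1.0 * rng.standard_normal((64, 3))
```

Under this reading, the metric needs only gradient samples from the current (even untrained) policy, which is consistent with the claim that it can be computed on-the-fly without a trained policy.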
Supplementary Material: pdf
Primary Area: reinforcement learning
Submission Number: 2644