PolicyGRID: Acting to Understand, Understanding to Act

Published: 19 Sept 2025 · Last Modified: 19 Sept 2025 · NeurIPS 2025 Workshop EWM · CC BY 4.0
Keywords: Causal discovery, Embodied agents, World models, Interventional reasoning, Multi-objective control, Cyber-physical systems, Causal reinforcement learning, Structural causal models, Policy generation, Robust decision-making
TL;DR: PolicyGRID integrates causal discovery directly into the policy loop, enabling embodied agents to learn world models through their own interventions and generate robust, multi-objective control policies.
Abstract: Embodied agents require internal models that support interventional reasoning, not merely correlational prediction. We present PolicyGRID, an embodied world model that learns causal structure online through its own actions. Unlike traditional approaches that treat causal discovery as preprocessing, PolicyGRID integrates causal learning directly into the policy loop: agents actively probe the environment to resolve causal uncertainty while simultaneously optimizing for competing objectives. This enables agents to adapt their causal understanding as they act, expanding their behavioral repertoire beyond correlation-driven policies. The framework addresses a fundamental challenge in embodied AI: how can agents maintain reliable world models when their own interventions continuously change the data distribution? To validate this approach, we evaluate PolicyGRID on building control across synthetic simulations, public datasets, and a real-world deployment, achieving F1 = 0.89 under real-world conditions and 2.8× higher policy performance than baselines. These results demonstrate that embedding causal reasoning directly into the policy loop yields more robust, adaptive behavior than correlation-driven world models.
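
To make the high-level description concrete, the sketch below illustrates the kind of loop the abstract describes: an agent that alternates between interventions chosen to shrink causal-model uncertainty and actions chosen under a scalarized multi-objective reward, updating its causal-effect estimates from its own interventional data. This is a minimal, hypothetical illustration; the toy environment, the count-based uncertainty heuristic, and all names are assumptions for exposition, not the paper's implementation.

```python
# Hypothetical sketch of a PolicyGRID-style loop (illustrative only):
# interleave (a) probing interventions that reduce causal uncertainty with
# (b) actions that optimize a scalarized multi-objective reward, while
# updating an online causal-effect model from the agent's own interventions.
import numpy as np

rng = np.random.default_rng(0)
N_VARS = 4        # observed environment variables (e.g. zone temperatures)
N_ACTIONS = 4     # one actuator per variable, for simplicity
EPISODES = 200

# Toy environment: hidden upper-triangular causal weights among variables.
TRUE_W = np.triu(rng.uniform(-1.0, 1.0, (N_VARS, N_VARS)), k=1)

def step(state, action):
    """Apply an intervention on one actuator and propagate causal effects."""
    intervention = np.zeros(N_VARS)
    intervention[action] = 1.0                          # do(X_action += 1)
    next_state = 0.5 * state + intervention + intervention @ TRUE_W
    comfort = -np.abs(next_state).mean()                # objective 1: stay near setpoint
    energy = -0.1                                       # objective 2: actuation cost
    return next_state, np.array([comfort, energy])

# Online causal model: crude per-action average of observed downstream change.
effect_sum = np.zeros((N_ACTIONS, N_VARS))
effect_cnt = np.ones(N_ACTIONS)          # visit counts double as uncertainty proxy
weights = np.array([1.0, 0.5])           # scalarization of the two objectives

state = np.zeros(N_VARS)
total_return = 0.0
for t in range(EPISODES):
    uncertainty = 1.0 / effect_cnt       # less-probed actions are more uncertain
    if rng.random() < uncertainty.max():
        # Probe: intervene where the causal model is least certain.
        action = int(np.argmax(uncertainty))
    else:
        # Exploit: pick the action whose predicted effects maximize the
        # scalarized multi-objective return under the learned causal model.
        est_effects = effect_sum / effect_cnt[:, None]
        predicted_comfort = -np.abs(state + est_effects).mean(axis=1)
        predicted_energy = -0.1 * np.ones(N_ACTIONS)
        scores = weights @ np.stack([predicted_comfort, predicted_energy])
        action = int(np.argmax(scores))
    next_state, objectives = step(state, action)
    effect_sum[action] += next_state - state            # interventional update
    effect_cnt[action] += 1
    total_return += float(weights @ objectives)
    state = next_state

print("Estimated interventional effects:\n", effect_sum / effect_cnt[:, None])
print("Scalarized return:", total_return)
```

The design choice mirrored here is the one the abstract emphasizes: causal discovery is not a preprocessing step but part of action selection itself, so the data the model learns from is generated by its own interventions.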
Submission Number: 30