General agents need world models

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We prove that any agent capable of flexible goal-directed behaviour must have learned a world model, and provide algorithms for recovering this world model from the agent.
Abstract: Are world models a necessary ingredient for flexible, goal-directed behaviour, or is model-free learning sufficient? We provide a formal answer to this question, showing that any agent capable of generalizing to multi-step goal-directed tasks must have learned a predictive model of its environment. We show that this model can be extracted from the agent's policy, and that increasing the agent's performance, or the complexity of the goals it can achieve, requires learning increasingly accurate world models. This has a number of consequences, from developing safe and general agents, to bounding agent capabilities in complex environments, to providing new algorithms for eliciting world models from agents.
Lay Summary: What are the key ingredients for building truly general AI agents, capable of performing a wide range of tasks? One possibility is that these agents, like humans, need a rich internal model of the world, capable of predicting the consequences of their actions and simulating different possibilities. However, most AI systems are black boxes, and it's unclear if they have anything like a coherent understanding of the world. In this paper we give a formal answer to this question, proving that any AI agent that can solve multi-step goal-directed tasks must have learned a predictive model of its environment, and that we can extract this 'world model' from the agent. Importantly, we show that to achieve more complex goals with a higher probability of success, agents must learn increasingly accurate and detailed world models. This discovery has several key implications. It implies there is no capability without understanding: agents can't do things unless they learn an accurate model of the task, much like a human would need to learn a mental model of a chess board, its moves, and the opponent, to play chess. This places an upper bound on the capabilities of agents operating in messy real-world environments, where learning an accurate model can be prohibitively difficult. It is also crucial for developing safer and more interpretable agents, as it gives us a way to 'back out' a model of the agent and its environment, which we can use to debug the agent's behaviour even in environments we don't fully understand.
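The abstract mentions algorithms for eliciting a world model from an agent's policy. As a loose, hypothetical sketch of how such elicitation could look in principle (this is not the paper's algorithm; the `policy` interface, the `make_goal` construction, and the threshold reading are all assumed purely for illustration), one might binary-search for the goal difficulty at which a goal-conditioned policy stops committing to an action, and read the switching point as an implied probability:

```python
def implied_success_prob(policy, state, action, make_goal, n_max=64):
    """Hypothetical sketch: infer the success probability an agent's
    goal-conditioned policy implicitly assigns to `action` in `state`,
    by finding the hardest goal for which it still selects `action`.

    Assumed interface (not from the paper):
      policy(state, goal) -> chosen action
      make_goal(k)        -> a goal of difficulty k (e.g. "succeed in
                             at least k of n_max attempts")
    """
    lo, hi = 0, n_max
    while lo < hi:
        mid = (lo + hi) // 2
        if policy(state, make_goal(mid)) == action:
            lo = mid + 1   # policy still backs `action` at this difficulty
        else:
            hi = mid       # policy has abandoned `action`; threshold is below
    return lo / n_max      # threshold ratio, read as the implied probability


if __name__ == "__main__":
    # Toy check with a dummy policy whose true success probability is 0.7:
    # it picks "a" exactly while the required ratio k / 64 stays at or below 0.7.
    dummy = lambda s, k: "a" if k / 64 <= 0.7 else "b"
    print(implied_success_prob(dummy, state=None, action="a", make_goal=lambda k: k))
    # -> 0.703125, close to the dummy's true probability of 0.7
```

The binary search keeps the number of policy queries logarithmic in the goal resolution n_max; repeating the probe over state-action pairs would assemble the kind of approximate world model the paper shows must exist.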
Primary Area: Theory->Reinforcement Learning and Planning
Keywords: Agents, world models, reinforcement learning, goal-conditioned reinforcement learning, causality, generalization
Submission Number: 6723