One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration
Keywords: world modeling, programmatic RL, probabilistic program, symbolic rule learning, intrinsically motivated and open-ended learning
Abstract: Symbolic world modeling is the task of inferring and representing the transition dynamics of an environment as an executable program. Previous research on symbolic world modeling has focused on simple, deterministic environments with abundant data and human-provided guidance. We address the more realistic and challenging problem of learning a symbolic world model in a complex, stochastic environment under severe constraints: a limited interaction budget, in which the agent has only “one life” to explore a hostile environment, and no external guidance in the form of human-provided, environment-specific rewards or goals. We introduce OneLife, a framework that models world dynamics through conditionally activated programmatic laws within a probabilistic programming framework. Each law operates through a precondition-effect structure, allowing it to remain silent on irrelevant aspects of the world state and predict only the attributes it directly governs. This creates a dynamic computation graph that routes both inference and optimization through only the laws relevant to each transition. It thereby avoids the scaling challenges that arise when all laws must contribute to predictions about a complex, hierarchical state space, and it enables accurate learning of stochastic dynamics even when most rules are inactive at any given moment.
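The precondition-effect idea can be illustrated with a minimal Python sketch of how an observed transition might be scored by only the laws that fire. Everything here is an assumption rather than the OneLife implementation: the flat `"object.attribute"` dictionary standing in for Crafter-OO's object-oriented state, the hypothetical `Law` class and `transition_log_likelihood` function, categorical effect distributions, the illustrative toy laws and their numbers, and the choice to combine laws that govern the same attribute by summing log-probabilities (a product of experts).

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical flat state: "object.attribute" -> value.
State = Dict[str, object]
# Hypothetical effect output: attribute -> categorical distribution over its next value.
Effect = Dict[str, Dict[object, float]]


@dataclass
class Law:
    name: str
    precondition: Callable[[State, str], bool]  # when is this law active?
    effect: Callable[[State, str], Effect]      # prediction for the attributes it governs


def transition_log_likelihood(
    laws: List[Law], state: State, action: str, next_state: State, eps: float = 1e-6
) -> Tuple[float, List[str], Set[str]]:
    """Score an observed transition using only the laws whose preconditions fire."""
    active = [law for law in laws if law.precondition(state, action)]
    total, governed = 0.0, set()
    for law in active:
        for attr, dist in law.effect(state, action).items():
            governed.add(attr)
            # Summing log-probs across laws that govern the same attribute is a
            # product-of-experts assumption; eps keeps log() finite for unseen values.
            total += math.log(dist.get(next_state[attr], 0.0) + eps)
    # Inactive laws and ungoverned attributes contribute nothing to the score.
    return total, [law.name for law in active], governed


# Toy laws for a Crafter-like situation (illustrative values, not learned ones).
zombie_attack = Law(
    "zombie_attack",
    precondition=lambda s, a: bool(s["zombie.adjacent"]),
    effect=lambda s, a: {"player.health": {s["player.health"] - 2: 0.7, s["player.health"]: 0.3}},
)
eat_cow = Law(
    "eat_cow",
    precondition=lambda s, a: a == "do" and bool(s["cow.facing"]),
    effect=lambda s, a: {"player.food": {s["player.food"] + 6: 1.0}},
)

state = {"player.health": 9, "player.food": 4, "zombie.adjacent": True, "cow.facing": False}
next_state = {"player.health": 7, "player.food": 4, "zombie.adjacent": True, "cow.facing": False}
ll, active, governed = transition_log_likelihood([zombie_attack, eat_cow], state, "noop", next_state)
print(active, governed, round(ll, 4))  # ['zombie_attack'] {'player.health'} -0.3567
```

Because `eat_cow` is inactive for this transition, it neither contributes to the score nor would receive any learning signal from it, which is the sense in which computation is routed only through relevant laws.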
To evaluate our approach under these demanding constraints, we introduce a new evaluation protocol that measures (a) state ranking, the ability to distinguish plausible future states from implausible ones, and (b) state fidelity, the ability to generate future states that closely resemble reality. We develop and evaluate our framework on Crafter-OO, our reimplementation of the popular Crafter environment that exposes a structured, object-oriented symbolic state and a pure transition function that operates on that state alone. OneLife successfully learns key environment dynamics from minimal, unguided interaction, outperforming a strong baseline on 16 out of 23 scenarios tested.
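The two measures in the evaluation protocol, state ranking and state fidelity, can be made concrete with a small sketch. The exact metrics are not specified here, so both functions are assumptions: `state_ranking_rr` (a hypothetical name) uses the reciprocal rank of the true next state among implausible distractors, and `state_fidelity` (also hypothetical) uses the fraction of attributes a generated state reproduces exactly. The log-likelihood values in the usage lines are made up for illustration.

```python
from typing import Callable, Dict, Sequence

State = Dict[str, object]


def state_ranking_rr(
    score: Callable[[State], float], true_next: State, distractors: Sequence[State]
) -> float:
    """Reciprocal rank of the true next state among implausible candidates."""
    true_score = score(true_next)
    # Ties count against the model (a conservative assumption), so a model that
    # scores every candidate equally cannot reach rank 1.
    rank = 1 + sum(1 for d in distractors if score(d) >= true_score)
    return 1.0 / rank


def state_fidelity(generated: State, true_next: State) -> float:
    """Fraction of the true next state's attributes that a sampled state reproduces exactly."""
    if not true_next:
        return 1.0
    matches = sum(1 for k, v in true_next.items() if generated.get(k) == v)
    return matches / len(true_next)


# Hypothetical model log-likelihoods, keyed by the player's next health value.
log_lik = {7: -0.36, 9: -1.20, 0: -13.8}


def score(s: State) -> float:
    return log_lik[s["player.health"]]


true_next = {"player.health": 7, "player.food": 4}
distractors = [{"player.health": 9, "player.food": 4}, {"player.health": 0, "player.food": 4}]
print(state_ranking_rr(score, true_next, distractors))  # 1.0
print(state_fidelity({"player.health": 7, "player.food": 3}, true_next))  # 0.5
```

Ranking only needs relative scores, so it rewards a model that can tell plausible from implausible outcomes even if it cannot sample them well; fidelity checks the complementary generative ability.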
We also demonstrate the world model’s utility for planning: rollouts simulated within the world model successfully identify superior strategies in multi-step, goal-oriented tasks.
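Planning with a learned stochastic model can be sketched as Monte Carlo evaluation of candidate plans: sample several rollouts of each plan inside the model and keep the plan with the highest mean reward. This is an assumed, simplified scheme (fixed open-loop plans, reward on the final state only), not necessarily the procedure used in the paper; `mean_return`, `choose_plan`, `toy_step`, and the wood-collecting task are hypothetical stand-ins for the learned world model and a goal-oriented task.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

State = Dict[str, int]
# Hypothetical interface: sample s' ~ p(s' | s, a) from a learned stochastic world model.
Step = Callable[[State, str, random.Random], State]


def mean_return(step: Step, start: State, plan: Sequence[str],
                reward: Callable[[State], float], n_rollouts: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the final-state reward of an open-loop plan."""
    total = 0.0
    for _ in range(n_rollouts):
        s = dict(start)
        for action in plan:
            s = step(s, action, rng)
        total += reward(s)
    return total / n_rollouts


def choose_plan(step: Step, start: State, plans: Sequence[Sequence[str]],
                reward: Callable[[State], float], n_rollouts: int = 64,
                seed: int = 0) -> Tuple[Sequence[str], List[float]]:
    """Pick the plan whose simulated rollouts achieve the highest mean reward."""
    rng = random.Random(seed)
    scores = [mean_return(step, start, p, reward, n_rollouts, rng) for p in plans]
    best = max(range(len(plans)), key=scores.__getitem__)
    return plans[best], scores


def toy_step(state: State, action: str, rng: random.Random) -> State:
    """Illustrative stochastic dynamics: collecting wood succeeds 90% of the time."""
    nxt = dict(state)
    if action == "collect_wood" and rng.random() < 0.9:
        nxt["wood"] = nxt.get("wood", 0) + 1
    return nxt


plans = [["noop"] * 3, ["collect_wood"] * 3]
best, scores = choose_plan(toy_step, {"wood": 0}, plans, reward=lambda s: s.get("wood", 0))
print(best, scores)
```

Averaging over many sampled rollouts matters because a single rollout from a stochastic model can make a good plan look bad (or vice versa) by chance.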
Our work establishes a foundation for autonomously constructing programmatic world models of unknown, complex environments.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21236