Keywords: World Model; Latent Variable Models
TL;DR: We co-design agent interventions and an adaptive environment curriculum to sequentially learn a world model with task-specific, minimal, and sufficient latent representations, enabling efficient and generalizable policy learning.
Abstract: We study how agents learn world models with latent representations that are task-specific, minimal, and sufficient for sequential decision making. Rather than predicting pixels or relying on generic embeddings, we aim to learn representations that retain exactly the information needed for control across tasks. We model the problem end-to-end as a closed loop of agent–environment interaction, enabling the agent to sequentially acquire minimal and sufficient latent representations over a series of tasks.
On the agent side, the agent begins each new task with active interventions that acquire informative trajectories implicitly revealing task-relevant latent factors, and then trains the world model to learn a latent space that is both minimal and task-sufficient.
On the environment side, learning is facilitated through an adaptive curriculum that co-evolves with the agent. By tailoring environment settings and task order to the agent's learning progress, the curriculum exposes control-relevant mechanisms at the right level of difficulty, while jointly scheduling world-model updates with policy learning. This co-design of intervention and curriculum leads to a compact, structured latent space that supports efficient, transferable policy learning and generalization. Empirically, our approach improves sample efficiency and generalization across skills, object–skill compositions, and unseen tasks on standard continuous control and robotic manipulation benchmarks.
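The closed loop described above can be illustrated with a minimal sketch. All class and method names here (Curriculum, Agent, intervene, update_world_model) are hypothetical placeholders for illustration only, not the authors' implementation; the intervention and world-model steps are stubbed out.

```python
# Hypothetical sketch of the agent-curriculum co-design: the curriculum
# orders tasks by difficulty and advances only when the agent's learning
# progress clears a threshold; for each task, the agent intervenes to
# collect informative trajectories and updates its world model.

class Curriculum:
    """Adaptive curriculum that co-evolves with the agent's progress."""
    def __init__(self, tasks):
        self.tasks = sorted(tasks, key=lambda t: t["difficulty"])
        self.idx = 0

    def next_task(self, progress):
        # Expose harder tasks only once the agent has mastered the current one.
        if progress > 0.8 and self.idx < len(self.tasks) - 1:
            self.idx += 1
        return self.tasks[self.idx]

class Agent:
    def __init__(self):
        self.progress = 0.0  # stand-in for a real learning-progress signal

    def intervene(self, task):
        # Placeholder: active interventions that reveal task-relevant
        # latent factors would collect real trajectories here.
        return [f"trajectory_in_{task['name']}"]

    def update_world_model(self, trajectories):
        # Placeholder: train a minimal, task-sufficient latent space;
        # here we just advance a toy progress counter.
        self.progress = min(1.0, self.progress + 0.3)

curriculum = Curriculum([
    {"name": "reach", "difficulty": 1},
    {"name": "push", "difficulty": 2},
    {"name": "stack", "difficulty": 3},
])
agent = Agent()
for step in range(6):
    task = curriculum.next_task(agent.progress)
    data = agent.intervene(task)
    agent.update_world_model(data)
```

In this toy run the curriculum advances from "reach" through "push" to "stack" as the progress signal accumulates, mirroring the paper's idea of exposing control-relevant mechanisms at the right level of difficulty.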
Primary Area: reinforcement learning
Submission Number: 7870