Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics

Published: 26 Jan 2026, Last Modified: 11 Mar 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: Multi-task reinforcement learning, world model, transformer, mixture-of-world models
TL;DR: This paper proposes Mixture-of-World models (MoW), a novel and sample-efficient world model architecture for multi-task reinforcement learning.
Abstract: A fundamental challenge in multi-task reinforcement learning (MTRL) is achieving sample efficiency in visual domains where tasks exhibit significant heterogeneity in both observations and dynamics. Model-based RL (MBRL) offers a promising path to sample efficiency through world models, but standard monolithic architectures struggle to capture diverse task dynamics, leading to poor reconstruction and prediction accuracy. We introduce Mixture-of-World models (MoW), a scalable architecture that integrates three key components: i) modular VAEs for task-adaptive visual compression, ii) a hybrid Transformer-based dynamics model combining task-conditioned experts with a shared backbone, and iii) a gradient-based task clustering strategy for efficient parameter allocation. On the Atari 100k benchmark, **a single MoW agent** (trained once over all $26$ Atari games) achieves a mean human-normalized score of $\mathbf{110.4}$%, competitive with the $\mathbf{114.2}$% achieved by the recent STORM (an ensemble of $26$ task-specific models) while requiring $\mathbf{50}$% fewer parameters. On Meta-World, MoW attains a $\mathbf{74.5}$% average success rate within $300$K steps, establishing a new state-of-the-art. These results demonstrate that MoW provides a scalable and parameter-efficient foundation for generalist world models.
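The hybrid dynamics idea in component (ii) can be sketched minimally: a shared backbone processes every task's latent state, and a task-conditioned expert adds a task-specific residual. This is an illustrative NumPy sketch under assumed names and dimensions (`dynamics_step`, `task_to_expert`, `LATENT`), not the paper's implementation; the paper's gradient-based task clustering is approximated here by a fixed task-to-expert lookup.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, N_EXPERTS = 8, 3  # illustrative sizes, not from the paper

# Shared-backbone weights (stand-in for the shared Transformer) and
# one residual weight matrix per expert.
W_shared = rng.standard_normal((LATENT, LATENT)) * 0.1
W_expert = rng.standard_normal((N_EXPERTS, LATENT, LATENT)) * 0.1

# The paper learns task groupings via gradient-based clustering;
# here we hard-code a hypothetical task -> expert assignment.
task_to_expert = {"pong": 0, "breakout": 0, "seaquest": 1}

def dynamics_step(z, task):
    """One latent transition: shared backbone plus task-expert residual."""
    e = task_to_expert[task]
    h = np.tanh(W_shared @ z)       # shared computation across all tasks
    return h + W_expert[e] @ h      # task-conditioned expert residual

z = rng.standard_normal(LATENT)
z_next = dynamics_step(z, "pong")
print(z_next.shape)
```

Tasks routed to the same expert (e.g. `"pong"` and `"breakout"` above) share both the backbone and the expert parameters, which is the sense in which parameters are allocated per cluster rather than per task.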
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 8415