LUMOS: Language-Conditioned Imitation Learning with World Models

Published: 18 Sept 2025, Last Modified: 18 Sept 2025 · LSRW Poster · CC BY 4.0
Keywords: World Models, Language-Conditioned Imitation Learning, Reinforcement Learning
TL;DR: LUMOS enables zero-shot language-conditioned robot control by learning imitation policies entirely offline in the latent space of a world model trained on real-world play data.
Abstract: We introduce LUMOS, a language-conditioned imitation learning framework that acquires multi-task, long-horizon skills by training in the latent space of a learned world model and transfers them zero-shot to real robots. By learning on-policy in latent space, LUMOS mitigates distribution shift common in offline imitation learning. Coherent long-horizon behavior is achieved through latent planning, multimodal hindsight relabeling, and intrinsic rewards defined over multi-step rollouts. On the CALVIN benchmark, LUMOS outperforms prior methods on chained multi-task evaluations and is, to our knowledge, the first to achieve real-world, language-conditioned visuomotor control using an offline world model. Full paper available at: https://arxiv.org/abs/2503.10370
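The abstract describes training a language-conditioned imitation policy on-policy inside the latent space of a learned world model. The sketch below is only an illustrative approximation of that idea under assumed components, not the LUMOS implementation: the module names, dimensions, and the simple L2 trajectory-matching loss standing in for the intrinsic reward are all hypothetical; the actual architecture, relabeling scheme, and reward are defined in the full paper.

```python
import torch
import torch.nn as nn

LATENT_DIM, ACTION_DIM, LANG_DIM, HORIZON = 32, 7, 16, 10

class LatentDynamics(nn.Module):
    """Stand-in for a learned world model's latent transition function."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + ACTION_DIM, 128), nn.ELU(),
            nn.Linear(128, LATENT_DIM))

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

class LanguageConditionedPolicy(nn.Module):
    """Maps a latent state plus a language-goal embedding to an action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + LANG_DIM, 128), nn.ELU(),
            nn.Linear(128, ACTION_DIM), nn.Tanh())

    def forward(self, z, goal):
        return self.net(torch.cat([z, goal], dim=-1))

def latent_imitation_loss(policy, dynamics, z0, goal, expert_latents, expert_actions):
    """Roll the policy forward inside the world model (learning "on-policy in
    latent space") and score the imagined trajectory against an expert latent
    trajectory; the L2 matching term here is only a surrogate for the paper's
    multi-step intrinsic reward."""
    z, loss = z0, 0.0
    for t in range(HORIZON):
        a = policy(z, goal)
        loss = loss + ((z - expert_latents[t]) ** 2).mean() \
                    + ((a - expert_actions[t]) ** 2).mean()
        z = dynamics(z, a)  # imagined next latent state from the world model
    return loss / HORIZON

# Toy usage with random tensors standing in for encoded play data and a
# hindsight-relabeled language-goal embedding (all hypothetical shapes).
policy, dynamics = LanguageConditionedPolicy(), LatentDynamics()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
z0 = torch.randn(8, LATENT_DIM)
goal = torch.randn(8, LANG_DIM)
expert_z = torch.randn(HORIZON, 8, LATENT_DIM)
expert_a = torch.randn(HORIZON, 8, ACTION_DIM)
loss = latent_imitation_loss(policy, dynamics, z0, goal, expert_z, expert_a)
opt.zero_grad(); loss.backward(); opt.step()
```

Because the rollout never queries the real environment, this style of training stays entirely offline while still correcting the compounding errors that pure behavior cloning would accumulate.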
Submission Number: 2