LUMOS: Language-Conditioned Imitation Learning with World Models

Published: 18 Sept 2025, Last Modified: 18 Sept 2025 · LSRW Poster · CC BY 4.0
Keywords: World Models, Language-Conditioned Imitation Learning, Reinforcement Learning
TL;DR: LUMOS enables zero-shot language-conditioned robot control by learning imitation policies entirely offline in the latent space of a world model trained on real-world play data.
Abstract: We introduce LUMOS, a language-conditioned imitation learning framework that acquires multi-task, long-horizon skills by training in the latent space of a learned world model and transfers them zero-shot to real robots. By learning on-policy in latent space, LUMOS mitigates distribution shift common in offline imitation learning. Coherent long-horizon behavior is achieved through latent planning, multimodal hindsight relabeling, and intrinsic rewards defined over multi-step rollouts. On the CALVIN benchmark, LUMOS outperforms prior methods on chained multi-task evaluations and is, to our knowledge, the first to achieve real-world, language-conditioned visuomotor control using an offline world model. Full paper available at: https://arxiv.org/abs/2503.10370
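The abstract describes training a language-conditioned imitation policy on-policy inside the latent space of a learned world model. The sketch below is only an illustrative approximation of that idea under assumed components, not the LUMOS implementation: the module names, dimensions, and the simple L2 trajectory-matching loss standing in for the intrinsic reward are all hypothetical; the actual architecture, relabeling scheme, and reward are defined in the full paper.

```python
import torch
import torch.nn as nn

LATENT_DIM, ACTION_DIM, LANG_DIM, HORIZON = 32, 7, 16, 10

class LatentDynamics(nn.Module):
    """Stand-in for a learned world model's latent transition function."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + ACTION_DIM, 128), nn.ELU(),
            nn.Linear(128, LATENT_DIM))

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

class LanguageConditionedPolicy(nn.Module):
    """Maps a latent state plus a language-goal embedding to an action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + LANG_DIM, 128), nn.ELU(),
            nn.Linear(128, ACTION_DIM), nn.Tanh())

    def forward(self, z, goal):
        return self.net(torch.cat([z, goal], dim=-1))

def latent_imitation_loss(policy, dynamics, z0, goal, expert_latents, expert_actions):
    """Roll the policy forward inside the world model (learning "on-policy in
    latent space") and score the imagined trajectory against an expert latent
    trajectory; the L2 matching term here is only a surrogate for the paper's
    multi-step intrinsic reward."""
    z, loss = z0, 0.0
    for t in range(HORIZON):
        a = policy(z, goal)
        loss = loss + ((z - expert_latents[t]) ** 2).mean() \
                    + ((a - expert_actions[t]) ** 2).mean()
        z = dynamics(z, a)  # imagined next latent state from the world model
    return loss / HORIZON

# Toy usage with random tensors standing in for encoded play data and a
# hindsight-relabeled language-goal embedding (all hypothetical shapes).
policy, dynamics = LanguageConditionedPolicy(), LatentDynamics()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
z0 = torch.randn(8, LATENT_DIM)
goal = torch.randn(8, LANG_DIM)
expert_z = torch.randn(HORIZON, 8, LATENT_DIM)
expert_a = torch.randn(HORIZON, 8, ACTION_DIM)
loss = latent_imitation_loss(policy, dynamics, z0, goal, expert_z, expert_a)
opt.zero_grad(); loss.backward(); opt.step()
```

Because the rollout never queries the real environment, this style of training stays entirely offline while still correcting the compounding errors that pure behavior cloning would accumulate.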
Submission Number: 2