ReWiND: Learning New Tasks from Language Without New Demonstrations

Published: 22 Jun 2025, Last Modified: 27 Jul 2025 · IBRL @ RLC 2025 · CC BY 4.0
Keywords: Reinforcement Learning, Offline Reinforcement Learning, Reward Learning, Reward Modeling, Language
TL;DR: ReWiND enables real-world robots to efficiently learn new manipulation tasks from language instructions alone by training a generalizable reward model on a small initial demonstration dataset.
Abstract: We introduce ReWiND, a framework for learning robot manipulation tasks solely from language instructions without per-task demonstrations. Standard reinforcement learning (RL) and imitation learning methods require expert supervision through human-designed reward functions or demonstrations for every new task. In contrast, ReWiND starts from a small demonstration dataset to learn: (1) a data-efficient, language-conditioned reward function that labels the dataset with rewards, and (2) a language-conditioned policy pre-trained with offline RL using these rewards. Given an unseen task variation, ReWiND fine-tunes the pre-trained policy using the learned reward function, requiring minimal online interaction. We show that ReWiND’s reward model generalizes effectively to unseen tasks, outperforming baselines by up to 2.4× in reward generalization and policy alignment metrics. Finally, we demonstrate that ReWiND enables sample-efficient adaptation to new tasks, beating baselines by 2× in simulation and improving real-world pretrained bimanual policies by 5×, taking a step towards scalable, real-world robot learning. See website at https://rewind-reward.github.io/.
Submission Number: 18
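
Below is a minimal sketch of the three-stage pipeline described in the abstract: (1) fit a language-conditioned reward model on a small demonstration dataset, (2) label that dataset with learned rewards and pre-train a language-conditioned policy offline, and (3) fine-tune the policy on an unseen task variation using the learned reward in place of a hand-designed one. All module names, dimensions, placeholder tensors, and training objectives here are illustrative assumptions (progress regression, reward-weighted behavior cloning, and reward-weighted self-imitation as stand-ins), not the paper's actual architecture or losses.

```python
# Illustrative sketch of a ReWiND-style pipeline; objectives and data are toy stand-ins.
import torch
import torch.nn as nn

OBS_DIM, LANG_DIM, ACT_DIM = 32, 64, 7


class LangConditionedMLP(nn.Module):
    """Small MLP over a concatenated (observation, language-embedding) input."""

    def __init__(self, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + LANG_DIM, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, obs, lang):
        return self.net(torch.cat([obs, lang], dim=-1))


reward_model = LangConditionedMLP(out_dim=1)   # language-conditioned reward
policy = LangConditionedMLP(out_dim=ACT_DIM)   # language-conditioned policy

# --- Stage 1: fit the reward model on a small demonstration dataset. ---
# Regressing toward normalized task progress is an assumed stand-in objective.
demo_obs = torch.randn(512, OBS_DIM)
demo_lang = torch.randn(512, LANG_DIM)
demo_progress = torch.rand(512, 1)             # placeholder progress labels in [0, 1]
opt_r = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
for _ in range(200):
    loss = nn.functional.mse_loss(reward_model(demo_obs, demo_lang), demo_progress)
    opt_r.zero_grad(); loss.backward(); opt_r.step()

# --- Stage 2: label the demos with learned rewards, pre-train the policy offline. ---
# Reward-weighted behavior cloning is used as a simple stand-in for offline RL.
demo_act = torch.randn(512, ACT_DIM)
with torch.no_grad():
    weights = reward_model(demo_obs, demo_lang).clamp(min=0.0)
opt_p = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):
    bc_err = ((policy(demo_obs, demo_lang) - demo_act) ** 2).mean(dim=-1, keepdim=True)
    loss = (weights * bc_err).mean()
    opt_p.zero_grad(); loss.backward(); opt_p.step()

# --- Stage 3: fine-tune on an unseen task variation with the frozen learned reward. ---
# Online interaction is mocked with random tensors; the update is a simple
# reward-weighted self-imitation step, standing in for the actual online RL algorithm.
new_task_lang = torch.randn(1, LANG_DIM)
for _ in range(50):
    obs = torch.randn(16, OBS_DIM)                                   # mock observations
    lang = new_task_lang.expand(16, -1)
    with torch.no_grad():
        sampled = policy(obs, lang) + 0.1 * torch.randn(16, ACT_DIM)  # exploration noise
        r = reward_model(obs, lang)                                   # learned reward scores rollout
        w = torch.softmax(r.squeeze(-1), dim=0).unsqueeze(-1)
    err = ((policy(obs, lang) - sampled) ** 2).mean(dim=-1, keepdim=True)
    loss = (w * err).mean()
    opt_p.zero_grad(); loss.backward(); opt_p.step()
```

The key point the sketch illustrates is that no new demonstrations are needed for the unseen task: only the language instruction and the learned reward model drive the online fine-tuning stage.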