ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations

Published: 08 Aug 2025 · Last Modified: 16 Sept 2025 · CoRL 2025 Oral · CC BY 4.0
Keywords: Reinforcement Learning, Offline Reinforcement Learning, Reward Learning, Reward Modeling, Language
TL;DR: ReWiND enables real-world robots to efficiently learn new manipulation tasks from language instructions alone by training a generalizable reward model on a small initial demonstration dataset.
Abstract: We introduce ReWiND, a framework for learning robot manipulation tasks solely from language instructions without per-task demonstrations. Standard reinforcement learning (RL) and imitation learning methods require expert supervision through human-designed reward functions or demonstrations for every new task. In contrast, ReWiND starts from a small demonstration dataset to learn: (1) a data-efficient, language-conditioned reward function that labels the dataset with rewards, and (2) a language-conditioned policy pre-trained with offline RL using these rewards. Given an unseen task variation, ReWiND fine-tunes the pre-trained policy using the learned reward function, requiring minimal online interaction. We show that ReWiND’s reward model generalizes effectively to unseen tasks, outperforming baselines by up to 2.4X in reward generalization and policy alignment metrics. Finally, we demonstrate that ReWiND enables sample-efficient adaptation to new tasks in both simulation and on a real bimanual manipulation platform, taking a step towards scalable, real-world robot learning.
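The abstract describes a three-step pipeline: relabel a small demonstration dataset with a learned language-conditioned reward, pre-train a language-conditioned policy with offline RL on those rewards, then fine-tune online on an unseen task variation using only the learned reward. The sketch below is a minimal, hypothetical illustration of that structure, not the authors' implementation; all class and function names (RewardModel, Policy, pretrain_offline, finetune_online, env_step) are placeholders.

```python
# Illustrative sketch of the ReWiND-style pipeline; all components are toy stand-ins.
import numpy as np

class RewardModel:
    """Language-conditioned reward: maps (observation, instruction) -> scalar."""
    def __init__(self, obs_dim=8, seed=0):
        self.w = np.random.default_rng(seed).normal(size=obs_dim)

    def __call__(self, obs, instruction):
        # Placeholder scoring; a real model would embed obs and instruction jointly.
        return float(np.tanh(self.w @ obs))

class Policy:
    """Language-conditioned policy: maps (observation, instruction) -> action."""
    def __init__(self, obs_dim=8, act_dim=2, seed=1):
        self.W = np.random.default_rng(seed).normal(scale=0.1, size=(act_dim, obs_dim))

    def act(self, obs, instruction):
        return self.W @ obs

    def update(self, transitions, rewards, lr=1e-3):
        # Stand-in for an offline-RL / policy-improvement step.
        for (obs, act), r in zip(transitions, rewards):
            self.W += lr * r * np.outer(act, obs)

def label_with_rewards(dataset, reward_model):
    """Step 1: relabel the small demo dataset with learned rewards."""
    return [reward_model(obs, instr) for (obs, act), instr in dataset]

def pretrain_offline(policy, dataset, reward_model, epochs=5):
    """Step 2: offline-RL pretraining on the reward-labeled dataset."""
    transitions = [pair for pair, _ in dataset]
    rewards = label_with_rewards(dataset, reward_model)
    for _ in range(epochs):
        policy.update(transitions, rewards)

def finetune_online(policy, reward_model, env_step, instruction, obs_dim=8, steps=100):
    """Step 3: adapt to an unseen task variation using only the learned reward."""
    obs = np.zeros(obs_dim)
    for _ in range(steps):
        act = policy.act(obs, instruction)
        next_obs = env_step(obs, act)             # online environment interaction
        r = reward_model(next_obs, instruction)   # no hand-designed reward or new demos
        policy.update([(obs, act)], [r])
        obs = next_obs

# Toy usage with random demonstrations and a dummy environment step:
demos = [((np.random.randn(8), np.random.randn(2)), "open the drawer") for _ in range(16)]
rm, pol = RewardModel(), Policy()
pretrain_offline(pol, demos, rm)
finetune_online(pol, rm, lambda o, a: o + 0.01 * np.pad(a, (0, 6)), "open the drawer halfway")
```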
Supplementary Material: zip
Submission Number: 132