Rewards remain an opaque way to specify tasks for Reinforcement Learning, as humans are often unable to predict the optimal behavior corresponding to a given reward function, leading to poor reward design and reward hacking. Language presents an appealing way to communicate intent to agents, but prior efforts to bypass reward design through language have been limited by costly and unscalable labeling efforts. In this work, we propose a completely unsupervised method that grounds language instructions into policies in a zero-shot manner. Our solution takes the form of imagine, project, and imitate: the agent imagines the observation sequence corresponding to the language description of a task, projects the imagined sequence onto our target domain, and grounds the projected sequence into a policy. We show that zero-shot language-to-behavior policies can be obtained by projecting imagined sequences, generated with video models, onto real observations of an unsupervised RL agent and then using zero-shot imitation to mimic the projected observations. Our method, RLZero, is, to our knowledge, the first to demonstrate zero-shot language-to-behavior generation on a variety of tasks without any supervision. We further show that RLZero can also generate policies zero-shot from cross-embodied videos, such as those scraped from YouTube.
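The pipeline below is a minimal, hedged sketch of the imagine-project-imitate structure described above; every helper name (imagine_video, project_to_agent_obs, zero_shot_imitate) is a hypothetical placeholder standing in for the corresponding component, not the paper's actual API.

```python
# Hypothetical sketch of the imagine -> project -> imitate pipeline.
# All helpers are assumed placeholders, not RLZero's actual implementation.

from typing import Callable, List, Sequence

Observation = Sequence[float]  # e.g., a flattened state or image embedding
Policy = Callable[[Observation], Sequence[float]]


def imagine_video(task_description: str) -> List[Observation]:
    """Imagine: a pretrained text-to-video model generates a rollout of the task."""
    raise NotImplementedError("plug in a text-to-video generator")


def project_to_agent_obs(frames: List[Observation]) -> List[Observation]:
    """Project: map each imagined frame to a real observation from the
    unsupervised RL agent's own experience in the target domain."""
    raise NotImplementedError("e.g., match frames to agent observations in a shared space")


def zero_shot_imitate(target_obs: List[Observation]) -> Policy:
    """Imitate: recover a policy that mimics the projected observation
    sequence, without any additional reward or label supervision."""
    raise NotImplementedError("zero-shot imitation from observations only")


def language_to_policy(task_description: str) -> Policy:
    """End-to-end: language description -> executable policy, no reward design."""
    imagined = imagine_video(task_description)   # imagine
    grounded = project_to_agent_obs(imagined)    # project
    return zero_shot_imitate(grounded)           # imitate
```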