Keywords: inverse reinforcement learning, contrastive learning, goal-conditioned reinforcement learning
TL;DR: This work presents a framework for achieving consistent zero-shot imitation capabilities by combining goal-conditioned contrastive reinforcement learning with inverse reinforcement learning.
Abstract: Generative models today conduct most of their training in a self-supervised fashion. How can agentic models do the same, interactively exploring, learning, and preparing to quickly adapt to new tasks? Embodied agents deployed in real-world interactions ought to be trained with interaction, yet today's most successful AI models (e.g., VLMs, LLMs) are trained without an explicit notion of action. Reward-free exploration is well studied in the unsupervised reinforcement learning (URL) literature, but existing methods fail to prepare agents for rapid adaptation to new demonstrations. Today's language and vision models are trained on data provided by humans, which provides a strong inductive bias for the sorts of tasks that the model will have to solve. However, when prompted to imitate a new task, some methods perform distribution matching against the demonstration data without properly accounting for the difficulty of various tasks. The key contribution of our paper is a method for pre-training interactive agents in a self-supervised fashion, so that they can instantly mimic expert demonstrations. Our method treats goals (i.e., observations) as the atomic construct. During training, our method automatically proposes goals and practices reaching them, building on prior work in reinforcement learning exploration. During evaluation, our method solves an (amortized) inverse reinforcement learning problem to explain demonstrations as optimal goal-reaching behavior. Experiments on standard benchmarks (not designed for goal-reaching) show that our approach outperforms prior methods for zero-shot imitation.
Primary Area: reinforcement learning
Submission Number: 21072