In-Context Imitation Learning via Next-Token Prediction
Abstract: We explore how to enable in-context learning
capabilities of next-token prediction models for robotics, allowing
the model to perform novel tasks by prompting it with human
teleop demonstration examples without fine-tuning. We propose
In-Context Robot Transformer (ICRT), a causal transformer that
performs autoregressive prediction on sensorimotor trajectories,
which include images, proprioceptive states, and actions. This
approach allows flexible and training-free execution of new
tasks at test time, achieved by prompting the model with
demonstration trajectories of the new task. Experiments with a
Franka Emika robot demonstrate that ICRT can adapt
to new tasks specified by prompts, even in environment
configurations that differ from both the prompts and the training
data. In a multi-task environment setup, ICRT significantly
outperforms current state-of-the-art robot foundation models
on generalization to unseen tasks. Code, checkpoints, and data
are available at https://icrt.dev.
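The prompting scheme the abstract describes can be sketched as follows: demonstration trajectories are flattened into one interleaved sequence of image, proprioceptive-state, and action tokens, the current observation is appended, and a causal transformer predicts the next action autoregressively. This is a minimal illustration, not the ICRT implementation; all names, token layouts, and dimensions (e.g. the 7-dim action) are assumptions, and the transformer itself is replaced by a stub.

```python
# Hypothetical sketch of in-context prompting with sensorimotor trajectories.
# Names, shapes, and the token ordering are illustrative assumptions, not
# the actual ICRT interface.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Step:
    image: list    # placeholder image embedding
    proprio: list  # proprioceptive state (e.g. joint positions)
    action: list   # robot action taken at this step

def build_prompt(demos: List[List[Step]],
                 obs_image: list,
                 obs_proprio: list) -> List[Tuple[str, list]]:
    """Flatten demo trajectories into one interleaved token sequence,
    then append the current (test-time) observation."""
    tokens: List[Tuple[str, list]] = []
    for traj in demos:
        for s in traj:
            tokens += [("img", s.image), ("prop", s.proprio), ("act", s.action)]
    # Test-time observation: image + proprio, action to be predicted next.
    tokens += [("img", obs_image), ("prop", obs_proprio)]
    return tokens

def predict_next_action(tokens: List[Tuple[str, list]]) -> list:
    """Stand-in for the causal transformer: a real model would attend
    over the full prompt and decode the next action token."""
    return [0.0] * 7  # assumed 7-dim action for illustration

# One demo trajectory with two steps, then prompt with a new observation.
demo = [[Step([0.1], [0.2], [0.3] * 7), Step([0.4], [0.5], [0.6] * 7)]]
tokens = build_prompt(demo, obs_image=[0.7], obs_proprio=[0.8])
action = predict_next_action(tokens)
```

At test time, the predicted action would be executed, the new observation appended to the sequence, and prediction repeated, which is what makes execution of a novel task possible without any gradient updates.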