In-Context Imitation Learning via Next-Token Prediction
Abstract: We explore how to enable in-context learning
capabilities of next-token prediction models for robotics, allowing
the model to perform novel tasks by prompting it with human
teleop demonstration examples without fine-tuning. We propose
In-Context Robot Transformer (ICRT), a causal transformer that
performs autoregressive prediction on sensorimotor trajectories,
which include images, proprioceptive states, and actions. This
approach allows flexible and training-free execution of new
tasks at test time, achieved by prompting the model with
demonstration trajectories of the new task. Experiments with a
Franka Emika robot demonstrate that ICRT can adapt
to new tasks specified by prompts, even in environment
configurations that differ from both the prompts and the training
data. In a multi-task environment setup, ICRT significantly
outperforms current state-of-the-art robot foundation models
on generalization to unseen tasks. Code, checkpoints, and data
are available at https://icrt.dev.
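The prompting scheme the abstract describes can be sketched as follows: demonstration trajectories are flattened into one interleaved sequence of image, proprioceptive-state, and action tokens, the current observation is appended, and a causal transformer predicts the next action autoregressively. This is a minimal illustration, not the ICRT implementation; all names, token layouts, and dimensions (e.g. the 7-dim action) are assumptions, and the transformer itself is replaced by a stub.

```python
# Hypothetical sketch of in-context prompting with sensorimotor trajectories.
# Names, shapes, and the token ordering are illustrative assumptions, not
# the actual ICRT interface.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Step:
    image: list    # placeholder image embedding
    proprio: list  # proprioceptive state (e.g. joint positions)
    action: list   # robot action taken at this step

def build_prompt(demos: List[List[Step]],
                 obs_image: list,
                 obs_proprio: list) -> List[Tuple[str, list]]:
    """Flatten demo trajectories into one interleaved token sequence,
    then append the current (test-time) observation."""
    tokens: List[Tuple[str, list]] = []
    for traj in demos:
        for s in traj:
            tokens += [("img", s.image), ("prop", s.proprio), ("act", s.action)]
    # Test-time observation: image + proprio, action to be predicted next.
    tokens += [("img", obs_image), ("prop", obs_proprio)]
    return tokens

def predict_next_action(tokens: List[Tuple[str, list]]) -> list:
    """Stand-in for the causal transformer: a real model would attend
    over the full prompt and decode the next action token."""
    return [0.0] * 7  # assumed 7-dim action for illustration

# One demo trajectory with two steps, then prompt with a new observation.
demo = [[Step([0.1], [0.2], [0.3] * 7), Step([0.4], [0.5], [0.6] * 7)]]
tokens = build_prompt(demo, obs_image=[0.7], obs_proprio=[0.8])
action = predict_next_action(tokens)
```

At test time, the predicted action would be executed, the new observation appended to the sequence, and prediction repeated, which is what makes execution of a novel task possible without any gradient updates.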