Keywords: Imitation Learning, In-context Learning, State-Space Model
TL;DR: This paper introduces RoboSSM, a scalable in-context imitation learning framework that leverages State-Space Models (SSMs).
Abstract: In-context imitation learning (ICIL) enables robots to learn tasks from prompts consisting of just a handful of demonstrations. By eliminating the need for parameter updates at deployment time, this paradigm supports few-shot adaptation to novel tasks.
However, recent ICIL methods rely on Transformers, whose attention cost scales quadratically with prompt length and which tend to underperform on prompts longer than those seen during training.
In this work, we introduce RoboSSM, a scalable recipe for in-context imitation learning based on state-space models (SSMs).
Specifically, RoboSSM replaces Transformers with Longhorn -- a state-of-the-art SSM that provides linear-time inference and strong extrapolation capabilities, making it well-suited for long-context prompts.
We evaluate our approach on the LIBERO benchmark and compare it against strong Transformer-based ICIL baselines.
Experiments show that RoboSSM extrapolates effectively to varying numbers of in-context demonstrations, yields high performance on unseen tasks, and remains robust in long-horizon scenarios.
These results highlight the potential of SSMs as an efficient and scalable backbone for ICIL.
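The linear-time inference claimed for SSMs comes from their recurrent formulation: each token updates a fixed-size hidden state once, so cost grows linearly with prompt length rather than quadratically as in attention. A minimal illustrative sketch of a generic linear state-space recurrence (not the paper's Longhorn model; `A`, `B`, `C`, and `ssm_scan` are hypothetical toy names):

```python
import numpy as np

def ssm_scan(A, B, C, xs):
    """Generic linear SSM recurrence (illustrative sketch only, not Longhorn).

    Runs h_t = A @ h_{t-1} + B @ x_t and emits y_t = C @ h_t.
    One fixed-size state update per token => linear time in sequence length.
    """
    d = A.shape[0]
    h = np.zeros(d)
    ys = []
    for x in xs:
        h = A @ h + B @ x   # state update: constant cost per token
        ys.append(C @ h)    # readout
    return np.stack(ys)

# Toy usage: 2-dim state, scalar inputs, sequence of length 5.
A = np.array([[0.9, 0.0], [0.1, 0.8]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])
xs = np.ones((5, 1))
ys = ssm_scan(A, B, C, xs)
```

Because the state is a fixed-size summary of the prompt so far, doubling the number of in-context demonstrations only doubles the scan cost, which is the scaling property the abstract attributes to the SSM backbone.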
Submission Number: 11