CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion

Published: 13 May 2026, Last Modified: 13 May 2026
Venue: ICRA 2026: From Data to Decisions Poster
License: CC BY 4.0
Keywords: Vision-Language-Action Models, Continual Learning
TL;DR: CLARE is a general, parameter-efficient framework for exemplar-free continual learning with vision-language-action models (VLAs). It learns new tasks without catastrophic forgetting of previous tasks, significantly outperforming even exemplar-based methods.
Abstract: To teach robots complex manipulation tasks, it is now common practice to fine-tune a pre-trained vision-language-action model (VLA) on task-specific data. However, since this recipe updates existing representations, it is unsuitable for long-term operation in the real world, where robots must continually adapt to new tasks and environments while retaining the knowledge they have already acquired. Existing continual learning methods for robotics commonly require storing previous data (exemplars), struggle with long task sequences, or rely on task identifiers for deployment. To address these limitations, we propose CLARE, a general, parameter-efficient framework for exemplar-free continual learning with VLAs. CLARE introduces lightweight modular adapters into selected feedforward layers and autonomously expands the model only where necessary when learning a new task, guided by layer-wise feature similarity. During deployment, an autoencoder-based routing mechanism dynamically activates the most relevant adapters without requiring task labels. Through extensive experiments on the LIBERO benchmark and five real-world tasks, we show that CLARE achieves high performance on new tasks without catastrophic forgetting of earlier tasks, significantly outperforming even exemplar-based methods. Code, data and videos are available at [tum-lsy.github.io/clare](https://tum-lsy.github.io/clare).
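The routing and expansion ideas described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each adapter is paired with a small autoencoder, that the adapter whose autoencoder reconstructs the incoming layer feature with the lowest error is activated, and that a new adapter is added when no existing autoencoder reconstructs the feature well. The names `LinearAutoencoder`, `route`, `should_expand` and the `threshold` parameter are hypothetical, and the linear, bias-free autoencoder is a simplification chosen to keep the example deterministic.

```python
import numpy as np


class LinearAutoencoder:
    """Hypothetical stand-in for a per-adapter autoencoder (linear, no bias)."""

    def __init__(self, basis: np.ndarray):
        # basis: (k, d) matrix with orthonormal rows, standing in for the
        # feature subspace this adapter's task occupied during training.
        self.basis = basis

    def reconstruction_error(self, x: np.ndarray) -> float:
        code = self.basis @ x        # encode: (k,)
        recon = self.basis.T @ code  # decode: (d,)
        return float(np.sum((x - recon) ** 2))


def route(x: np.ndarray, autoencoders: list) -> tuple:
    """Return the index of the best-matching adapter and all errors.

    Assumption: the lowest reconstruction error marks the most relevant
    adapter, so no task label is needed at deployment time.
    """
    errors = [ae.reconstruction_error(x) for ae in autoencoders]
    return int(np.argmin(errors)), errors


def should_expand(errors: list, threshold: float) -> bool:
    """Assumed expansion rule: add a new adapter at this layer only if
    every existing autoencoder reconstructs the feature poorly."""
    return min(errors) > threshold


# Two adapters whose autoencoders cover disjoint 2-D subspaces of a 4-D feature.
eye = np.eye(4)
autoencoders = [LinearAutoencoder(eye[:2]), LinearAutoencoder(eye[2:])]

feature = np.array([1.0, 2.0, 0.0, 0.0])
adapter_index, errors = route(feature, autoencoders)
print(adapter_index, errors, should_expand(errors, threshold=0.5))
```

In this toy setup the feature lies entirely in the first adapter's subspace, so that adapter is selected and no expansion is triggered. A feature that spans both subspaces would be reconstructed poorly by each autoencoder and would exceed the hypothetical threshold.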
Submission Number: 30