Keywords: dynamic time warping, few-shot imitation learning, retrieval, foundation models
TL;DR: Subsequence-DTW for sub-trajectory retrieval to augment few-shot policy learning
Abstract: Robot learning is experiencing a surge in the size, diversity, and complexity of pre-collected datasets, paralleling trends in NLP and computer vision. Many methods treat these datasets as multi-task expert data to train generalist policies. However, while generalist policies improve average performance, they often underperform specialist policies on individual tasks due to negative transfer. In this work, we advocate for training policies during deployment by non-parametrically retrieving relevant data and training models on it at test time, rather than relying on zero-shot pre-trained policies. We show that many robotics tasks share low-level behaviors and that retrieval at the sub-trajectory granularity enables significantly improved data utilization, generalization, and robustness in adapting policies to novel problems. In contrast, existing retrieval methods tend to underutilize the data and miss shared cross-task content. Our proposed method, $\texttt{STRAP}$, uses vision foundation models and dynamic time warping to retrieve sub-sequences from large training corpora. $\texttt{STRAP}$ outperforms prior retrieval algorithms in both simulated and real-world experiments, scaling to larger datasets and learning robust control policies from minimal real-world demonstrations.
Submission Number: 18
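
Illustrative sketch (not the authors' implementation): the abstract describes retrieving sub-sequences with dynamic time warping over features from a vision foundation model. The snippet below shows one common formulation of subsequence DTW, assuming each trajectory is already encoded as an array of per-frame embeddings. The function name `subsequence_dtw`, the Euclidean frame distance, and the toy 1-D "embeddings" are assumptions made for illustration only.

```python
import numpy as np


def subsequence_dtw(query, reference):
    """Find the span of `reference` that best matches `query` under DTW.

    query:     (n, d) array of per-frame embeddings (hypothetical stand-in
               for features from a vision foundation model).
    reference: (m, d) array of per-frame embeddings of a longer trajectory.
    Returns (start, end, cost) with `end` inclusive.
    """
    query = np.asarray(query, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n, m = len(query), len(reference)

    # Pairwise Euclidean distance between every query and reference frame.
    cost = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=-1)

    # Accumulated cost. The first row is not accumulated along the reference,
    # so a match may start at any reference frame at no extra cost.
    acc = np.full((n, m), np.inf)
    acc[0, :] = cost[0, :]
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + cost[i, 0]
        for j in range(1, m):
            acc[i, j] = cost[i, j] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # The match may also end anywhere: take the cheapest cell in the last row.
    end = int(np.argmin(acc[-1]))

    # Backtrack to the first row to recover where the match starts.
    i, j = n - 1, end
    while i > 0:
        if j == 0:
            i -= 1
            continue
        step = int(np.argmin((acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return j, end, float(acc[-1, end])


# Toy usage: the query pattern 1, 2, 3 is embedded inside a longer sequence.
reference = np.array([[0.0], [0.0], [1.0], [2.0], [3.0], [0.0], [0.0]])
query = np.array([[1.0], [2.0], [3.0]])
start, end, match_cost = subsequence_dtw(query, reference)
```

In a retrieval setting, a routine like this would be run against each trajectory in the offline corpus and the lowest-cost spans kept as additional training data; how matches are ranked and how many are retained are design choices not specified here.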