Keywords: Robotic Manipulation; In-context Imitation Learning; Semantic Correspondence
TL;DR: We propose an in-context imitation learning framework that explicitly leverages semantic correspondence for generalizable robotic manipulation.
Abstract: In-context imitation learning enables few-shot task generalization by conditioning policies on demonstrations, but existing methods often fail on unseen objects or novel scenarios. We introduce MatchingPolicy, a correspondence-aware framework that explicitly decouples demonstration–scene matching from policy learning. At its core, MatchingPolicy employs a graph-based diffusion policy that adapts robot actions based on dense correspondences extracted by vision foundation models. This separation alleviates the burden of simultaneous correspondence inference and action adaptation, enabling robust transfer across diverse tasks. Our approach further integrates an online adaptive matching algorithm to dynamically establish reliable correspondences during execution. Empirical results on both RLBench and real-world manipulation tasks show that MatchingPolicy achieves strong few-shot performance, demonstrating consistent generalization across unseen object instances and categories.
Primary Area: applications to robotics, autonomy, planning
Submission Number: 2902