Keywords: Retrieval, Imitation Learning
Abstract: Imitation learning (IL) algorithms typically distill experience into parametric behavior policies to mimic expert demonstrations. With a limited set of demonstrations, previous methods often cannot accurately align the current state with expert demonstrations, especially under partial observability. We introduce a few-shot IL approach, \textbf{ReMoBot}, which directly \textbf{Re}trieves information from demonstrations to solve \textbf{Mo}bile manipulation tasks with ego-centric visual observations. Given the current observation, ReMoBot utilizes vision foundation models to identify a sub-goal, considering visual similarity w.r.t.\ both single observations and trajectories. A motion generation policy subsequently guides the robot toward each selected sub-goal, iterating until the task is completed. We design three mobile manipulation tasks and evaluate ReMoBot on a Boston Dynamics Spot robot. With only 20 demonstrations, ReMoBot outperforms baseline methods, achieving high success rates on the Table Uncover (70\%) and Gap Cover (80\%) tasks, while showing promising performance on the more challenging Curtain Open task (35\%). Moreover, ReMoBot generalizes to varying robot positions, object sizes, and material types. Additional details are available at: https://sites.google.com/view/remobot/home.
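The retrieval step described above can be illustrated with a minimal sketch. This is not the actual ReMoBot pipeline: the function name, the lookahead heuristic, and the use of plain cosine similarity over precomputed embeddings are all assumptions for illustration, and the trajectory-level matching mentioned in the abstract is omitted here.

```python
import numpy as np

def retrieve_subgoal(current_feat, demo_feats, lookahead=1):
    """Hypothetical sub-goal retrieval: find the demonstration frame whose
    embedding is most similar to the current observation, then return the
    index of the frame `lookahead` steps later as the sub-goal.

    Embeddings are assumed to come from a vision foundation model; the
    embedding function itself is not shown here.
    """
    demo = np.asarray(demo_feats, dtype=float)
    cur = np.asarray(current_feat, dtype=float)
    # cosine similarity between the current feature and every demo frame
    sims = demo @ cur / (np.linalg.norm(demo, axis=1) * np.linalg.norm(cur) + 1e-8)
    best = int(np.argmax(sims))
    # the sub-goal is a slightly later frame, clipped to the demo length
    return min(best + lookahead, len(demo) - 1)
```

For example, if the current observation embeds closest to frame 0 of a demonstration, the next frame is returned as the sub-goal for the motion generation policy to track.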
Submission Number: 2