RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation

Published: 05 Sept 2024, Last Modified: 13 Oct 2024, CoRL 2024, CC BY 4.0
Keywords: Hierarchical Retrieval, Affordance Transfer, Zero-Shot Robotic Manipulation, Visual Foundation Models
TL;DR: This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation, dubbed RAM, featuring generalizability across various objects, environments, and embodiments.
Abstract: This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation, dubbed RAM, featuring generalizability across various objects, environments, and embodiments. Unlike existing approaches that learn manipulation from expensive in-domain demonstrations, RAM capitalizes on a retrieval-based affordance transfer paradigm to acquire versatile manipulation capabilities from abundant out-of-domain data. RAM first extracts unified affordances at scale from diverse sources of demonstrations, including robotic data, human-object interaction (HOI) data, and custom data, to construct a comprehensive affordance memory. Then, given a language instruction, RAM hierarchically retrieves the most similar demonstration from the affordance memory and transfers this out-of-domain 2D affordance to in-domain 3D actionable affordance in a zero-shot and embodiment-agnostic manner. Extensive simulation and real-world evaluations demonstrate that RAM consistently outperforms existing methods across diverse everyday tasks. Additionally, RAM shows significant potential for downstream applications such as automatic and efficient data collection, one-shot visual imitation, and LLM/VLM-integrated long-horizon manipulation.
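The abstract outlines a two-stage pipeline: retrieve a demonstration from an affordance memory by language similarity, then lift its 2D affordance into a 3D actionable affordance in the current scene. Below is a minimal, self-contained sketch of that idea; all names (AffordanceEntry, retrieve, lift_to_3d), the mocked text encoder, and the dummy depth/intrinsics are hypothetical and are not taken from the RAM codebase, and the step of mapping the retrieved 2D affordance into the current observation is omitted.

```python
# Hypothetical sketch of the retrieve-and-transfer idea described in the abstract.
# Embeddings and camera data are mocked; this is not the RAM implementation.

from dataclasses import dataclass
import numpy as np


@dataclass
class AffordanceEntry:
    description: str            # language description of the demonstration
    text_embedding: np.ndarray  # embedding used for retrieval (stand-in)
    contact_uv: tuple           # 2D contact point (pixel coordinates) in the demo image
    direction_2d: np.ndarray    # 2D post-contact direction in the demo image


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def retrieve(memory: list, query_embedding: np.ndarray) -> AffordanceEntry:
    """Return the memory entry whose embedding is most similar to the query."""
    return max(memory, key=lambda e: cosine(e.text_embedding, query_embedding))


def lift_to_3d(uv: tuple, depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Back-project a pixel (u, v) with depth into a 3D point in the camera frame."""
    u, v = uv
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.array([x, y, z])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for a real text encoder; returns a random vector regardless of input.
    embed = lambda text: rng.standard_normal(16)

    # Affordance memory built from out-of-domain demonstrations (dummy entries).
    memory = [
        AffordanceEntry("open the drawer", embed("open the drawer"),
                        (120, 80), np.array([0.0, 1.0])),
        AffordanceEntry("pick up the mug", embed("pick up the mug"),
                        (200, 150), np.array([1.0, 0.0])),
    ]

    # Retrieval: match a language instruction against the memory.
    query = embed("open the cabinet drawer")
    best = retrieve(memory, query)

    # In-domain observation: a depth map and camera intrinsics (dummy values).
    depth = np.full((240, 320), 0.6)
    K = np.array([[300.0, 0.0, 160.0],
                  [0.0, 300.0, 120.0],
                  [0.0, 0.0, 1.0]])

    # Transfer: lift the retrieved 2D contact point to a 3D actionable point.
    contact_3d = lift_to_3d(best.contact_uv, depth, K)
    print(best.description, contact_3d)
```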
Supplementary Material: zip
Spotlight Video: mp4
Video: https://yuxuank.com/RAM/assets/video/ram_supp_woID_compressed.mp4
Website: https://yuxuank.com/RAM/
Code: https://github.com/yxKryptonite/RAM_code
Publication Agreement: pdf
Student Paper: yes
Submission Number: 459