RT-Affordance: Reasoning about Robotic Manipulation with Affordances

Published: 29 Oct 2024, Last Modified: 03 Nov 2024 · CoRL 2024 Workshop MRM-D Poster · CC BY 4.0
Keywords: Manipulation, VLMs, Affordances
TL;DR: RT-Affordance leverages affordances as an intermediate interface to effectively bridge robot, affordance, and web data, demonstrating superior generalization abilities.
Abstract: We explore how policy input interfaces can facilitate generalization by providing intermediate guidance on how to perform manipulation tasks. Existing interfaces such as language, goal images, and trajectory sketches have been shown to be helpful, but these representations either do not provide enough context or provide over-specified context that yields less robust policies. We propose conditioning policies on affordances, which capture the pose of the robot at key stages of the task. Affordances offer expressive yet lightweight abstractions, are easy for users to specify, and facilitate efficient learning by transferring knowledge from large internet datasets. Our method, RT-Affordance, is a hierarchical model that first proposes an affordance plan given the task language, and then conditions the policy on this affordance plan to perform manipulation. Our affordance model can flexibly bridge diverse sources of supervision, including large web datasets, robot trajectories, and cheap-to-collect in-domain datasets, allowing us to learn new tasks with minimal effort. We show on a diverse set of novel tasks that RT-Affordance exceeds the performance of existing methods by over 50%, and we empirically demonstrate that affordances are robust to novel settings.
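The two-stage pipeline described in the abstract (affordance model proposes key poses from task language; the policy then conditions on that plan) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual API: every name here (`Affordance`, `AffordancePlan`, `propose_affordance_plan`, `policy_step`) is a hypothetical stand-in, and the canned pose lookup stands in for the learned vision-language model.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Affordance:
    """Robot end-effector pose at a key stage of the task (assumed representation)."""
    stage: str
    pose: tuple  # (x, y, z, roll, pitch, yaw) -- placeholder pose encoding

@dataclass
class AffordancePlan:
    task: str
    affordances: List[Affordance]

def propose_affordance_plan(task_language: str) -> AffordancePlan:
    """Stage 1: map task language to poses at key stages of the task.
    A canned lookup stands in for the learned affordance model."""
    canned = {
        "open the drawer": [
            Affordance("grasp", (0.4, 0.0, 0.2, 0.0, 1.57, 0.0)),
            Affordance("pull", (0.2, 0.0, 0.2, 0.0, 1.57, 0.0)),
        ]
    }
    return AffordancePlan(task_language, canned.get(task_language, []))

def policy_step(plan: AffordancePlan, observation: dict) -> str:
    """Stage 2: the low-level policy conditions on the affordance plan
    (richer than language alone, lighter than a full trajectory sketch).
    Stubbed: returns a symbolic action toward the first key pose."""
    if not plan.affordances:
        return "no-op"
    return f"move_toward:{plan.affordances[0].stage}"

plan = propose_affordance_plan("open the drawer")
action = policy_step(plan, observation={})
print(len(plan.affordances), action)  # 2 move_toward:grasp
```

The design point the sketch captures is the intermediate interface: the policy never sees raw web or in-domain supervision directly; both are bridged through the affordance plan.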
Submission Number: 45