OVAL-Prompt: Open-Vocabulary Affordance Localization for Robot Manipulation through LLM Affordance-Grounding

Published: 05 Apr 2024, Last Modified: 22 Apr 2024 · VLMNM 2024 · CC BY 4.0
Keywords: LLM, VLM, robotic manipulation, affordance
TL;DR: Open-Vocabulary Affordance Localization for Robot Manipulation through LLM Affordance-Grounding
Abstract: For robots to interact with objects effectively, they must understand the form and function of each object they encounter: which actions each object affords, and where those affordances can be acted upon. We present OVAL-Prompt, an approach to open-vocabulary affordance localization for robot manipulation that leverages a Vision Language Model (VLM) for open-vocabulary object part segmentation and a Large Language Model (LLM) to ground an affordance to each part segment. Because neither component requires domain-specific finetuning, OVAL-Prompt generalizes to novel object instances, categories, and affordances. Quantitative experiments show that, without any finetuning, OVAL-Prompt achieves localization accuracy competitive with supervised baseline models, and qualitative experiments show that it enables affordance-based robot manipulation of open-vocabulary object instances and categories.
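The two-stage pipeline the abstract describes (VLM part segmentation, then LLM affordance grounding) can be illustrated with a minimal Python sketch. The wrappers segment_parts and query_llm, the PartSegment structure, and the prompt wording below are hypothetical stand-ins for exposition, not the authors' implementation.

from dataclasses import dataclass

import numpy as np


@dataclass
class PartSegment:
    label: str        # open-vocabulary part name, e.g. "handle"
    mask: np.ndarray  # binary pixel mask over the input image


def segment_parts(image: np.ndarray, object_name: str) -> list[PartSegment]:
    """Hypothetical VLM wrapper: open-vocabulary object part segmentation."""
    raise NotImplementedError("stand-in for the VLM part segmenter")


def query_llm(prompt: str) -> str:
    """Hypothetical LLM wrapper returning a short text completion."""
    raise NotImplementedError("stand-in for the LLM")


def localize_affordance(image: np.ndarray, object_name: str,
                        affordance: str) -> PartSegment:
    """Ground an affordance to one VLM-proposed part segment via the LLM."""
    parts = segment_parts(image, object_name)
    labels = ", ".join(p.label for p in parts)
    prompt = (
        f"A {object_name} has these parts: {labels}. "
        f"Which single part affords the action '{affordance}'? "
        "Answer with exactly one part name from the list."
    )
    answer = query_llm(prompt).strip().lower()
    # The chosen segment's mask localizes the affordance for downstream
    # manipulation, e.g. selecting grasp points on the returned region.
    for part in parts:
        if part.label.lower() == answer:
            return part
    raise ValueError(f"LLM answer {answer!r} matched no segmented part")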
Supplementary Material: zip
Submission Number: 42