AffoGato: Learning Open-Vocabulary Affordance Grounding with Foundation Models

19 Sept 2025 (modified: 16 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: Affordance, Open-Vocabulary, Vision-Language Models, Foundation Models
TL;DR: We introduce AffoGato, an open-vocabulary affordance grounding framework with three stages: automatic generation of Affo-150K, pretraining Gato-3D/2D models on this data, and fine-tuning that demonstrates strong open-vocabulary capabilities.
Abstract: .
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 14674