Bongard-Tool: Tool Concept Induction from Few-Shot Visual Exemplars

Published: 23 Jan 2023, Last Modified: 05 May 2023; PKU CoRe 22Fall Oral; Readers: Everyone
Abstract: There is no one-to-one mapping from objects to tool concepts. A single object can support diverse tool uses, enabling compositional and flexible functionalities. These tool-like functionalities are mostly context-dependent and vary across scenes. To address this distinctive property of tool concepts, we propose the Bongard-Tool challenge and formulate context-dependent tool understanding as a few-shot concept induction problem. Specifically, to build Bongard-Tool, we combine large language models for knowledge building with web crawling and vision-language models for content retrieval and filtering. We also perform extensive experiments on recent few-shot and meta-learning methods to show how difficult it is to understand compositional tool concepts from pure visual perception. By introducing the Bongard-Tool benchmark, we hope to provide a testbed for future studies on building machines that can flexibly understand and use tools.
Supplementary Material: zip