Keywords: abstractions, analogies, concept induction, ARC, ConceptARC, multimodal LLMs, GPT-4V
Abstract: The ability to form and use abstractions in a few-shot manner is a key aspect of human cognition; it is this capacity that enables us to understand and act appropriately in novel situations. In this paper we report on comparisons between humans and GPT-4V on visual tasks designed to systematically assess few-shot abstraction capabilities using core-knowledge concepts related to objectness, object motion, spatial configurations and relationships, and basic numerosity. We test the impact of presenting tasks to GPT-4V using visual, mixed text-visual, and text-only representations. Our findings highlight that GPT-4V, one of today's most advanced multimodal LLMs, still lacks the flexible intelligence possessed by humans to efficiently relate different situations through novel abstractions.
Submission Number: 35
Loading