Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks

Published: 14 Dec 2023, Last Modified: 28 Jan 2024
Venue: LLM-CP @ AAAI 2024 (Oral)
Keywords: Abstract reasoning; GPT-4; ConceptARC
TL;DR: We compare the abstract reasoning abilities of humans and of text-only and multimodal versions of GPT-4, using the ConceptARC benchmark, and find that neither version of GPT-4 has developed robust abstraction abilities at humanlike levels.
Abstract: We explore the abstract reasoning abilities of text-only and multimodal versions of GPT-4, using the ConceptARC benchmark (Moskvichev, Odouard, and Mitchell, 2023), which is designed to evaluate robust understanding and reasoning with core-knowledge concepts. We extend the work of Moskvichev et al. by evaluating GPT-4 on more detailed, one-shot prompts (rather than simple, zero-shot prompts) with text versions of ConceptARC tasks, and by evaluating GPT-4V, the multimodal version of GPT-4, on zero- and one-shot prompts using image versions of the simplest tasks. Our experimental results support the conclusion that neither version of GPT-4 has developed robust abstraction abilities at humanlike levels.
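As context for the evaluation setup described in the abstract, the following is a minimal sketch of how an ARC-format task (ConceptARC tasks use ARC's JSON grid structure) might be serialized as text and posed to GPT-4 as a zero-shot prompt. The prompt wording, the helper names `grid_to_text` and `make_zero_shot_prompt`, and the toy task are illustrative assumptions, not the authors' actual harness.

```python
# Sketch: serialize an ARC-format task as text and query GPT-4 zero-shot.
# Prompt format and task contents are illustrative, not from the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def grid_to_text(grid):
    """Render an ARC grid (a list of rows of ints 0-9) as rows of digits."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

def make_zero_shot_prompt(task):
    """Build a plain-text prompt from a task's demonstration and test grids."""
    parts = []
    for i, pair in enumerate(task["train"]):
        parts.append(f"Example {i + 1} input:\n{grid_to_text(pair['input'])}")
        parts.append(f"Example {i + 1} output:\n{grid_to_text(pair['output'])}")
    parts.append(f"Test input:\n{grid_to_text(task['test'][0]['input'])}")
    parts.append("Test output:")
    return "\n\n".join(parts)

# A toy task in ARC's JSON structure: each pair maps an input grid to an output grid.
task = {
    "train": [{"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}],
    "test": [{"input": [[1, 1], [0, 0]], "output": [[0, 0], [1, 1]]}],
}

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": make_zero_shot_prompt(task)}],
)
print(response.choices[0].message.content)
```

A one-shot variant, as studied in the paper, would prepend a fully worked example task (input grids, output grids, and the correct answer) before the target task; the image-based GPT-4V experiments would instead send rendered grid images.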
Submission Number: 6