Can Generative Multimodal Models Count to Ten?

Published: 02 Mar 2024, Last Modified: 02 Mar 2024
Venue: ICLR 2024 Workshop Re-Align Poster
License: CC BY 4.0
Track: short paper (up to 5 pages)
Keywords: counting, number, foundation models, cognitive science
TL;DR: We use the Give-N task from developmental psychology to analyze the Parti generative multimodal model's counting ability
Abstract: We adapt a developmental psychology paradigm to characterize the counting ability of the foundation model Parti. We show that Parti at three model scales (350M, 3B, and 20B parameters) has some counting ability in each case, with a significant jump in performance between the 350M and 3B scales. We also demonstrate that it is possible to interfere with these models' counting ability simply by adding unusual descriptive adjectives for the counted objects to the text prompt. We analyze our results in the context of the knower-level theory of child number learning. Our results show that the rich literature of behavioral experiments on humans offers experimental intuition for how to probe model behavior. Perhaps most importantly, adapting human developmental benchmarking paradigms to AI models lets us characterize and understand their behavior relative to our own.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 18
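
The Give-N analysis described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the prompt template in `make_prompt`, the trial format of (requested, produced) count pairs, and the simplified knower-level rule in `knower_level` (the highest N such that every request from 1 to N is answered correctly at least two thirds of the time). Criteria in the developmental literature vary, and the authors' actual procedure may differ.

```python
from collections import defaultdict

NUMBER_WORDS = ["one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


def make_prompt(n, noun, adjective=None):
    """Build a Give-N style text prompt for n objects (1 to 10).

    Hypothetical template: the wording the paper actually used is not
    stated in the abstract.
    """
    number = NUMBER_WORDS[n - 1]
    plural = noun if n == 1 else noun + "s"  # naive pluralization (assumption)
    parts = [number] + ([adjective] if adjective else []) + [plural]
    return " ".join(parts)


def accuracy_by_request(trials):
    """Return the fraction of exactly correct trials per requested number.

    trials: iterable of (requested, produced) integer pairs, where
    produced is the object count observed in the generated image.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for requested, produced in trials:
        total[requested] += 1
        correct[requested] += int(requested == produced)
    return {n: correct[n] / total[n] for n in total}


def knower_level(trials, threshold=2 / 3, max_n=10):
    """Simplified knower level (assumption, not the paper's criterion).

    Returns the largest N such that accuracy meets the threshold for
    every requested number from 1 through N; 0 if even 1 fails.
    """
    acc = accuracy_by_request(trials)
    level = 0
    for n in range(1, max_n + 1):
        if acc.get(n, 0.0) >= threshold:
            level = n
        else:
            break
    return level


if __name__ == "__main__":
    # Made-up trial data purely to exercise the functions.
    example_trials = [(1, 1), (1, 1), (1, 1),
                      (2, 2), (2, 2), (2, 3),
                      (3, 2), (3, 4), (3, 3)]
    print(make_prompt(3, "apple", adjective="fuzzy"))
    print(accuracy_by_request(example_trials))
    print(knower_level(example_trials))
```

Under this sketch, the adjective manipulation would amount to running the same scoring on prompts built with and without the `adjective` argument and comparing the per-number accuracies.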