TL;DR: We find that current multimodal large language models (MLLMs) lack core knowledge, the rudimentary cognitive abilities humans acquire through development in early childhood, and we reveal the dependency map of core cognitive concepts in MLLMs.
Abstract: While Multi-modal Large Language Models (MLLMs) demonstrate impressive abilities in high-level perception and reasoning, their robustness in the wild remains limited, often falling short on tasks that are intuitive and effortless for humans. We examine the hypothesis that these deficiencies stem from the absence of core knowledge—rudimentary cognitive abilities innate to humans from early childhood.
To explore the core knowledge representation in MLLMs, we introduce CoreCognition, a large-scale benchmark encompassing 12 core knowledge concepts grounded in developmental cognitive science.
We evaluate 230 models with 11 different prompts, leading to a total of 2,530 data points for analysis. Our experiments uncover four key findings, collectively demonstrating core knowledge deficits in MLLMs: they consistently underperform and show reduced, or even absent, scalability on low-level abilities relative to high-level ones.
Finally, we propose Concept Hacking, a novel controlled evaluation method that reveals that MLLMs fail to progress toward genuine core knowledge understanding and instead rely on shortcut learning as they scale.
Lay Summary: Despite their impressive performance on perception and reasoning, today’s large AI models still struggle with basic concepts that human infants grasp early in life, such as understanding objects, space, or cause and effect. We wanted to know why.
To explore this gap, we built CoreCognition, a test suite inspired by childhood cognitive science, covering 12 fundamental knowledge areas. We then evaluated hundreds of modern AI systems—230 models in total—across more than 1,500 tasks using various prompts. Our findings were revealing: while AIs improve at higher-level tasks, they consistently lag on basic, intuitive ones, and often don’t improve further when scaled up.
Finally, we introduce a method called Concept Hacking. This approach digs into how AIs learn, showing that instead of genuinely understanding core ideas, they often rely on superficial shortcuts—learning patterns instead of meaning.
Core message: Despite impressive gains, AI still lacks basic common-sense understanding that humans acquire effortlessly in early childhood. Recognizing and testing this gap helps guide future efforts to build models with deeper, more human-level comprehension.
Link To Code: https://github.com/williamium3000/core-knowledge
Primary Area: Applications->Neuroscience, Cognitive Science
Keywords: core knowledge, multi-modal large language model, robustness
Submission Number: 6071