Track: regular paper (up to 6 pages)
Keywords: MLLM, shortcut-taking, Moravec's Paradox, core knowledge, grounding cognition
Abstract: Evaluating the cognitive abilities of Multi-modal Language Models (MLLMs) is challenging due to their reliance on spurious correlations. To distinguish shortcut-taking from genuine reasoning, we introduce Concept Hacking, a paradigm that manipulates concept-relevant information to flip the ground truth while preserving concept-irrelevant confounds. For instance, in a perceptual constancy test, models must recognize that a uniformly wide bridge does not narrow in the distance; in the concept-hacked condition, the bridge was altered to actually taper. We assessed 209 models across 45 experiment pairs spanning nine low-level cognitive abilities, encompassing all five core knowledge domains. Comparing performance on manipulated versus standard conditions revealed that models fell into shortcut-reliant or illusory-understanding types, with none approaching human-level performance. Models of varying sizes appeared in each category, indicating that scaling neither imparts core knowledge nor reduces shortcut reliance. These findings highlight fundamental limitations in current MLLMs, reinforcing concerns about their ability to achieve genuine understanding.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: No, the presenting author of this submission does *not* fall under ICLR's funding aims, or has sufficient alternate funding.
Presenter: ~Dezhi_Luo1
Submission Number: 44