CogEvolve: A Multimodal Benchmark for Evaluating Relational Reasoning in Semantic Extension

ACL ARR 2026 January Submission2550 Authors

03 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: Cognitive Science and AI, Benchmarking, Semantic Change, Metaphor and Metonymy
Abstract: Human cognition excels at extending knowledge through analogy: word meanings evolve along structured pathways from concrete prototypes to abstract senses via metaphor and metonymy. Do Large Language Models (LLMs) internalize this generative logic, or merely mimic statistical patterns? To investigate this, we introduce CogEvolve, a cognitive linguistic benchmark designed to test these evolutionary pathways across textual and visual modalities. Our evaluation reveals a distinct cognitive profile: models function as "Super-Associators," expert at static recognition yet failing at causal reasoning. In text, they exhibit a Frequency-Primacy Conflation, confusing statistical prevalence with cognitive basicness. Crucially, this reasoning collapses further in the visual domain. We term this deficit the Ungrounded Arrow: models possess high-fidelity concept representations (the "dots") but lack the transformational operators (the "arrows") essential for true relational understanding.
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: linguistic theories, cognitive modeling, multimodality, benchmarking, metaphor
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 2550