CogEvolve: A Multimodal Benchmark for Evaluating Relational Reasoning in Semantic Extension

ACL ARR 2026 January Submission2550 Authors

03 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: Cognitive Science and AI, Benchmarking, Semantic Change, Metaphor and Metonymy
Abstract: Human cognition excels at extending knowledge through analogy: word meanings evolve along structured pathways from concrete prototypes to abstract senses via metaphor and metonymy. Do Large Language Models (LLMs) internalize this generative logic, or merely mimic statistical patterns? To investigate this, we introduce CogEvolve, a cognitive linguistic benchmark designed to test these evolutionary pathways across textual and visual modalities. Our evaluation reveals a distinct cognitive profile: models function as "Super-Associators," expert at static recognition yet failing at causal reasoning. In text, they exhibit a Frequency-Primacy Conflation, confusing statistical prevalence with cognitive basicness. Crucially, this reasoning collapses further in the visual domain. We term this deficit the Ungrounded Arrow: models possess high-fidelity concept representations (the "dots") but lack the transformational operators (the "arrows") essential for true relational understanding.
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: linguistic theories, cognitive modeling, multimodality, benchmarking, metaphor
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 2550