More Than Meets the Eye: Measuring the Semiotic Gap in Vision-Language Models via Semantic Anchorage

ACL ARR 2026 January Submission7609 Authors

06 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Readers: Everyone | CC BY 4.0
Keywords: Vision-Language Models, Idiomaticity, Visual De-Noising, Cognitive Semiotics, Semantic Alignment Gap.
Abstract: Vision-Language Models (VLMs) excel at photorealistic generation, yet often struggle to represent abstract meaning such as idiomatic interpretations of noun compounds. To study whether photorealistic detail interferes with symbolic grounding, we introduce DIVA, a controlled benchmark that replaces photorealistic noise with schematic iconicity by generating paired, sense-anchored visualizations for literal and idiomatic readings. We further propose Semantic Alignment Gap ($\Delta$), an architecture-agnostic metric that quantifies divergence between literal and idiomatic visual grounding. To enable cross-paradigm comparison between the gut feeling of latent embeddings and the deliberate thought of generative reasoning, we instantiate $\Delta$ via three access-dependent signals: (i) embedding geometry for discriminative encoders, (ii) \textit{Likelihood of Idiomatic Distinction} (LID) from token probabilities for open generative models, and (iii) behavioral confidence elicitation for proprietary systems. Evaluating 8 recent VLMs, we reveal a consistent Literal Superiority Bias: model scale alone does not resolve literal preference, and increased visual fidelity can coincide with weaker symbolic alignment, indicating cognitive interference from hyper-realistic imagery. Our findings suggest that improving compositional understanding requires de-noising visual input and anchoring interpretation and generation in intended meaning.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation; benchmarking; automatic evaluation of datasets; evaluation methodologies
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 7609