Keywords: Visual saliency, Word order, Grammatical role, Animacy, AI-Image-Editing model, AI and Human captions
TL;DR: We investigate how several types of salience (perceptual and relational) affect linguistic features (entity mention, order of mention, and grammatical role) in human and AI captions.
Abstract: How does our perception of the world influence the way we talk about it? Psycholinguistic studies have investigated whether visual salience correlates with entity mention and ordering, but they have often disregarded its effect on grammar or relied on simplistic images or artificial cues.
In this study, we explore the use of generative AI to better control for salience in visual stimuli while keeping them realistic, and to serve as a proxy for human participants in studying how different types of salience impact image descriptions.
We consider three salience types: *perceptual* (e.g. relative size in the image), *inherent* (e.g. animacy), and *relational* (e.g. human–object interaction). We first analyze human- and AI-generated captions for natural images to examine how salience correlates with how early, and in what grammatical role, an entity is mentioned. We find strong correlations between models and humans in this observational study, justifying the use of AI models alone in a further causal study. For this second study, we create datasets composed of pairs of images, using an image-editing model to intervene on the salience of a target entity.
We show that relational and perceptual salience lead to the entity being mentioned earlier in captions and being mapped to more prominent grammatical roles. The magnitude of this effect varies across entity types, with animate entities (high inherent salience) showing a particularly distinct pattern.
Scope Confirmation: To the best of my judgment, this submission falls within the scope of CoNLL.
Primary Area Selection: Multimodality and Grounding
Secondary Area Selection: Computational Psycholinguistics, Cognition and Linguistics
Use Of Generative Artificial Intelligence Tools: Yes, other (specify below)
Other Use Of Generative Artificial Intelligence Tools: code debugging
Data Collection From Human Subjects: No
Submission Type: Archival: I certify that the submission has not been previously published, nor is the material in it under review by another journal or conference. Further, no material in it will be submitted for review at another conference or journal while under review by CoNLL 2026.
Submission Number: 154