Highlights
• A dataset was built to explore relationships between human vision and language.
• We propose an eye-movement-prompted large image captioning model (EMLIC) in this paper.
• A GNN-based module is designed to extract useful features from eye-movement data.
• The effectiveness and interpretability of EMLIC were verified on two datasets.