GeoSCN: A Novel multimodal self-attention to integrate geometric information on spatial-channel network for fine-grained image captioning

Md. Shamim Hossain, Shamima Aktar, Naijie Gu, Weiyong Liu, Zhangjin Huang

Published: 01 May 2025, Last Modified: 12 Nov 2025, Expert Systems with Applications, CC BY-SA 4.0

Abstract: Highlights
• Enhances understanding of spatial relations and geometric details in image captioning.
• Integrates geometric features with visual representations through proper alignment.
• Employs spatial-channel mechanisms with an advanced low-rank Luong scoring function.
• Achieves competitive results on the MSCOCO benchmark, particularly in multi-object cases.
• Provides features via Mendeley Data “GF-FRCNN MSCOCO” and code for reproducibility.

External IDs: doi:10.1016/j.eswa.2025.126692