GeoSCN: A Novel multimodal self-attention to integrate geometric information on spatial-channel network for fine-grained image captioning
Abstract: Highlights
• Enhances understanding of spatial relations and geometric details in image captioning.
• Integrates geometric features with visual representations through proper alignment.
• Employs spatial-channel mechanisms with an advanced low-rank Luong scoring function.
• Achieves competitive results on the MSCOCO benchmark, particularly in multi-object cases.
• Provides features via Mendeley Data “GF-FRCNN MSCOCO” and code for reproducibility.
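The highlights mention a low-rank Luong scoring function without giving its form. As a rough illustration only, the sketch below shows the generic idea: Luong's "general" multiplicative score q^T W k, with the weight matrix W replaced by a rank-r factorization so that the score becomes (U q) · (V k). This is not the paper's implementation. The function names, the factor matrices U and V, and the toy dimensions are all assumptions made for this example.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def low_rank_luong_scores(query, keys, U, V):
    """Score each key against the query with a factorized bilinear form.

    Luong's general score is q^T W k. Here W is assumed to factor as
    U^T V, where U has shape (r, d_q) and V has shape (r, d_k), so the
    score reduces to a dot product in the r-dimensional projected space.
    """
    q_proj = matvec(U, query)  # project the query down to rank r
    scores = []
    for key in keys:
        k_proj = matvec(V, key)  # project each key down to rank r
        scores.append(sum(a * b for a, b in zip(q_proj, k_proj)))
    return scores


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example (hypothetical values): 2-d query and keys, rank r = 1.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0]]  # shape (1, 2)
V = [[2.0, 0.0]]  # shape (1, 2)

scores = low_rank_luong_scores(query, keys, U, V)
weights = softmax(scores)
```

With rank r much smaller than the feature dimensions, the two factors hold r(d_q + d_k) parameters instead of the d_q × d_k needed for a full W, which is the usual motivation for a low-rank score.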
External IDs: doi:10.1016/j.eswa.2025.126692