Multi-granularity semantic relational mapping for image caption

Nan Gao, Renyuan Yao, Peng Chen, Ronghua Liang, Guodao Sun, Jijun Tang

Published: 2025, Last Modified: 17 Apr 2025. Expert Syst. Appl. 2025. License: CC BY-SA 4.0

Abstract: Highlights
• We introduce a novel framework, Multi-granularity Semantic Relational Mapping (MSRM), which constructs multi-granularity semantic relational interactions between regions and grids. This framework enhances visual semantic relational representations, enabling the generation of captions that are rich in scene details and accurately depict the relationships within the scene.