Multi-granularity semantic relational mapping for image caption

Published: 01 Jan 2025, Last Modified: 17 Apr 2025. Venue: Expert Syst. Appl. 2025. License: CC BY-SA 4.0
Abstract: Highlights • We introduce a novel framework called Multi-granularity Semantic Relational Mapping (MSRM), which constructs multi-granularity semantic relational interactions between regions and grids. This framework enhances visual semantic relational representations, enabling the generation of captions that are not only rich in scene details but also accurately depict relationships within the scene.