Abstract: Highlights • We introduce a novel framework called Multi-granularity Semantic Relational Mapping (MSRM), which constructs multi-granularity semantic relational interactions between regions and grids. The framework enhances visual semantic relational representations, enabling the generation of captions that are rich in scene details and that accurately depict the relationships within the scene.