Abstract: Highlights
• We propose a cross-modal mapping network approach based on text and image features.
• We construct a cross-correlation graph using external knowledge of ANPS as a bridge.
• We design a GCN architecture with a retrieval-based attention mechanism.
• Experiments show that our model has significantly better performance.