Highlights
• We construct a visual-based multi-source cross-modal retrieval network that unifies remote sensing (RS) retrieval tasks across multiple retrieval sources.
• To address semantic heterogeneity among multiple data sources, we propose a shared pattern transfer module based on pattern memorizers, combined with generative adversarial training, to obtain semantic representations unbound from modality (see the sketch after this list).
• To cope with the scarcity of annotated data in RS scenes, we construct a unified unimodal self-supervised pre-training method and align semantics across modalities through the constructed multi-modal RS dataset.
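To make the second highlight concrete, the following is a minimal sketch of how a pattern-memory module and an adversarial modality discriminator could interact; the class names, dimensions, and loss formulation here are illustrative assumptions (plain PyTorch), not the paper's actual implementation.

```python
# A hedged sketch: PatternMemory, ModalityDiscriminator, and all sizes
# are hypothetical stand-ins for the paper's pattern memorizers and
# adversarial semantic alignment, not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatternMemory(nn.Module):
    """Rewrites a modality-specific feature as a mixture of shared,
    learnable memory slots, so the output lives in a common pattern space."""
    def __init__(self, dim=512, slots=64):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(slots, dim))  # shared slots

    def forward(self, x):                                  # x: (batch, dim)
        attn = F.softmax(x @ self.memory.t(), dim=-1)      # address the slots
        return attn @ self.memory                          # re-synthesized feature

class ModalityDiscriminator(nn.Module):
    """Predicts which modality a feature came from; the shared branch is
    trained to fool it, stripping modality-specific cues."""
    def __init__(self, dim=512, n_modalities=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, n_modalities))

    def forward(self, x):
        return self.net(x)

# Adversarial objective: the discriminator minimizes cross-entropy on the
# true modality label, while the shared-pattern branch pushes the
# discriminator's output toward uniform, so features become modality-agnostic.
memory, disc = PatternMemory(), ModalityDiscriminator()
feats = torch.randn(8, 512)                     # features from one encoder
labels = torch.zeros(8, dtype=torch.long)       # modality id 0 (e.g. image)

shared = memory(feats)
d_loss = F.cross_entropy(disc(shared.detach()), labels)    # train discriminator
uniform = torch.full((8, 3), 1.0 / 3)
g_loss = F.kl_div(F.log_softmax(disc(shared), dim=-1), uniform,
                  reduction='batchmean')                   # train branch to fool it
```

In this reading, retrieval across sources then operates on the memory-reconstructed features, since they share one slot vocabulary regardless of the input modality.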