SEORation: Curating SAR-EO Paired Data for Multi-Modal Remote Sensing Foundation Models

Published: 21 May 2026, Last Modified: 01 Jun 2026, MONTI 2026, CC BY 4.0
Keywords: SAR–EO paired data; multi-modal remote sensing; data curation; semantic deduplication; cross-modal retrieval; contrastive learning; foundation models
TL;DR: SEORation curates SAR–EO paired data via semantic deduplication and RSLIP score filtering, producing OpenSEP-1.7M and improving cross-modal retrieval over the raw pool.
Abstract: SAR-EO paired data provide complementary supervision for multi-modal remote sensing foundation models. However, simply aggregating SAR-EO pairs from multiple sources can introduce semantically redundant samples and weakly aligned cross-modal pairs. To address these issues, we propose SEORation, a two-stage pipeline for curating SAR-EO pairs. SEORation first performs remote-sensing-aware semantic deduplication using RemoteCLIP embeddings, and then prioritizes pairs with stronger scene-level compatibility through our proposed RSLIP score filtering. We release OpenSEP (Open SAR-EO Pairs), a 4.9M-pair multi-source SAR-EO data pool, and OpenSEP-1.7M, a curated subset selected from this pool according to retrieval performance on the OpenSEP validation split. Curation and retrieval experiments validate SEORation empirically, showing that the pipeline improves SAR-EO pair selection for multi-modal pretraining. In external evaluation on QXS-SAROPT, the RSLIP model trained on OpenSEP-1.7M raises the aggregate retrieval score from 445.10 to 510.06 relative to training on the raw candidate pool. These results highlight the importance of paired-data curation for reliable cross-modal alignment in SAR-EO multi-modal pretraining.
Submission Number: 10
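
Illustrative sketch: the two-stage shape described in the abstract (embedding-based semantic deduplication followed by score filtering) can be approximated in a few lines of Python. This is not the authors' implementation. The greedy deduplication rule, the cosine-similarity threshold, the keep fraction, and the function names `semantic_dedup`, `filter_by_score`, and `curate` are all assumptions made for illustration. The embeddings stand in for RemoteCLIP features and the scores stand in for precomputed RSLIP scores; neither model is invoked here.

```python
import numpy as np


def semantic_dedup(embeddings, threshold=0.95):
    """Greedy near-duplicate removal (assumed rule, not from the paper).

    An item is kept only if its cosine similarity to every
    previously kept item is below `threshold`.
    """
    emb = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb / np.clip(norms, 1e-12, None)  # unit-normalize rows
    kept = []
    for i in range(len(emb)):
        if not kept or np.max(emb[kept] @ emb[i]) < threshold:
            kept.append(i)
    return kept


def filter_by_score(indices, scores, keep_fraction=0.5):
    """Keep the highest-scoring fraction of `indices`, in original order.

    `keep_fraction` is a hypothetical knob for this sketch.
    """
    indices = list(indices)
    if not indices:
        return []
    n_keep = max(1, int(np.ceil(keep_fraction * len(indices))))
    ranked = sorted(indices, key=lambda i: scores[i], reverse=True)[:n_keep]
    return sorted(ranked)


def curate(embeddings, scores, threshold=0.95, keep_fraction=0.5):
    """Stage 1: deduplicate by embedding. Stage 2: filter by pair score."""
    survivors = semantic_dedup(embeddings, threshold)
    return filter_by_score(survivors, scores, keep_fraction)


if __name__ == "__main__":
    # Toy data: items 0 and 1 are near-duplicates in embedding space.
    toy_embeddings = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0], [1.0, 1.0]]
    toy_scores = [0.2, 0.9, 0.8, 0.5]
    print(curate(toy_embeddings, toy_scores))
```

On the toy data, item 1 is dropped as a near-duplicate of item 0, and the score filter then keeps the better-scoring half of the survivors. At the scale of a multi-million-pair pool, the pairwise loop would need to be replaced by an approximate nearest-neighbor or clustering-based approach.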