Abstract: This paper presents an approach to pretraining models for remote sensing that integrates optical and SAR (Synthetic Aperture Radar) data from the Sentinel-2 and Sentinel-1 satellites. Using a variation on the masked autoencoder (MAE) framework, our model adopts a dual-task setup: reconstructing masked Sentinel-2 images and predicting the corresponding Sentinel-1 images. This multitask design enables the encoder to capture both spectral and structural features across diverse environmental conditions. Additionally, we introduce a "mixing" strategy in the pretraining phase that combines patches from both image sources, which mitigates spatial misalignment errors and enhances model robustness. Evaluation on segmentation and classification benchmarks, including Sen1Floods11, BigEarthNet, and UrbanSRSeg8, demonstrates significant improvements in model performance and generalizability across diverse remote sensing applications.
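The "mixing" strategy described above can be illustrated with a minimal sketch. This is not the authors' implementation; it is a hypothetical illustration in which each spatial position keeps its Sentinel-2 patch or is swapped for the co-located Sentinel-1 patch with some probability (the function name `mix_patches` and the `mix_ratio` parameter are assumptions, and patches are stood in for by arbitrary placeholder tokens):

```python
import random

def mix_patches(s2_patches, s1_patches, mix_ratio=0.25, seed=0):
    """Sketch of a patch-mixing step for multimodal MAE pretraining.

    At each spatial position, keep the Sentinel-2 patch with
    probability (1 - mix_ratio), otherwise substitute the co-located
    Sentinel-1 patch. Returns the mixed patch sequence and a parallel
    list recording which modality each position came from.
    """
    assert len(s2_patches) == len(s1_patches), "patch grids must align"
    rng = random.Random(seed)  # fixed seed for a reproducible example
    mixed, sources = [], []
    for p2, p1 in zip(s2_patches, s1_patches):
        if rng.random() < mix_ratio:
            mixed.append(p1)
            sources.append("S1")
        else:
            mixed.append(p2)
            sources.append("S2")
    return mixed, sources

# Example: a 4x4 grid flattened to 16 patch tokens per modality.
s2 = [f"s2_{i}" for i in range(16)]
s1 = [f"s1_{i}" for i in range(16)]
mixed, sources = mix_patches(s2, s1, mix_ratio=0.25)
```

Because the encoder sees patches drawn from both modalities at the same grid positions, small co-registration errors between the optical and SAR inputs are less damaging than in a setup where each modality is processed in isolation.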
External IDs: dblp:conf/wacv/LinialLBSGSB25