Abstract: In recent years, large pre-trained models, commonly referred to as foundation models, have become increasingly popular for various tasks leveraging transfer learning. This trend has expanded to remote sensing, where transformer-based foundation models such as Prithvi, msGFM, and SatSwinMAE have been utilized for a range of applications. While these transformer-based models, particularly the Prithvi model, exhibit strong generalization capabilities, they have limitations on capturing fine-grained details compared to convolutional neural network architectures like U-Net in segmentation tasks. In this paper, we propose a novel architecture, U-Prithvi, which combines the strengths of the Prithvi transformer with those of U-Net. We introduce a RandomHalfMaskLayer to ensure balanced learning from both models during training. Our approach is evaluated on the Sen1Floods11 dataset for flood inundation mapping, and experimental results demonstrate better performance of U-Prithvi over both individual models, achieving improved performance on out-of-sample data. While this principle is illustrated using the Prithvi model, it is easily adaptable to other foundation models.
External IDs:dblp:conf/giscience/KostejnEAV25
Loading