PhySwin: An Efficient and Physically-Informed Foundation Model for Multispectral Earth Observation

Published: 18 Sept 2025, Last Modified: 29 Oct 2025NeurIPS 2025 posterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Foundation Model, Physics-Informed ML, Efficient Machine Learning, Multispectral Image, Remote Sensing
Abstract: Recent progress on Remote Sensing Foundation Models (RSFMs) aims toward universal representations for Earth observation imagery. However, current efforts often scale up in size significantly without addressing efficiency constraints critical for real-world applications (e.g., onboard processing, rapid disaster response) or treat multispectral (MS) data as generic imagery, overlooking valuable physical priors. We introduce PhySwin, a foundation model for MS data that integrates physical priors with computational efficiency. PhySwin combines three innovations: (i) physics-informed pretraining objectives leveraging radiometric constraints to enhance feature learning; (ii) an efficient MixMAE formulation tailored to SwinV2 for low-FLOP, scalable pretraining; and (iii) token-efficient spectral embedding to retain spectral detail without increasing token counts. Pretrained on over 1M Sentinel-2 tiles, PhySwin achieves SOTA results (+1.32\% mIoU segmentation, +0.80\% F1 change detection) while reducing inference latency by up to 14.4$\times$ and computational complexity by up to 43.6$\times$ compared to ViT-based RSFMs.
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 238
Loading