Representation Mismatch in Remote Sensing Foundation Models

Published: 01 Mar 2026, Last Modified: 05 Apr 2026 | ML4RS @ ICLR 2026 (Main) | CC BY 4.0
Abstract: Geospatial foundation models pretrained on large collections of satellite imagery achieve strong average performance across remote sensing tasks, but they implicitly assume stationarity across space and time. This assumption is routinely violated by seasonal dynamics, long-term environmental change, and abrupt regime shifts such as urbanization or infrastructure development, leading to embeddings that can align with acquisition artifacts rather than physically meaningful change. We study representation mismatch in remote sensing foundation models under non-stationarity and argue that the issue lies not only in model scale, but in how representations are constructed and normalized. We introduce a regime-aware representation framework that treats remote sensing imagery as physical measurement data, using spectral and spatial feature distributions normalized against local baselines and augmented with a temporal divergence signal. Through controlled empirical diagnostics, we show that scale-first embeddings can be sensitive to nuisance radiometric variation and unstable during regime transitions, while physically grounded, locally normalized representations exhibit improved coherence within regimes and clearer signals under change. These results highlight the importance of regime-aware and physically grounded design principles for foundation models applied to Earth system data.
Submission Number: 12
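
The abstract describes representations that are normalized against local baselines and augmented with a temporal divergence signal, but it does not give the exact formulation. The sketch below is one possible reading, not the authors' method: it assumes a per-band z-score against an initial baseline window and a Jensen-Shannon divergence between value histograms of consecutive acquisitions. The function names, the baseline window, the bin count, and the choice of divergence are all illustrative assumptions.

```python
import numpy as np


def local_baseline_normalize(series, baseline_len):
    """Z-score a (T, bands) series against its first `baseline_len` steps.

    Assumption: the "local baseline" is an initial time window at the same
    location; the abstract does not specify how the baseline is chosen.
    """
    series = np.asarray(series, dtype=float)
    baseline = series[:baseline_len]
    mu = baseline.mean(axis=0)
    sigma = baseline.std(axis=0) + 1e-8  # guard against zero variance
    return (series - mu) / sigma


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2, so in [0, 1]) of two histograms."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log2(p / m))
    kl_qm = np.sum(q * np.log2(q / m))
    return 0.5 * kl_pm + 0.5 * kl_qm


def temporal_divergence(frames, bins=16, value_range=(0.0, 1.0)):
    """Divergence between value histograms of consecutive frames.

    `frames` has shape (T, H, W); the result has length T - 1.
    Assumption: a histogram-based JS divergence stands in for the
    "temporal divergence signal" named in the abstract.
    """
    hists = [np.histogram(f, bins=bins, range=value_range)[0] for f in frames]
    return np.array(
        [js_divergence(hists[t - 1], hists[t]) for t in range(1, len(hists))]
    )
```

Under these assumptions, a global gain and offset applied to a series (a simple model of nuisance radiometric variation) largely cancels after baseline normalization, while an abrupt shift in the value distribution between two acquisitions appears as a spike in the divergence sequence.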