Abstract: Foundation models (FMs) pre-trained on large-scale satellite imagery promise transformative advances in remote sensing (RS). However, real-world deployment often exposes failure patterns that standard benchmarks do not reveal. We introduce a failure taxonomy for RS FMs and validate it through experiments on three models (SatMAE, Prithvi, SeCo) across five stress tests in Brazilian biomes. Performance drops by 12.4% to 34.2% under operational conditions, with cloud occlusion causing the most severe degradation (26.4% average drop). Confidence miscalibration also increases significantly, with expected calibration error (ECE) reaching 0.28, which indicates that the models remain over-confident when they fail. Our taxonomy provides a practical diagnostic framework for deployment testing, bridging the gap between research and reliable application.
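For readers unfamiliar with the calibration metric cited above, the following is a minimal sketch of how expected calibration error (ECE) is commonly computed with equal-width confidence bins. It is not taken from the submission: the function name `expected_calibration_error`, the choice of 10 bins, and the input arrays are illustrative assumptions, and the authors' exact binning scheme is not stated in the abstract.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (hypothetical helper, not the authors' code).

    confidences: predicted probability of the chosen class, in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Bin edges 0.0, 0.1, ..., 1.0 for n_bins=10.
    edges = np.linspace(0.0, 1.0, n_bins + 1)

    # Map each sample to a bin index in [0, n_bins - 1], using only the
    # interior edges so that bins are right-closed: (lo, hi].
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # Gap between empirical accuracy and mean confidence in the bin,
            # weighted by the fraction of samples that fall in the bin.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return float(ece)


if __name__ == "__main__":
    # Four predictions made at 90% confidence, three of them correct:
    # accuracy 0.75 against confidence 0.90 gives a gap of about 0.15.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

Under this definition, a model whose confidence matches its accuracy in every bin scores 0, so a value near 0.28 means that stated confidence and observed accuracy differ by roughly 28 percentage points on average.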
Submission Number: 2