Anonymized-Bench: From Performance to Capability, Rethinking Evaluation in Geospatial AI

04 Mar 2026 (modified: 13 May 2026) · Under review for TMLR · CC BY 4.0
Abstract: Geospatial Foundation Models (GeoFMs) are transforming Earth Observation (EO), but evaluation lacks standardized protocols. Anonymized-Bench addresses this with a comprehensive framework spanning classification, segmentation, regression, object detection, and instance segmentation across 19 permissively-licensed datasets. We introduce capability groups to rank models on datasets that share common characteristics (e.g., resolution, spectral bands, temporality), enabling users to identify which models excel in each capability and to determine where future work should focus. To support both fair comparison and methodological innovation, we define a prescriptive yet flexible evaluation protocol. This ensures consistency in benchmarking while facilitating research into model adaptation strategies, a key open challenge in advancing GeoFMs for downstream tasks. Our experiments show that no single model dominates across all tasks, confirming the specificity of choices made during architecture design and pretraining. While models pretrained on natural images (ConvNext-ImageNet, DINOv3) excel on high-resolution tasks, EO-specific models (TerraMind, Prithvi, and Clay) outperform them on multispectral applications such as agriculture and disaster response. These findings demonstrate that optimal model choice depends on task requirements, data modalities, and operational constraints, and that the goal of a single GeoFM that performs well across all tasks remains open for future research. Anonymized-Bench enables informed, reproducible GeoFM evaluation tailored to specific use cases. Code, data, and the leaderboard are publicly released under a permissive license.
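To make the capability-group idea concrete, here is a minimal, hypothetical sketch of how datasets tagged with shared characteristics (e.g., spectral bands) could be grouped and used to rank models per group. The dataset names, scores, and the mean-score aggregation below are invented for illustration; they are not the benchmark's released protocol or API.

```python
# Hypothetical illustration of capability-group ranking; dataset names, tags,
# and scores are invented for this sketch, not the benchmark's real data.
from collections import defaultdict
from statistics import mean

# Each dataset is tagged with characteristics that define capability groups
# (e.g., spectral bands, resolution, temporality).
datasets = {
    "crop-seg":   {"bands": "multispectral", "resolution": "medium"},
    "flood-det":  {"bands": "multispectral", "resolution": "medium"},
    "roof-class": {"bands": "rgb",           "resolution": "high"},
}

# Per-dataset scores for each model (higher is better), e.g., mIoU or accuracy.
scores = {
    "TerraMind": {"crop-seg": 0.72, "flood-det": 0.68, "roof-class": 0.80},
    "DINOv3":    {"crop-seg": 0.62, "flood-det": 0.60, "roof-class": 0.88},
}

def capability_leaderboard(characteristic):
    """Rank models within each capability group induced by `characteristic`."""
    groups = defaultdict(list)
    for dataset, tags in datasets.items():
        groups[tags[characteristic]].append(dataset)
    for value, members in sorted(groups.items()):
        # Aggregate each model's per-dataset scores within the group,
        # then rank models by the aggregate (simple mean here).
        ranking = sorted(
            ((mean(scores[model][d] for d in members), model) for model in scores),
            reverse=True,
        )
        summary = ", ".join(f"{model} ({score:.2f})" for score, model in ranking)
        print(f"{characteristic}={value}: {summary}")

capability_leaderboard("bands")
# bands=multispectral: TerraMind (0.70), DINOv3 (0.61)
# bands=rgb: DINOv3 (0.88), TerraMind (0.80)
```

The simple mean used above is only a stand-in; in the actual benchmark, the prescriptive evaluation protocol governs how per-dataset scores are aggregated within each capability group.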
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Frederic_Sala1
Submission Number: 7763