Re-calibrating Progress: A Physics-Aware Benchmark to Expose the Evaluation Gap in Scientific Machine Learning
Keywords: Hyperspectral Pansharpening
Abstract: Progress in scientific machine learning is critically hindered by a pervasive "evaluation gap", where models that excel on legacy benchmarks fail in real-world deployment because those benchmarks rely on idealized synthetic data and fragile proxy metrics. We argue that the path forward requires a new paradigm of physics-aware benchmarking, which we instantiate with $\text{PRISMABench}$ for the challenging inverse problem of hyperspectral pansharpening. Our ecosystem introduces three core contributions: a $\textbf{physics-enriched dataset}$ that packages real PRISMA hyperspectral (HS) and panchromatic (PAN) pairs, acquired by the satellite's actual physical sensors, into 10 challenge scenes; an extended $\textbf{PAN-centric evaluation protocol}$, including a novel physics-consistency score for robust, no-reference assessment; and $\textbf{insightful visualization tools}$, such as multi-metric radar charts, to move beyond single-score leaderboards and expose performance trade-offs. Using this framework, we reveal a critical disconnect: a model's rank on traditional reduced-resolution benchmarks is a weak predictor of its real-world performance. By open-sourcing our ecosystem, we provide a blueprint for creating benchmarks that challenge the community to move beyond optimizing flawed proxies and towards developing models that are demonstrably robust and physically plausible.
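To make the no-reference idea concrete, below is a minimal sketch of one way a PAN-consistency check could be computed: project the fused HS cube onto the PAN band via spectral response weights and correlate the result with the observed PAN image. This is an illustration only, not the paper's actual physics-consistency score; the function name, the `srf_weights` projection, and the correlation form are all assumptions.

```python
import numpy as np

def pan_consistency_score(fused_hs, pan, srf_weights):
    """Hypothetical no-reference PAN-consistency check.

    fused_hs    : (H, W, B) pansharpened hyperspectral cube
    pan         : (H, W)    observed panchromatic image
    srf_weights : (B,)      assumed relative spectral response of the PAN sensor
    """
    # Synthesize a PAN estimate as an SRF-weighted combination of HS bands.
    w = srf_weights / srf_weights.sum()
    pan_hat = np.tensordot(fused_hs, w, axes=([2], [0]))

    # Pearson correlation between synthesized and observed PAN:
    # values near 1.0 indicate spatial detail consistent with the sensor,
    # lower values flag physically implausible injected structure.
    a = pan_hat - pan_hat.mean()
    b = pan - pan.mean()
    return float((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum()))
```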
Primary Area: datasets and benchmarks
Submission Number: 16149