Re-calibrating Progress: A Physics-Aware Benchmark to Expose the Evaluation Gap in Scientific Machine Learning
Keywords: Hyperspectral Pansharpening
Abstract: Progress in scientific machine learning is critically hindered by a pervasive "evaluation gap", where models that excel on legacy benchmarks fail in real-world deployment because those benchmarks rely on idealized synthetic data and fragile proxy metrics. We argue that the path forward requires a new paradigm of physics-aware benchmarking, which we instantiate with $\text{PRISMABench}$ for the challenging inverse problem of hyperspectral pansharpening. Our ecosystem introduces three core contributions: a $\textbf{physics-enriched dataset}$ that packages real PRISMA hyperspectral (HS) and panchromatic (PAN) pairs, acquired by the satellite's actual physical sensors, into 10 challenge scenes; an extended $\textbf{PAN-centric evaluation protocol}$, including a novel physics-consistency score for robust, no-reference assessment; and $\textbf{insightful visualization tools}$, such as multi-metric radar charts, to move beyond single-score leaderboards and expose performance trade-offs. Using this framework, we reveal a critical disconnect: a model's rank on traditional reduced-resolution benchmarks is a weak predictor of its real-world performance. By open-sourcing our ecosystem, we provide a blueprint for creating benchmarks that challenge the community to move beyond optimizing flawed proxies and towards developing models that are demonstrably robust and physically plausible.
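To make the no-reference idea concrete, below is a minimal sketch of one way a PAN-consistency check could be computed: project the fused HS cube onto the PAN band via spectral response weights and correlate the result with the observed PAN image. This is an illustration only, not the paper's actual physics-consistency score; the function name, the `srf_weights` projection, and the correlation form are all assumptions.

```python
import numpy as np

def pan_consistency_score(fused_hs, pan, srf_weights):
    """Hypothetical no-reference PAN-consistency check.

    fused_hs    : (H, W, B) pansharpened hyperspectral cube
    pan         : (H, W)    observed panchromatic image
    srf_weights : (B,)      assumed relative spectral response of the PAN sensor
    """
    # Synthesize a PAN estimate as an SRF-weighted combination of HS bands.
    w = srf_weights / srf_weights.sum()
    pan_hat = np.tensordot(fused_hs, w, axes=([2], [0]))

    # Pearson correlation between synthesized and observed PAN:
    # values near 1.0 indicate spatial detail consistent with the sensor,
    # lower values flag physically implausible injected structure.
    a = pan_hat - pan_hat.mean()
    b = pan - pan.mean()
    return float((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum()))
```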
Primary Area: datasets and benchmarks
Submission Number: 16149