SAFE: Benchmarking AI Weather Prediction Fairness with Stratified Assessments of Forecasts over Earth
Keywords: fairness, weather, climate, artificial intelligence, machine learning
TL;DR: AI weather prediction models exhibit biases in forecast performance based on geographic region, income, landcover, and lead time.
Abstract: The dominant paradigm in machine learning is to assess model performance by the average loss over all samples in some test set. However, this approach fails to account for the non-uniform patterns of human development and geography across Earth. We introduce Stratified Assessments of Forecasts over Earth (SAFE), a package for elucidating the stratified performance of a set of predictions made over Earth. SAFE integrates data from several domains to stratify gridpoints by four attributes: territory (usually country), global subregion, income, and landcover (land or water). This allows us to examine model performance within each individual stratum of an attribute (e.g., the accuracy in every individual country). To demonstrate its importance, we use SAFE to benchmark a zoo of state-of-the-art AI-based weather prediction models, finding that all of them exhibit disparities in forecasting skill across every attribute. We use these results to seed a benchmark of forecast fairness, stratified at different lead times for various climatic variables. By moving beyond globally averaged metrics, we can ask for the first time: where do models perform best or worst, and which models are most fair? To support further work in this direction, the SAFE package is made available at https://anonymous.4open.science/r/safe-E7C7.
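The core idea of stratified assessment described in the abstract, grouping per-gridpoint errors by an attribute and scoring each stratum separately rather than averaging globally, can be sketched as follows. This is a minimal illustration, not the SAFE package's actual API; the function name and data layout are assumptions.

```python
# Minimal sketch of stratified forecast evaluation (illustrative only,
# not the SAFE API). Each gridpoint has a forecast error and a stratum
# label drawn from some attribute (country, subregion, income, landcover).
from collections import defaultdict

def stratified_rmse(errors, strata):
    """Group per-gridpoint errors by stratum label and return
    the RMSE within each stratum (e.g., each country)."""
    buckets = defaultdict(list)
    for err, label in zip(errors, strata):
        buckets[label].append(err ** 2)
    return {label: (sum(sq) / len(sq)) ** 0.5 for label, sq in buckets.items()}

# Toy example: four gridpoints stratified by landcover (land vs. water).
errors = [1.0, 3.0, 2.0, 2.0]
strata = ["land", "land", "water", "water"]
per_stratum = stratified_rmse(errors, strata)
# land RMSE = sqrt((1 + 9) / 2) = sqrt(5); water RMSE = sqrt((4 + 4) / 2) = 2.0
```

Comparing the per-stratum scores (rather than a single global mean) is what exposes the disparities in forecasting skill that the benchmark measures.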
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 24827