CY-Bench: A comprehensive benchmark dataset for subnational crop yield forecasting

Date: 29 May 2024 (modified: 13 Nov 2024)
Venue: Submitted to NeurIPS 2024 Track Datasets and Benchmarks
License: CC BY 4.0
Keywords: benchmark dataset; crop yield forecasts; agriculture; climate change; food security
TL;DR: CY-Bench is a comprehensive benchmark dataset for data-driven approaches to in-season crop yield forecasting.
Abstract: In-season or pre-harvest crop yield forecasts are essential for enhancing transparency in commodity markets and planning towards achieving the United Nations’ Sustainable Development Goal 2 of zero hunger, especially in the context of climate change and extreme events leading to crop failures. Pre-harvest crop yield forecasting is a difficult problem, as several interacting factors contribute to yield formation, including in-season weather variability, extreme events, long-term climate change, pests, diseases and farmer management decisions. Machine learning methods provide ways to capture complex interactions among such predictors and crop yields. Prior research in agricultural applications, including crop yield forecasting, has primarily been case-study based, which makes it difficult to compare modeling approaches and measure progress. To address this gap, we introduce CY-Bench (Crop Yield Benchmark), a comprehensive dataset and benchmark for forecasting crop yields. We standardized data source selection, preprocessing and spatio-temporal harmonization of public subnational yield statistics with relevant predictors such as weather, soil, and remote sensing indicators, in collaboration with domain experts such as agronomists, climate scientists, and machine learning researchers. With CY-Bench we aim to: (i) standardize machine learning model evaluation in a framework that covers multiple farming systems in more than twenty-five countries across the globe, (ii) facilitate robust and reproducible model comparison through a benchmark addressing real-world operational needs, and (iii) share a dataset with the machine learning community to facilitate research efforts related to time series forecasting, domain adaptation and online learning.
The dataset and code will be openly available, supporting the further development of advanced machine learning models for crop yield forecasting that can aid decision-makers in improving global and regional food security.
Supplementary Material: pdf
Submission Number: 2066