BloomBench: A Multi-Species Benchmark for Evaluating the Generalization of Fruit Tree Phenology Models

Published: 09 Dec 2025, Last Modified: 25 Jan 2026, AgriAI 2026 Oral, CC BY 4.0
Keywords: benchmark, dataset, phenology, agro-ecology
TL;DR: We introduce a benchmark for evaluating machine learning models that predict tree phenology based on meteorological drivers.
Abstract: The timing of phenological events in trees is important for understanding a wide range of secondary effects, such as the susceptibility of orchard yields to environmental stressors and the phenological timing of adjacent ecosystems. Tree phenology is strongly driven by temperature, and (agro-)ecologists typically use biophysical thermal-time models to relate changes in temperature to the timing of observed events. These mechanistic models, however, show large discrepancies, since the underlying dynamics are difficult to capture in simple equations. Together with the improved quality and quantity of plant phenology data, this has popularized machine learning methods for this purpose. Existing works, however, are evaluated on different species and specific regions, making comparisons between them challenging. We provide the first benchmark covering different species, cultivars and climates for evaluating models that predict the timing of crop phenophases. To this end, we have compiled a consistent set of datasets linking climatic drivers with the timing of flowering in fruit trees. With this benchmark we (i) provide consistent model evaluation on datasets with different characteristics (e.g. size, cultivar information, observation trends, climate gradient), thus highlighting model strengths and weaknesses, (ii) provide a real, multi-faceted use case for evaluating machine learning methods that focus on different types of domain shift, and (iii) accelerate ML research in this domain by providing a publicly available, ready-to-use dataset.
Submission Number: 9