Benchmarking Tabular Foundation Models for Agricultural Yield Prediction

Published: 09 Dec 2025, Last Modified: 25 Jan 2026, AgriAI 2026 Poster, CC BY 4.0
Keywords: Tabular Foundation Models, Agricultural Yield Prediction, TabPFN, Ensemble Methods, Machine Learning
TL;DR: We benchmark TabPFNv2 against AutoGluon and PyCaret across three agricultural datasets and show that foundation models excel with missing or limited data, while AutoML dominates large clean datasets.
Abstract: Accurate crop yield prediction is crucial for global food security and agricultural planning. This study benchmarks modern tabular foundation models and automated machine learning frameworks across three diverse agricultural datasets: (1) soybean yields with 86,101 temporal sequences, (2) global multi-crop data with 28,242 samples across 101 countries, and (3) EU-27 regional crops with 8,656 samples and significant missing data. We evaluate TabPFNv2 (an improved implementation of the TabPFN architecture), AutoGluon, and PyCaret to determine which approach works best under different data conditions. Our results show that model performance is highly context-dependent. AutoGluon performs best on large-scale complete data, PyCaret performs well on diverse multi-crop scenarios, and TabPFNv2 demonstrates distinct advantages on datasets with missing values (a gain of about two percentage points in $R^2$ on EU-27). These findings indicate that none of the tested methods is universally superior. Furthermore, foundation models provide robust zero-shot predictions, particularly when handling incomplete data, which is essential for practical agricultural AI deployment.
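The abstract reports TabPFNv2's EU-27 advantage as a gain of about two percentage points in $R^2$. As a hedged illustration of that metric only (not the authors' evaluation code), the sketch below computes $R^2 = 1 - SS_{res}/SS_{tot}$ in plain Python and expresses the difference between two models' scores in percentage points, i.e. the absolute difference in $R^2$ scaled by 100 rather than a relative improvement. The function names `r2_score` and `r2_gain_pp` are hypothetical helpers, and all numbers are toy values, not data or results from the study.

```python
# Minimal sketch: coefficient of determination (R^2) and the difference
# between two models' R^2 scores, expressed in percentage points.
# All arrays below are hypothetical toy values for illustration only;
# they are NOT data or predictions from the benchmark in this abstract.

def r2_score(y_true, y_pred):
    """Return R^2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def r2_gain_pp(y_true, pred_a, pred_b):
    """Absolute R^2 gain of model A over model B, in percentage points."""
    return 100.0 * (r2_score(y_true, pred_a) - r2_score(y_true, pred_b))


# Hypothetical observed yields and predictions from two hypothetical models.
y_true = [2.0, 3.0, 4.0, 5.0]
pred_a = [2.1, 2.9, 4.2, 4.8]
pred_b = [2.3, 2.7, 4.3, 4.6]

print(round(r2_score(y_true, pred_a), 3))
print(round(r2_score(y_true, pred_b), 3))
print(round(r2_gain_pp(y_true, pred_a, pred_b), 2))
```

With these toy values, model A scores $R^2 = 0.98$ and model B scores $0.914$, a gain of 6.6 percentage points. A percentage-point gain is not the same as a relative gain: moving from $R^2 = 0.80$ to $0.82$ is a two-percentage-point gain but a 2.5% relative improvement.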
Submission Number: 29