Full-Season Agent Evaluation in Soybean Farm Operations under Real-World Agricultural Process Dynamics

Published: 23 May 2026, Last Modified: 30 May 2026 | ICML 2026 AIWILD | CC BY 4.0
Keywords: Agents in the Wild, Long-Horizon Evaluation, Agricultural AI, Tool Use, Process Models
Abstract: In this preliminary field study, we evaluate contemporary agent systems on the operational pipeline of an industry-grade soybean research farm at HIT. The environment is not a bespoke benchmark wrapper around an LLM tool interface; it is built from the physics-grounded process models, sensing infrastructure, machinery workflows, and historical operation records used to manage and study the farm. We evaluate nine representative agent-controller methodologies across three levels of farm operations, ranging from atomic tasks and episode chains to full-season scenarios that span the entire harvest cycle. Our results show that human-expert context is the dominant factor in long-horizon performance: without it, full-season agents remain 34% below the human-oracle soybean yield, while expert context reduces this shortfall to 7%. We observe that full-path agentic correctness and crop yield drop under longer horizons, and that agriculture-domain LLM-as-an-Expert guidance can recover yield while still lagging human experts. With these field observations from an operating research farm, we hope to inform AI research in domains where actions interact with physical processes, delayed feedback, and production-level outcomes.
Track: Short Paper (4 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 310