Causal-GeoSim: A Geospatial Causal Robustness Benchmark for Auditing LLMs in Agriculture

Published: 09 Dec 2025, Last Modified: 25 Jan 2026. Venue: AgriAI 2026 Poster. License: CC BY 4.0
Keywords: causal reasoning, geospatial AI, agriculture, large language models, robustness, climate impact
TL;DR: A reproducible benchmark that evaluates large language models on causal direction, spatial coherence, and geo-risk alignment in climate-agriculture reasoning.
Abstract: Causal-GeoSim automatically fuses county-level corn yield (USDA NASS), climate reanalysis (ERA5-Land), and U.S. Census geometries to generate a pair of causal and anti-causal prompts for each county-year. Models must answer in a strict “A/B” format, which makes scoring of causal-direction understanding unambiguous. We introduce three metrics: (1) CAI (Causal Advantage Index) for directional correctness, (2) Geo-CAI for spatial consistency, measured via Moran’s I, and (3) GRS (Geo-Risk Score), which penalizes causal-direction errors in yield-critical regions. Experiments across eight contemporary LLMs (GPT-4o, Claude-3.5-Sonnet, Gemini-2.5-Pro, Llama-3.1-70B, and others) show that frontier and strong open-weight models achieve nearly perfect causal symmetry and spatial coherence, while smaller or less domain-aligned models display localized brittleness and elevated geo-risk. Causal-GeoSim offers a transparent, single-notebook pipeline for region-aware, risk-sensitive evaluation of LLMs in agriculture.
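The abstract does not give the metric formulas, so the following is only a rough sketch of two ingredients it names: strict A/B answer scoring and Moran's I computed over per-county scores. Everything here is an assumption for illustration, not the benchmark's code: the helper names `parse_ab`, `county_correctness`, and `morans_i` are hypothetical, unparseable answers are assumed to count as wrong, and binary contiguity weights are assumed. In particular, this is not the Geo-CAI formula itself, only the standard global Moran's I statistic it reportedly builds on.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed strict format: the whole response must be a single "A" or "B"
# (case-insensitive, surrounding whitespace tolerated). Anything else is invalid.
_STRICT_AB = re.compile(r"\s*([AB])\s*", re.IGNORECASE)


def parse_ab(response: str) -> Optional[str]:
    """Return "A" or "B" for a strictly formatted answer, else None."""
    match = _STRICT_AB.fullmatch(response)
    return match.group(1).upper() if match else None


def county_correctness(
    items: Dict[str, Sequence[Tuple[str, str]]]
) -> Dict[str, float]:
    """Fraction of (response, gold) pairs answered correctly per county.

    Hypothetical helper; unparseable responses count as wrong (an assumption).
    """
    scores: Dict[str, float] = {}
    for county, pairs in items.items():
        if not pairs:
            continue  # skip counties with no prompts
        correct = sum(1 for response, gold in pairs if parse_ab(response) == gold)
        scores[county] = correct / len(pairs)
    return scores


def morans_i(values: Dict[str, float], neighbors: Dict[str, List[str]]) -> float:
    """Global Moran's I with binary contiguity weights (w_ij = 1 if j is a
    listed neighbour of i, else 0):

        I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    """
    ids = list(values)
    n = len(ids)
    mean = sum(values.values()) / n
    dev = {k: values[k] - mean for k in ids}
    denom = sum(d * d for d in dev.values())
    num = 0.0
    w_total = 0.0  # W, the sum of all weights
    for i in ids:
        for j in neighbors.get(i, []):
            if j in dev and j != i:  # ignore unknown ids and self-loops
                num += dev[i] * dev[j]
                w_total += 1.0
    if denom == 0.0 or w_total == 0.0:
        raise ValueError("Moran's I is undefined for constant values or empty weights")
    return (n / w_total) * (num / denom)


if __name__ == "__main__":
    # Toy chain of four hypothetical counties c1-c2-c3-c4 with made-up answers.
    answers = {
        "c1": [("A", "A"), ("B", "B")],
        "c2": [("a", "A"), ("B", "B")],
        "c3": [("B", "A"), ("maybe", "B")],
        "c4": [("B", "A"), ("A", "B")],
    }
    adjacency = {"c1": ["c2"], "c2": ["c1", "c3"], "c3": ["c2", "c4"], "c4": ["c3"]}
    scores = county_correctness(answers)
    print(scores)
    print(round(morans_i(scores, adjacency), 4))  # → 0.3333
```

A positive Moran's I means neighbouring counties tend to share the same correctness level (errors are spatially clustered), while values near the null expectation of -1/(N-1) indicate no spatial pattern. Contiguity weights are used here only because they are the simplest choice; distance-based or row-standardized weights would be equally plausible.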
Submission Number: 11