Keywords: Earth Observation, AI agent, code generation
Abstract: Earth Observation (EO) provides critical planetary data for environmental monitoring, disaster management, climate science, and other scientific domains.
In this work we ask: Are AI systems ready for reliable Earth Observation?
To answer this, we introduce **UnivEARTH**, a coding benchmark of 408 yes/no questions derived from NASA Earth Observatory articles, spanning 7 topics and over 15 satellite instruments and data sources.
Using the Google Earth Engine API as a tool in a zero-shot setup, LLM agents achieve an accuracy of only 40.0%, with generated code failing to run over 44% of the time.
To better understand LLM agent behavior, we also analyze the impact of using the JavaScript API versus the Python API and the effect of providing documentation. Furthermore, we find that a Reflexion-style framework significantly reduces errors: the accuracies of Claude-4.5-Sonnet, Gemini-2.5-Pro, and GPT-5 rise to around 60%. However, these results remain only marginally above random chance.
Taken together, our findings identify significant challenges that must be solved before AI agents can reliably automate Earth Observation, and suggest paths forward.
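The Reflexion-style error-correction loop mentioned above can be sketched as a generate-execute-revise cycle. The sketch below is illustrative, not the benchmark's actual harness: `generate`, `execute`, and the toy stubs are hypothetical names introduced here, assuming only that failed code's traceback is fed back to the model for revision.

```python
import traceback

def run_with_reflexion(generate, execute, max_attempts=3):
    """Reflexion-style loop: run generated code, and on failure feed
    the error trace back to the generator for a revised attempt.
    `generate(feedback)` returns a code string; `execute(code)` returns
    an answer or raises. Names are illustrative, not the paper's API."""
    feedback = None
    for _ in range(max_attempts):
        code = generate(feedback)
        try:
            return execute(code)
        except Exception:
            feedback = traceback.format_exc()  # verbal self-feedback
    return None  # all attempts failed

# Toy stubs standing in for an LLM and a code sandbox:
def toy_generate(feedback):
    # First draft is buggy; after seeing the traceback, "revise" it.
    return "1/0" if feedback is None else "answer = 'yes'"

def toy_execute(code):
    scope = {}
    exec(code, scope)
    if "answer" not in scope:
        raise RuntimeError("no answer produced")
    return scope["answer"]
```

With these stubs, `run_with_reflexion(toy_generate, toy_execute)` recovers from the failing first attempt and returns `'yes'`, mirroring how traceback feedback can turn a runtime error into a usable answer.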
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: Code generation benchmark, Earth Observation, AI4Science
Contribution Types: Data resources
Languages Studied: English
Submission Number: 463