Towards LLM Agents for Earth Observation

Published: 10 Jun 2025, Last Modified: 17 Jul 2025
Venue: TerraBytes 2025 (without proceedings)
License: CC BY 4.0
Keywords: Earth Observation, Benchmark
TL;DR: We introduce UnivEARTH, a benchmark of 140 yes/no questions spanning 13 Earth science topics and data from 17 remote sensing instruments.
Abstract: Earth Observation (EO) provides critical planetary data for environmental monitoring, disaster management, climate science, and other scientific domains. Here we ask: *Are AI agents ready for reliable Earth Observation?* We introduce **UnivEARTH**, a benchmark of 140 yes/no questions drawn from NASA Earth Observatory articles across 13 topics and 17 satellite sensors. Using the Google Earth Engine API as a tool, LLM agents achieve an accuracy of only 33%, because their code fails to run over 58% of the time. Taken together, our findings identify significant challenges that must be solved before AI agents can automate Earth observation, and suggest paths forward.
Submission Number: 2