Keywords: Agentic Planning, Space Planning Problems, Benchmarking
Abstract: Recent advances in agentic Large Language Models (LLMs) have positioned them as generalist planners capable of reasoning and acting across diverse tasks. However, existing agent benchmarks largely focus on symbolic or weakly grounded environments, leaving performance in physics-constrained real-world domains underexplored. We introduce \emph{AstroReason-Bench}, a comprehensive benchmark for evaluating agentic planning on \emph{Space Planning Problems (SPP)}, a family of high-stakes problems with heterogeneous objectives, strict physical constraints, and long-horizon decision-making. AstroReason-Bench integrates multiple scheduling regimes, including ground station communication and agile Earth observation, and provides a unified agent-oriented interaction protocol. Evaluating a range of state-of-the-art open- and closed-source agentic LLM systems, we find that current agents substantially underperform specialized solvers, highlighting key limitations of generalist planning under realistic constraints. AstroReason-Bench offers a challenging and diagnostic testbed for future agentic research.
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: AI/LLM Agents
Contribution Types: Data resources
Languages Studied: English
Submission Number: 10818