SAPO: Safety-Aware Embodied Task Planning with fully Partially-Observable environment and physical constraints

Published: 23 Sept 2025, Last Modified: 22 Nov 2025 | LAW | Readers: Everyone | License: CC BY 4.0
Keywords: Embodied Task Planning, Embodied AI, Safety-Aware Embodied Task Planning, LLM Agent
Abstract: Embodied Task Planning (ETP) with LLMs faces critical safety challenges in real-world settings, where agents must act under partial observability and respect physical constraints. Existing benchmarks often neglect these factors, limiting assessment of both feasibility and safety. We present SAPO, a benchmark for safety-aware ETP that integrates strict partial observability, physical constraints, step-by-step reasoning, and goal-conditioned evaluation. Covering diverse household hazards, SAPO enables rigorous assessment through state- and constraint-based online metrics. Experiments show that current LLMs perform poorly, collapsing on tasks involving implicit safety constraints. Even strong models like o4-mini achieve only 28% success under explicit constraints. These results highlight that LLMs remain insufficient for safe ETP and underscore the need for agentic alignment and commonsense integration to ensure reliable, safety-aware physical interaction.
Submission Type: Benchmark Paper (4-9 Pages)
Submission Number: 30