Keywords: Hallucination detection, LLM-as-a-judge, rubrics
TL;DR: We use agents as orchestrators, planners, and judges to improve embodied agent navigation
Abstract: Symbolic interfaces for embodied agents allow large language models (LLMs) to plan
in terms of goals, subgoals, actions, and transition rules. However, even with complete and
accurate environment context, LLM planners frequently hallucinate entities or effects that
are incompatible with the environment, undermining reliability.
We present an inference-time agent-as-judge interface based on a rubric-guided
planner–judge architecture for the Transition Modeling (TM) module of the Embodied Agent
Interface (EAI) benchmark [8]. A planner agent proposes multiple candidate transition
rules, while a distinct judge agent assigns structured rubric scores that explicitly capture
hallucination-related criteria and environment consistency.
Our orchestrator uses these rubric scores to filter hallucinated candidates and select
a final output, without any additional model training. Instantiated for VIRTUALHOME
Transition Modeling [10] and evaluated on the local EAI VIRTUALHOME validation
split, a 4-sample planner–judge variant improves TM accuracy and reduces obvious
hallucinations relative to a single-sample planner baseline.
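The filter-then-select step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `RubricScore` fields, the 0.5 threshold, the summed ranking score, the `VOCAB` predicate set, and the `toy_judge` stand-in for the LLM judge are all hypothetical names and choices made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Candidate = Sequence[str]  # hypothetical: predicate names used by one candidate rule


@dataclass
class RubricScore:
    # Hypothetical rubric fields, each in [0, 1].
    hallucination_free: float
    env_consistency: float


# Hypothetical environment vocabulary; a real system would read it from the benchmark.
VOCAB = frozenset({"on", "off", "plugged_in", "holds_rh"})


def toy_judge(candidate: Candidate) -> RubricScore:
    """Stand-in for the LLM judge: checks predicates against VOCAB."""
    known = sum(1 for p in candidate if p in VOCAB)
    return RubricScore(
        hallucination_free=1.0 if known == len(candidate) else 0.0,
        env_consistency=known / len(candidate) if candidate else 0.0,
    )


def select_candidate(
    candidates: Sequence[Candidate],
    judge: Callable[[Candidate], RubricScore],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Optional[Candidate]:
    """Drop candidates judged as hallucinated, then return the best remaining one."""
    scored = [(c, judge(c)) for c in candidates]
    kept = [(c, s) for c, s in scored if s.hallucination_free >= threshold]
    if not kept:
        return None  # every sample was filtered out
    best, _ = max(kept, key=lambda cs: cs[1].hallucination_free + cs[1].env_consistency)
    return best


# Four planner samples, mirroring the 4-sample variant; contents are made up.
samples = [
    ("on", "teleported"),      # uses a predicate outside VOCAB
    ("on", "plugged_in"),
    ("levitating",),           # uses a predicate outside VOCAB
    ("off", "holds_rh"),
]
chosen = select_candidate(samples, toy_judge)
```

With these made-up samples, the two candidates that use out-of-vocabulary predicates are filtered out, and the first of the remaining equally scored candidates is returned. No model weights are touched at any point, which matches the inference-time framing of the abstract.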
Submission Number: 15