OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks

Published: 05 Mar 2026, Last Modified: 25 Apr 2026
Venue: ICLR 2026 Workshop LLM Reasoning
License: CC BY 4.0
Track: long paper (up to 10 pages)
Keywords: Embodied Agent, Embodied Reasoning
Abstract: Large language models excel at abstract reasoning, but their capacity for embodied agent reasoning remains underexplored. We present OmniEAR, a comprehensive framework for evaluating LLM reasoning about physical interactions, tool usage, and multi-agent coordination. Unlike existing benchmarks with predefined tools and explicit collaboration directives, OmniEAR requires agents to dynamically acquire capabilities and autonomously determine coordination strategies. Our benchmark models continuous physical properties and complex spatial relationships across 1,500 scenarios spanning household, industrial, and diverse professional domains. Our evaluation reveals severe degradation when reasoning must emerge from physical constraints: performance drops from 85–96% with explicit instructions to below 50% on compound tasks. Surprisingly, providing complete environmental information degrades coordination performance, indicating that models cannot filter for task-relevant constraints. Fine-tuning dramatically improves single-agent performance but fails to transfer to multi-agent scenarios, exposing fundamental architectural limitations. These findings demonstrate that embodied reasoning poses fundamentally different challenges from those current architectures can address, establishing OmniEAR as a rigorous benchmark for advancing embodied AI. Code and data are provided in the supplementary materials and will be publicly released.
Presenter: ~Zixuan_Wang24
Format: No, the presenting author is unable to, or unlikely to be able to, attend in person.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 36