Theory of Space: A Benchmark for Evaluating Spatial Belief Construction through Active Exploration

Pingyue Zhang; Zihan Huang; Yue Wang; Jieyu Zhang; Letian Xue; Zihan Wang; Qineng Wang; Keshigeyan Chandrasegaran; Ruohan Zhang; Yejin Choi; Ranjay Krishna; Jiajun Wu; Li Fei-Fei; Manling Li

Published: 29 Apr 2026, Last Modified: 11 May 2026. Venue: Eval Eval @ ACL 2026 Poster. License: CC BY 4.0

Keywords: Large Language Model, Vision-Language Model, Spatial Reasoning, Spatial Agent, Active Exploration

Abstract: Spatial embodied intelligence under partial observability requires agents to actively acquire missing information rather than passively consume complete observations. While multimodal foundation models excel at passive perception and reasoning, their ability to support self-directed exploration for building and maintaining coherent spatial beliefs remains understudied. We propose Theory of Space, defined as an agent’s ability to construct, revise, and exploit a spatial belief through active exploration under partial observability. We implement Theory of Space as a benchmark in textual and visual environments, where the goal is curiosity-driven exploration to build a complete and accurate spatial belief. A key innovation is spatial belief probing, which prompts agents to externalize their internal spatial belief as a cognitive map at each step, enabling direct measurement of belief quality. Evaluating state-of-the-art models on downstream tasks reveals three bottlenecks: (1) the Active-Passive Gap, where performance drops when agents must autonomously gather information (e.g., GPT-5.2: 57.1 → 46.0); (2) Inefficiency, with redundant and unsystematic exploration; and (3) unstable global beliefs, where spatial knowledge degrades over time. A false-belief paradigm further reveals Belief Inertia, especially severe in vision-based models.
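
To make the idea of spatial belief probing concrete, the following is a minimal sketch of how a probed cognitive map might be scored against ground truth at each exploration step. It is an illustration only, not the benchmark's actual metric: the function name `score_belief`, the dictionary-of-coordinates map format, the tolerance `tol`, and the toy room layout are all assumptions introduced here.

```python
from math import hypot


def score_belief(belief, truth, tol=1.0):
    """Score one probed cognitive map against the ground-truth layout.

    belief, truth: dict mapping object name -> (x, y) position.
    Returns (coverage, accuracy), both as fractions of the true objects:
      coverage = objects that appear in the belief at all,
      accuracy = objects placed within `tol` of their true position.
    NOTE: hypothetical metric for illustration, not the paper's definition.
    """
    if not truth:
        raise ValueError("truth must not be empty")
    found = [name for name in truth if name in belief]
    correct = [
        name
        for name in found
        if hypot(belief[name][0] - truth[name][0],
                 belief[name][1] - truth[name][1]) <= tol
    ]
    return len(found) / len(truth), len(correct) / len(truth)


# Toy ground truth and one probed belief per exploration step (assumed data).
truth = {"chair": (0, 0), "table": (3, 4), "lamp": (5, 1)}
belief_per_step = [
    {"chair": (0, 0)},
    {"chair": (0, 0), "table": (3, 5)},
    {"chair": (0, 0), "table": (3, 5), "lamp": (9, 9)},
]

scores = [score_belief(b, truth) for b in belief_per_step]
for step, (coverage, accuracy) in enumerate(scores, start=1):
    print(f"step {step}: coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

Tracking such scores per step is one way the abstract's "unstable global beliefs" could show up in data: coverage may keep rising while accuracy stalls or drops, as with the misplaced lamp in the last step of this toy example.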

Submission Type: Research Paper

Archival Status: Non-archival

Submission Number: 20
