PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly

Published: 18 Sept 2025, Last Modified: 30 Oct 2025NeurIPS 2025 Datasets and Benchmarks Track posterEveryoneRevisionsBibTeXCC BY-NC-SA 4.0
Keywords: 3D Block Assembly, Physical Understanding, Task Planning, Multimodal Reasoning
TL;DR: We introduce PhyBlock, a progressive benchmark evaluating large vision-language models on physical understanding and spatial planning via robotic 3D block assembly tasks.
Abstract: While vision-language models (VLMs) have demonstrated promising capabilities in reasoning and planning for embodied agents, their ability to comprehend physical phenomena, particularly within structured 3D environments, remains severely limited. To close this gap, we introduce PhyBlock, a progressive benchmark designed to assess VLMs on physical understanding and planning through robotic 3D block assembly tasks. PhyBlock integrates a novel four-level cognitive hierarchy assembly task alongside targeted Visual Question Answering (VQA) samples, collectively aimed at evaluating progressive spatial reasoning and fundamental physical comprehension, including object properties, spatial relationships, and holistic scene understanding. PhyBlock includes 2600 block tasks (400 assembly tasks, 2200 VQA tasks) and evaluates models across three key dimensions: partial completion, failure diagnosis, and planning robustness. We benchmark 23 state-of-the-art VLMs, highlighting their strengths and limitations in physically grounded, multi-step planning. Our empirical findings indicate that the performance of VLMs exhibits pronounced limitations in high-level planning and reasoning capabilities, leading to a notable decline in performance for the growing complexity of the tasks. Error analysis reveals persistent difficulties in spatial orientation and dependency reasoning. We position PhyBlock as a unified testbed to advance embodied reasoning, bridging vision-language understanding and real-world physical problem-solving.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/PhyBlock/PhyBlock_Benchmark
Code URL: https://github.com/PhyBlock/PhyBlock
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 1031
Loading