Keywords: Spatial Reasoning, Layout Reasoning, Scene Understanding, Structured Scene Representations, Benchmark, Large Language Models (LLMs)
TL;DR: FloorplanQA evaluates large language models’ spatial and geometric reasoning on structured indoor layouts, featuring questions on topological logic and design constraints, revealing gaps in models’ ability to reason about spatial environments.
Abstract: We introduce FloorplanQA, a diagnostic benchmark for evaluating spatial reasoning in large language models (LLMs). FloorplanQA is grounded in structured representations of indoor scenes (e.g., kitchens, living rooms, bedrooms, and bathrooms), encoded symbolically in JSON or XML layouts. The benchmark covers core spatial tasks, including distance measurement, visibility, pathfinding, and object placement within constrained spaces. Our results across a variety of frontier open-source and commercial LLMs reveal that while models may succeed on shallow queries, they often fail to respect physical constraints or preserve spatial coherence, though they remain mostly robust to small spatial perturbations. FloorplanQA uncovers a blind spot in today’s LLMs: inconsistent reasoning about indoor layouts. We hope this benchmark inspires new work on language models that can accurately infer and manipulate spatial and geometric properties in practical settings.
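To make the setup concrete, the sketch below shows what a symbolic layout and a distance-style query of the kind described in the abstract might look like. The schema is not taken from the paper; the field names (room, objects, position_m, size_m) and the example question are illustrative assumptions only.

```python
# Minimal sketch, not the benchmark's released schema: a hypothetical symbolic
# kitchen layout plus one distance-style question. All field names and values
# are assumptions for illustration.
import json
import math

layout = {
    "room": {"type": "kitchen", "size_m": [4.0, 3.0]},
    "objects": [
        {"id": "sink",  "position_m": [0.5, 2.5]},
        {"id": "stove", "position_m": [3.5, 2.5]},
    ],
}

question = "What is the straight-line distance between the sink and the stove?"

# A model would receive the serialized layout (JSON here; XML is analogous)
# together with the question; the reference answer for this query is just the
# Euclidean distance between the two object positions.
positions = {obj["id"]: obj["position_m"] for obj in layout["objects"]}
answer = math.dist(positions["sink"], positions["stove"])
print(json.dumps(layout, indent=2))
print(question, "->", round(answer, 2), "m")
```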
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 18124