$\texttt{DecompSR}$: A Dataset for Decomposed Analyses of Compositional Multihop Spatial Reasoning

TMLR Paper6833 Authors

06 Jan 2026 (modified: 17 Jan 2026). Under review for TMLR. License: CC BY 4.0

Abstract: We introduce $\texttt{DecompSR}$ (decomposed spatial reasoning), a large benchmark dataset (over 5M datapoints) and generation framework designed to analyse compositional spatial reasoning ability. The $\texttt{DecompSR}$ generator allows users to independently vary several aspects of compositionality, namely productivity (reasoning depth), substitutivity (entity and linguistic variability), overgeneralisation (input order, distractors) and systematicity (novel linguistic elements). $\texttt{DecompSR}$ is built procedurally so that it is correct by construction, and its correctness is independently verified with a symbolic solver. We comprehensively benchmark a host of Large Language Models (LLMs) on $\texttt{DecompSR}$ and show that they struggle with productive and systematic generalisation in spatial reasoning tasks, whereas they are more robust to linguistic variation. $\texttt{DecompSR}$ thus provides a provably correct and rigorous benchmarking dataset with a novel ability to independently vary the degree of several key aspects of compositionality, allowing robust and fine-grained probing of the compositional reasoning abilities of LLMs.
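To make "correct by construction, independently verified by a symbolic solver" concrete, here is a minimal illustrative sketch. It is not the authors' generator. The grid offsets, the four-relation vocabulary, the entity names `E0, E1, ...` and the functions `generate`, `solve` and `direction` are all hypothetical. The sketch varies reasoning depth (productivity), adds leaf distractor facts and optionally shuffles fact order (overgeneralisation). It derives the gold answer from the positions used during generation, then recomputes the answer with a separate breadth-first solver over the stated facts only.

```python
import random
from collections import deque

# Hypothetical relation vocabulary: fact (a, rel, b) means pos[a] = pos[b] + OFFSETS[rel].
OFFSETS = {
    "left of": (-1, 0),
    "right of": (1, 0),
    "above": (0, 1),
    "below": (0, -1),
}


def direction(dx, dy):
    """Map a net grid offset to a coarse direction label (the answer space)."""
    if dx == 0 and dy == 0:
        return "overlapping"
    horiz = "left" if dx < 0 else "right" if dx > 0 else ""
    vert = "above" if dy > 0 else "below" if dy < 0 else ""
    return "-".join(p for p in (vert, horiz) if p)


def generate(depth, n_distractors=0, shuffle=False, seed=None):
    """Build a chain of `depth` hops; the gold answer comes from the construction itself."""
    rng = random.Random(seed)
    names = [f"E{i}" for i in range(depth + 1 + n_distractors)]
    chain = names[: depth + 1]
    pos = {chain[0]: (0, 0)}
    facts = []
    for prev, nxt in zip(chain, chain[1:]):
        rel = rng.choice(list(OFFSETS))
        dx, dy = OFFSETS[rel]
        pos[nxt] = (pos[prev][0] + dx, pos[prev][1] + dy)
        facts.append((nxt, rel, prev))
    # Distractors: leaf entities hung off chain members, never on the query path.
    for extra in names[depth + 1:]:
        facts.append((extra, rng.choice(list(OFFSETS)), rng.choice(chain)))
    if shuffle:
        rng.shuffle(facts)  # vary input order without changing the fact set
    src, dst = chain[0], chain[-1]
    gold = direction(pos[dst][0] - pos[src][0], pos[dst][1] - pos[src][1])
    return facts, (dst, src), gold  # query: where is dst relative to src?


def solve(facts, query):
    """Independent symbolic check: BFS over the facts, composing offsets along edges."""
    dst, src = query
    adj = {}
    for a, rel, b in facts:
        dx, dy = OFFSETS[rel]
        adj.setdefault(b, []).append((a, dx, dy))    # b -> a adds the offset
        adj.setdefault(a, []).append((b, -dx, -dy))  # a -> b subtracts it
    seen = {src: (0, 0)}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        x, y = seen[node]
        for nbr, dx, dy in adj.get(node, []):
            if nbr not in seen:
                seen[nbr] = (x + dx, y + dy)
                queue.append(nbr)
    if dst not in seen:
        return None
    return direction(*seen[dst])
```

For example, `facts, query, gold = generate(depth=4, n_distractors=3, shuffle=True, seed=0)` yields seven facts, and `solve(facts, query) == gold` should hold for any setting. Keeping the solver separate from the generator means that a bug in either one surfaces as a mismatch rather than silently producing a wrong label.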

Submission Type: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Amit_Sharma3

Submission Number: 6833