An Analysis of Reasoning Length Scaling and Positional Effects in Vision Language Models for Spatial Reasoning
Keywords: Vision language models, Reasoning, Chain of thought, Relative size comparison, Relational filtering, Positional bias, Reasoning length scaling
Abstract: Vision language models often produce step-by-step reasoning traces even in zero-shot settings, but it is unclear how reasoning length scales with spatial problem complexity. We introduce the Largest Circle Puzzle, an easily scalable synthetic benchmark that requires connectivity-based relational filtering and relative size comparison under increasing visual clutter. Varying the number of circles controls problem complexity, and controlling the answer location probes positional effects. Across several state-of-the-art VLMs with explicit reasoning behavior, accuracy declines steadily as scenes become more crowded. On correctly solved instances, reasoning length typically grows with problem size and can follow an approximately linear trend, consistent with scan-like strategies. However, reasoning length alone does not predict robustness: some models show strongly location-dependent performance, with large accuracy gaps between corner placements even when reasoning-length scaling on solved instances remains stable.
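The benchmark construction described above (a scalable scene of circles with a controlled answer location) can be sketched as follows. This is an illustrative guess at the generation procedure, not the paper's actual generator: the function name, radius ranges, and corner coordinates are assumptions, and the connectivity edges used for relational filtering are omitted for brevity.

```python
import random

def make_circle_puzzle(n_circles, answer_corner="top-left", canvas=1.0, seed=None):
    """Generate one hypothetical Largest Circle Puzzle instance.

    Returns a list of (x, y, r) circles in which the uniquely largest
    circle is placed near the requested corner, so positional effects
    can be probed while n_circles controls visual clutter.
    (Sketch only; the paper's exact generator is not described here.)
    """
    rng = random.Random(seed)
    # Distractor circles: modest radii, scattered over the canvas.
    circles = [
        (rng.uniform(0.1, 0.9) * canvas,
         rng.uniform(0.1, 0.9) * canvas,
         rng.uniform(0.02, 0.05) * canvas)
        for _ in range(n_circles - 1)
    ]
    # Answer circle: strictly larger than every distractor, pinned to a corner.
    corners = {
        "top-left": (0.15, 0.85), "top-right": (0.85, 0.85),
        "bottom-left": (0.15, 0.15), "bottom-right": (0.85, 0.15),
    }
    cx, cy = corners[answer_corner]
    circles.append((cx * canvas, cy * canvas, 0.08 * canvas))
    return circles

puzzle = make_circle_puzzle(12, answer_corner="bottom-right", seed=0)
largest = max(puzzle, key=lambda c: c[2])  # ground-truth answer circle
```

Because the answer radius (0.08) exceeds the distractor range (0.02 to 0.05), the ground truth is unambiguous, and sweeping `n_circles` and `answer_corner` reproduces the two experimental axes the abstract describes.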
Submission Number: 84