Scaffolding the Strategist: Architecture-Dependent Reasoning Interventions in Hotelling Spatial Markets

Published: 05 Mar 2026, Last Modified: 25 Apr 2026 | Venue: ICLR 2026 Workshop LLM Reasoning | Readers: Everyone | License: CC BY 4.0
Track: long paper (up to 10 pages)
Keywords: large language models, strategic reasoning, Hotelling spatial competition, chain-of-thought, LLM scaffolding, game theory, reasoning evaluation
TL;DR: We test how four reasoning interventions affect a standard and a reasoning-optimized LLM, finding that the optimal scaffolding strategy depends on the model's existing reasoning capabilities.
Abstract: We investigate whether structured reasoning interventions improve the strategic economic reasoning of large language models, and whether their effects depend on model architecture. Using Hotelling's linear city model as a diagnostic vehicle, we evaluate GPT-4.1-mini (a standard instruction-following model) and GPT-5-mini (a reasoning-optimized model) under five conditions (an unscaffolded baseline and four reasoning interventions) across eight questions spanning deductive and abductive reasoning, three prompt framings, and three repetitions per condition, yielding 720 individually judged responses. We find a statistically significant crossover interaction between scaffolding type and model architecture ($t(7) = 4.79$, $p = 0.002$, $d = 1.69$): commitment scaffolding improves the standard model ($+0.21$) while degrading the reasoning model ($-0.63$), and principled separation shows the opposite pattern ($-0.40$ vs. $+0.31$). Both crossovers are individually significant (commitment: $p = 0.040$; separation: $p = 0.002$) and hold across all eight questions with 7/8 directional consistency. Adversarial stress-testing harms both models, with $2.6\times$ greater degradation for the reasoning model ($-1.47$ vs. $-0.57$; $p = 0.038$), and the damage correlates negatively with baseline difficulty ($R^2 = 0.36$, $p = 0.014$). We further document a persistent declarative-procedural gap in which both models identify correct strategies at rates far exceeding their ability to execute them; separation fully closes this gap for the reasoning model while no intervention helps the standard model.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 101
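
The arithmetic behind the abstract's headline numbers can be checked with a short sketch. The factor counts, the t statistic, and the score changes are taken from the abstract; the relation d = t / sqrt(n) is an assumption about how the effect size was computed (a paired comparison over n = df + 1 = 8 questions), not something the abstract states.

```python
from math import sqrt

# Design factors as stated in the abstract.
models = 2        # GPT-4.1-mini and GPT-5-mini
conditions = 5    # one baseline plus four interventions
questions = 8
framings = 3
repetitions = 3

total_responses = models * conditions * questions * framings * repetitions

# Assumption: paired design over n = df + 1 questions, so d = t / sqrt(n).
t_stat, df = 4.79, 7
cohens_d = t_stat / sqrt(df + 1)

# Ratio of adversarial degradation: reasoning model vs. standard model.
degradation_ratio = 1.47 / 0.57

print(total_responses, round(cohens_d, 2), round(degradation_ratio, 1))
# prints: 720 1.69 2.6
```

Under that assumption, the reported $d = 1.69$ matches $t(7) = 4.79$, and the stated response count and $2.6\times$ ratio follow directly from the figures given.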