Probing Policy-Level Memorization in Reasoning LLMs via Atomic Chess

Published: 04 Jun 2026, Last Modified: 04 Jun 2026, Venue: ICML MemFM 2026 Workshop Poster, Readers: Everyone, License: CC BY 4.0
Keywords: large language models, memorization
TL;DR: Atomic chess exposes that reasoning LLMs can state and locally apply altered rules, but still often fail to integrate them into globally correct action selection when those rules conflict with standard priors.
Abstract: LLMs play chess at a notable level, but it is unclear whether this reflects memorization or rule-conditioned reasoning. We test this with atomic chess, a variant in which captures explode adjacent pieces, preserving notation but altering legal-move and terminal structure. We construct a paired benchmark of 200 variant-divergent positions from Lichess: 100 standard-source and 100 atomic-source positions, each selected so that the standard-best move is a severe blunder under atomic rules. On these positions, Claude Opus 4.6 and GPT-5.4 incur $2.1$--$4.6\times$ higher mean Win% loss under atomic than under standard rules on identical positions. Higher reasoning effort attenuates but does not close the gap; removing the rule statement collapses performance; and an unfiltered comparison shows the signal is hidden without prior-conflict filtering. Qualitative trace analysis reveals a recurring *unpropagated refutation* pattern: the model derives the correct atomic-rule consequence locally, then selects the move it just refuted. We argue this constitutes evidence of *policy-level memorization*, a behavioral signature complementary to extraction-based probes, and discuss implications for trust in instruction-following LLMs.
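To make the rule change in the abstract concrete, the following is a minimal sketch of the atomic-chess capture explosion, not the authors' benchmark code. It assumes the common atomic rule set: a capture removes the capturing piece, the captured piece, and every non-pawn piece on the eight squares around the capture square, while adjacent pawns survive. The board representation, the `atomic_capture` helper, and the example position are hypothetical, and en passant, move legality, and king-explosion win conditions are deliberately left out.

```python
from typing import Dict, Tuple

Square = Tuple[int, int]   # (file, rank), each in 0..7; hypothetical encoding
Board = Dict[Square, str]  # piece letters: uppercase = white, lowercase = black


def atomic_capture(board: Board, src: Square, dst: Square) -> Board:
    """Return a new board after the piece on src captures the piece on dst.

    Sketch only: assumes the capture is otherwise legal and is not en passant.
    """
    if src not in board or dst not in board:
        raise ValueError("sketch handles only captures of an occupied square")

    after = dict(board)
    # The capturing piece and the captured piece are both destroyed.
    del after[src]
    del after[dst]

    # Every non-pawn piece adjacent to the capture square is also destroyed.
    for df in (-1, 0, 1):
        for dr in (-1, 0, 1):
            if df == 0 and dr == 0:
                continue
            neighbour = (dst[0] + df, dst[1] + dr)
            piece = after.get(neighbour)
            if piece is not None and piece.lower() != "p":
                del after[neighbour]
    return after


# Hypothetical position: white knight f3 captures the black pawn on e5.
position: Board = {
    (5, 2): "N",  # white knight f3 (the capturer)
    (4, 4): "p",  # black pawn e5 (the captured piece)
    (3, 5): "q",  # black queen d6, adjacent to e5
    (5, 5): "p",  # black pawn f6, adjacent to e5 but pawns survive
    (4, 3): "P",  # white pawn e4, adjacent to e5 but pawns survive
    (4, 7): "k",  # black king e8, out of range
}
result = atomic_capture(position, (5, 2), (4, 4))
```

In this position a move that merely wins a pawn under standard rules costs White the knight and removes Black's queen under atomic rules, which is the kind of divergence between standard-best and atomic-best moves that the benchmark filters for.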
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 67