Keywords: training dynamics; large language models; chess; model pre-training; vocabulary design; domain specific languages; symbolic domains
Abstract: We investigate the emergent behaviors of rule comprehension, tactical execution, and strategic competence in transformer-based models trained on algebraic chess notation. To support structured reasoning, we introduce a disambiguation-aware tokenization scheme that explicitly encodes promotions, castling, checks, and mates, enabling fine-grained modeling of chess rules and dynamics.
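The abstract does not specify the vocabulary itself, but a minimal sketch of what a disambiguation-aware tokenization of Standard Algebraic Notation might look like is below; the marker tokens (`@` for disambiguation, `=Q` for promotion, `+`/`#` for check/mate) and the regex decomposition are illustrative assumptions, not the paper's actual scheme.

```python
import re

# Hypothetical sketch of a disambiguation-aware SAN tokenizer (NOT the
# paper's vocabulary): each move is split into explicit sub-tokens for
# castling, piece, disambiguation file/rank, capture, target square,
# promotion, and check/mate, so "exd8=Q#" and "Nbd2+" become structured
# token sequences rather than opaque strings.
SAN_PATTERN = re.compile(
    r"^(?:(?P<castle>O-O(?:-O)?)"
    r"|(?P<piece>[KQRBN])?"
    r"(?P<disamb>[a-h]?[1-8]?)"
    r"(?P<capture>x)?"
    r"(?P<target>[a-h][1-8])"
    r"(?:=(?P<promo>[QRBN]))?)"
    r"(?P<suffix>[+#])?$"
)

def tokenize_move(san: str) -> list[str]:
    """Split one SAN move into explicit sub-tokens."""
    m = SAN_PATTERN.match(san)
    if m is None:
        raise ValueError(f"unparseable SAN move: {san}")
    tokens = []
    if m.group("castle"):
        tokens.append(m.group("castle"))      # castling as a single token
    if m.group("piece"):
        tokens.append(m.group("piece"))
    if m.group("disamb"):
        tokens.append("@" + m.group("disamb"))  # explicit disambiguation
    if m.group("capture"):
        tokens.append("x")
    if m.group("target"):
        tokens.append(m.group("target"))
    if m.group("promo"):
        tokens.append("=" + m.group("promo"))   # explicit promotion token
    if m.group("suffix"):
        tokens.append(m.group("suffix"))        # explicit check/mate token
    return tokens
```

For example, `tokenize_move("exd8=Q#")` yields `["@e", "x", "d8", "=Q", "#"]`, making the promotion and mate markers visible to the model as separate symbols.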
Our analysis reveals phase transitions in capabilities: shallow models (fewer than 15 layers) exhibit high illegality rates, while deeper models (20 layers or more) increasingly demonstrate reliable tactical and positional behaviors. Training dynamics show that while rule comprehension emerges early, higher-order abilities follow a hierarchical developmental path that mirrors curriculum learning. These trends remain consistent across decoding strategies and training distributions.
Our findings suggest that transformer models can acquire human-aligned planning abilities in symbolic domains. Chess provides a tractable benchmark for evaluating the staged emergence of hierarchical competence in language models. Our methodology, including vocabulary design, architectural scaling, and behavioral evaluation, has the potential to generalize to other structured domains such as programming, formal logic, and mathematical proof systems.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 18763