Keywords: training dynamics; large language models; chess; model pre-training; vocabulary design; domain specific languages; symbolic domains
Abstract: We investigate the emergent behaviors of rule comprehension, tactical execution, and strategic competence in transformer-based models trained on algebraic chess notation. To support structured reasoning, we introduce a disambiguation-aware tokenization scheme that explicitly encodes promotions, castling, checks, and mates, enabling fine-grained modeling of chess rules and dynamics.
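The abstract does not specify the vocabulary itself, but a minimal sketch of what a disambiguation-aware tokenization of Standard Algebraic Notation might look like is below; the marker tokens (`@` for disambiguation, `=Q` for promotion, `+`/`#` for check/mate) and the regex decomposition are illustrative assumptions, not the paper's actual scheme.

```python
import re

# Hypothetical sketch of a disambiguation-aware SAN tokenizer (NOT the
# paper's vocabulary): each move is split into explicit sub-tokens for
# castling, piece, disambiguation file/rank, capture, target square,
# promotion, and check/mate, so "exd8=Q#" and "Nbd2+" become structured
# token sequences rather than opaque strings.
SAN_PATTERN = re.compile(
    r"^(?:(?P<castle>O-O(?:-O)?)"
    r"|(?P<piece>[KQRBN])?"
    r"(?P<disamb>[a-h]?[1-8]?)"
    r"(?P<capture>x)?"
    r"(?P<target>[a-h][1-8])"
    r"(?:=(?P<promo>[QRBN]))?)"
    r"(?P<suffix>[+#])?$"
)

def tokenize_move(san: str) -> list[str]:
    """Split one SAN move into explicit sub-tokens."""
    m = SAN_PATTERN.match(san)
    if m is None:
        raise ValueError(f"unparseable SAN move: {san}")
    tokens = []
    if m.group("castle"):
        tokens.append(m.group("castle"))      # castling as a single token
    if m.group("piece"):
        tokens.append(m.group("piece"))
    if m.group("disamb"):
        tokens.append("@" + m.group("disamb"))  # explicit disambiguation
    if m.group("capture"):
        tokens.append("x")
    if m.group("target"):
        tokens.append(m.group("target"))
    if m.group("promo"):
        tokens.append("=" + m.group("promo"))   # explicit promotion token
    if m.group("suffix"):
        tokens.append(m.group("suffix"))        # explicit check/mate token
    return tokens
```

For example, `tokenize_move("exd8=Q#")` yields `["@e", "x", "d8", "=Q", "#"]`, making the promotion and mate markers visible to the model as separate symbols.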
Our analysis reveals phase transitions in capabilities: shallow models (fewer than 15 layers) exhibit high illegality rates, while deeper models (20 layers or more) increasingly demonstrate reliable tactical and positional behaviors. Training dynamics show that while rule comprehension emerges early, higher-order abilities follow a hierarchical developmental path that mirrors curriculum learning. These trends remain consistent across decoding strategies and training distributions.
Our findings suggest that transformer models can acquire human-aligned planning abilities in symbolic domains. Chess provides a tractable benchmark for evaluating the staged emergence of hierarchical competence in language models. Our methodology, including vocabulary design, architectural scaling, and behavioral evaluation, has the potential to generalize to other structured domains such as programming, formal logic, and mathematical proof systems.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 18763