Analyzing the Power of Chain of Thought through Memorization Capabilities

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Theory, Chain of Thought, Transformer, Memorization
TL;DR: Use the memorization theorem to analyze the power of chain of thought.
Abstract: It has been shown that chain of thought (CoT) can enhance the power of LLMs to simulate a Turing machine or an algorithm, and in particular their mathematical reasoning ability. The memorization capability of LLMs is an important aspect of their expressive power and offers valuable insight into designing models with enhanced generalization potential. Optimal memorization capacities of transformers have been established both for general datasets and for datasets satisfying a specific separability condition. However, whether CoT can improve the memorization capability of LLMs remains unexamined. To fill this gap, we establish the memorization capability of fixed-precision autoregressive transformers with and without CoT. Precisely, we first give necessary and sufficient conditions for a transformer to memorize a finite language, and then provide upper and lower bounds on the number of parameters of memorization transformers. Our results indicate that the classes of languages memorizable by transformers with and without CoT do not contain each other, and that the same number of parameters is required in both cases, implying that CoT does not significantly enhance a transformer's memorization power. We further show that CoT cannot help transformers memorize certain infinite languages.
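To make the setting concrete, here is one plausible formalization of memorizing a finite language without CoT; the notation is illustrative and the paper's precise definition may differ. Let $L=\{x^{(1)},\dots,x^{(N)}\}\subset\Sigma^{*}$ be a finite language over an alphabet $\Sigma$. A fixed-precision autoregressive transformer $T$ memorizes $L$ if greedy decoding reproduces every string from its first token:

\[
T\big(x^{(i)}_{1},\dots,x^{(i)}_{t}\big)=x^{(i)}_{t+1},
\qquad 1\le i\le N,\ \ 1\le t<|x^{(i)}|.
\]

In the CoT regime, $T$ may additionally emit intermediate tokens between the prefix and the target continuation, and memorization only requires that the tokens following the CoT segment match the target; the paper's upper and lower parameter bounds are stated for both regimes.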
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 11950