Ehrenfeucht-Haussler Rank and Chain of Thought

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We show that the minimal number of chain-of-thought iterations needed by a single-layer hardmax transformer to compute a function is equal to the rank of that function.
Abstract: The notion of _rank_ of a Boolean function has been a cornerstone in PAC learning, enabling quasipolynomial-time learning algorithms for polynomial-size decision trees. We present a novel characterization of rank, grounded in the well-known Transformer architecture. We show that the rank of a function $f$ corresponds to the minimum number of _Chain of Thought_ (CoT) steps required by a single-layer Transformer with hard attention to compute $f$. Based on this characterization, we establish tight bounds on the number of CoT steps required for specific problems, showing that $\ell$-fold function composition necessitates exactly $\ell$ CoT steps. Furthermore, we analyze the problem of identifying the position of the $k$-th occurrence of 1 in a Boolean sequence, proving that it requires $k$ CoT steps.
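As a concrete illustration of the rank notion (an informal sketch, not an artifact of the paper): the Ehrenfeucht-Haussler rank of a decision tree is 0 at a leaf, and at an internal node it is the maximum of the two subtree ranks if they differ, or their common value plus one if they agree; the rank of a Boolean function is the minimum rank over all decision trees computing it. The tuple encoding below is a hypothetical choice made only for this example.

```python
def rank(tree):
    """Ehrenfeucht-Haussler rank of a binary decision tree.

    A tree is either a leaf label (0 or 1) or a tuple
    (variable_index, subtree_if_0, subtree_if_1).
    A leaf has rank 0; an internal node with subtree ranks r0, r1
    has rank max(r0, r1) if r0 != r1, and r0 + 1 if r0 == r1.
    """
    if not isinstance(tree, tuple):          # leaf
        return 0
    _, left, right = tree
    r0, r1 = rank(left), rank(right)
    return max(r0, r1) if r0 != r1 else r0 + 1

# Example: a tree computing x0 AND x1 (query x0; if 0, output 0;
# if 1, query x1). Its rank is 1.
and_tree = (0, 0, (1, 0, 1))
print(rank(and_tree))  # 1
```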
Lay Summary: The ability of Transformers to perform function composition has garnered increasing attention in recent years, as understanding this capability sheds light on the computational resources they require to infer implicit knowledge from a given set of facts. Peng et al. demonstrated that single-layer, soft-attention Transformers without Chain-of-Thought (CoT) reasoning are fundamentally incapable of function composition. However, when CoT is introduced, they can achieve iterated composition, albeit at the cost of a growing number of steps that depends on both vector dimensionality and feature precision. Our work precisely quantifies the number of steps needed for the t-fold iterated composition and establishes that, under the idealized assumption of hard attention, the number of required CoT steps is exactly t. This finding underscores a key insight: while CoT enables function composition, it does so incrementally, one step at a time.
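To make the "one step at a time" picture concrete, here is a small illustrative sketch (not taken from the paper) of t-fold composition of a function given as a lookup table. Each loop iteration appends one more application of f, mirroring how a CoT transcript grows by one token per step; the table `f` and the function name are hypothetical.

```python
def iterate_composition(f_table, x, steps):
    """Compute f^(steps)(x) by applying f once per step.

    Returns the whole chain [x, f(x), f(f(x)), ...], analogous to a
    chain-of-thought transcript that records one composition per step.
    """
    chain = [x]
    for _ in range(steps):
        chain.append(f_table[chain[-1]])
    return chain  # chain[-1] == f^(steps)(x)

# Toy instance: f on {0, 1, 2, 3} as a lookup table.
f = {0: 2, 1: 0, 2: 3, 3: 1}
print(iterate_composition(f, 0, 3))  # [0, 2, 3, 1], so f^3(0) == 1
```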
Primary Area: Theory->Learning Theory
Keywords: Transformers, chain of thought, decision tree rank, composition of functions
Submission Number: 2800