Keywords: Chain of Thought/Reasoning models, Circuit analysis, Understanding high-level properties of models
TL;DR: We show that transformers cannot learn to invert permutations, which we argue is a key building block towards ensuring robustness in reasoning tasks.
Abstract: We study the problem of inverse permutation learning in decoder-only transformers. Given a permutation and a string to which that permutation has been applied, the model is tasked with producing the original (``canonical'') string. We argue that this task models a natural robustness property across a variety of reasoning tasks, including long-context retrieval, multiple-choice QA, and in-context learning.
Our primary contribution is an impossibility result: under weak assumptions, we show that an arbitrary depth, decoder-only transformer cannot learn this task. This result concerns the expressive capacity of decoder-only transformer models and is agnostic to training dynamics or sample complexity.
We give a pair of alternative constructions under which inverse permutation learning is feasible. The first highlights the fundamental role of the causal attention mask, and suggests a gap in expressivity between encoder-decoder transformers and the more popular decoder-only architecture. The second result is more surprising: we show that simply duplicating the input yields a construction under which inverse permutation learning is possible. We conjecture that this result may suggest an alternative mechanism by which chain-of-thought prompting or, more generally, intermediate ``thinking'' tokens can enable reasoning in large language models.
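As an illustration of the task setup described in the abstract (a minimal sketch; the helper names and the string encoding are assumptions, not the paper's tokenization):

```python
import random

def apply_permutation(s: str, pi: list[int]) -> str:
    # pi[i] is the source index of the character placed at position i.
    return "".join(s[j] for j in pi)

def invert(s_permuted: str, pi: list[int]) -> str:
    # Place each character back at its original position.
    out = [""] * len(pi)
    for i, j in enumerate(pi):
        out[j] = s_permuted[i]
    return "".join(out)

canonical = "abcde"
pi = list(range(len(canonical)))
random.shuffle(pi)

permuted = apply_permutation(canonical, pi)
# Given `pi` and `permuted`, the model's target output is `canonical`.
assert invert(permuted, pi) == canonical
```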
Submission Number: 125