TL;DR: We identify a three-stage architecture that supports abstract reasoning in LLMs via a set of emergent symbol-processing mechanisms.
Abstract: Many recent studies have found evidence for emergent reasoning capabilities in large language models (LLMs), but debate persists concerning the robustness of these capabilities, and the extent to which they depend on structured reasoning mechanisms. To shed light on these issues, we study the internal mechanisms that support abstract reasoning in LLMs. We identify an emergent symbolic architecture that implements abstract reasoning via a series of three computations. In early layers, *symbol abstraction heads* convert input tokens to abstract variables based on the relations between those tokens. In intermediate layers, *symbolic induction heads* perform sequence induction over these abstract variables. Finally, in later layers, *retrieval heads* predict the next token by retrieving the value associated with the predicted abstract variable. These results point toward a resolution of the longstanding debate between symbolic and neural network approaches, suggesting that emergent reasoning in neural networks depends on the emergence of symbolic mechanisms.
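To make the three stages concrete, below is a minimal, purely illustrative Python sketch of the computation the abstract describes, applied to a simple in-context rule-completion task (completing an A-B-A pattern; the specific task is an assumption for illustration, as the abstract does not name one). The three functions stand in for the behavior attributed to the three head types, not for the actual attention-head circuits inside the model, and all function and variable names here are hypothetical.

```python
def symbol_abstraction(triple):
    """Stage 1 (symbol abstraction heads): assign abstract variables to the
    tokens of a completed example based on the relations between them:
    a repeated token is bound to the same variable."""
    first, _, third = triple
    return ("A", "B", "A") if first == third else ("A", "B", "B")

def symbolic_induction(abstract_rules):
    """Stage 2 (symbolic induction heads): induce the shared rule over
    abstract variables from the completed in-context examples."""
    assert len(set(abstract_rules)) == 1, "examples disagree on the rule"
    return abstract_rules[0]  # e.g. ("A", "B", "A")

def retrieval(rule, partial):
    """Stage 3 (retrieval heads): bind the partial example's tokens to the
    rule's variables, then retrieve the token bound to the predicted
    (final) variable."""
    bindings = dict(zip(rule, partial))
    return bindings[rule[-1]]

# In-context prompt: two completed A-B-A examples plus one partial example.
examples = [("iac", "mjw", "iac"), ("tqv", "rfd", "tqv")]
partial = ("sbk", "lqx")

rule = symbolic_induction([symbol_abstraction(t) for t in examples])
print(retrieval(rule, partial))  # -> "sbk"
```

One property this framing makes visible: because the induced rule is stated over variables rather than tokens, the final prediction works for arbitrary, previously unseen token bindings, which is the sense in which the mechanism is symbolic.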
Lay Summary: Large language models have shown remarkable abstract reasoning abilities. What internal mechanisms do these models use to perform reasoning? Some previous work has argued that abstract reasoning requires specialized 'symbol processing' machinery, similar to the design of traditional computing architectures. Large language models, by contrast, start from a relatively generic neural network architecture and must develop the circuits they use for reasoning over the course of training. In this work, we studied the internal mechanisms that language models use to perform reasoning. We found that these mechanisms implement a form of symbol processing, despite the lack of built-in symbolic machinery. The results shed light on the processes that support reasoning in language models, and illustrate how neural networks can develop surprisingly sophisticated circuits through learning.
Link To Code: https://github.com/yukang123/LLMSymbMech
Primary Area: Deep Learning->Large Language Models
Keywords: Mechanistic interpretability, reasoning, abstraction, large language models
Submission Number: 13386