Keywords: Mechanistic Interpretability, Large Language Models, Addition, Arithmetic, Algorithmic Reasoning, Circuits
TL;DR: We show that LLMs learn representations of integers in addition tasks that generalize across prompt templates, number formats, and languages, and we reverse-engineer the two-argument addition circuit for multi-token integers in Llama 3.1 8B
Abstract: Large Language Models (LLMs) are often treated as black boxes, yet many of their behaviours suggest the presence of internal, algorithm-like structures. We present the addition circuit as a concrete, mechanistic example of such a structure: a sparse set of attention heads that performs integer addition. Focusing on two popular open-source models (Llama 3.1 8B and Llama 3.1 70B), we make the following contributions. (i) We extend prior work on two-argument addition to the multi-argument setting, showing that both models employ fixed subsets of attention heads specialized in encoding summands at specific positions in addition prompts. (ii) We introduce state vectors that efficiently capture how models represent summands in their activation spaces. We find that each model learns a common representation of integers that generalizes across prompt formats and across six languages, whether numbers are expressed as Arabic digits or as word numerals.
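A minimal sketch of how such a cross-format probe could be set up, assuming access to the model via the Hugging Face `transformers` API. The layer index, the prompt templates, the `summand_state` helper, and the prefix-tokenization position heuristic are all illustrative assumptions, not the paper's exact procedure:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch: read the residual-stream hidden state at the last
# token of a summand and treat it as that summand's "state vector".
MODEL = "meta-llama/Llama-3.1-8B"  # assumes access to the gated checkpoint
LAYER = 16  # assumed mid-depth layer; the paper's choice may differ

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def summand_state(prompt: str, summand: str) -> torch.Tensor:
    """Hidden state at the final token of `summand` within `prompt`."""
    ids = tok(prompt, return_tensors="pt")
    out = model(**ids, output_hidden_states=True)
    # Approximate the summand's last token position by tokenizing the
    # prefix of the prompt that ends with the summand (heuristic; BPE
    # merges across the boundary can shift this by a token).
    prefix = prompt[: prompt.index(summand) + len(summand)]
    pos = tok(prefix, return_tensors="pt")["input_ids"].shape[1] - 1
    return out.hidden_states[LAYER][0, pos].float()

# Same integer (342) in two surface forms; a high cosine similarity
# would indicate a shared, format-invariant representation.
v_digits = summand_state("342 + 517 =", "342")
v_words = summand_state(
    "three hundred forty-two plus five hundred seventeen equals", "forty-two"
)
print(F.cosine_similarity(v_digits, v_words, dim=0).item())
```

The same comparison could be repeated across prompt templates and languages to test how far the representation generalizes.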
Primary Area: interpretability and explainable AI
Submission Number: 25426