Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates
Keywords: Mechanistic Interpretability, Math Reasoning, Fine-tuning
TL;DR: Constructive Circuit Amplification selectively updates sparse task-relevant circuits in LLMs, boosting math reasoning accuracy by up to 11.4% with minimal changes and little impact on other abilities.
Abstract: Prior studies investigating the internal workings of LLMs have uncovered sparse subnetworks, referred to as circuits, that are responsible for performing specific tasks. It has also been shown that the performance improvements from fine-tuning often result from the strengthening of existing circuits. Taken together, these findings suggest the possibility of intervening directly on such circuits to make precise, task-targeted updates. Building on this, we propose Constructive Circuit Amplification, a method that identifies pivotal tokens in model reasoning traces, along with the model components responsible for the desired task, and updates only those components. Applied to mathematical reasoning, it improves accuracy by up to +11.4% across multiple models while modifying as little as 1.59% of model components, with minimal impact on other abilities as measured by MMLU, TriviaQA, and TruthfulQA. These results demonstrate that targeted capabilities can be reliably enhanced by selectively updating a sparse set of model components.
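The idea of updating only a sparse set of components can be illustrated with a minimal sketch. This is not the authors' implementation: the component names (`head_0`, `mlp_0`, and so on), the helper functions, the plain gradient step, and the toy values are all hypothetical, and the selection of pivotal tokens and task-relevant components is assumed to have happened already.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]


def masked_update(params: Params, grads: Params,
                  selected: Set[str], lr: float = 0.1) -> Params:
    """Apply a gradient step only to the components named in `selected`.

    Hypothetical helper: every other component is copied unchanged.
    """
    updated: Params = {}
    for name, weights in params.items():
        if name in selected:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)
    return updated


def fraction_modified(params: Params, selected: Set[str]) -> float:
    """Share of components that receive an update."""
    return sum(1 for name in params if name in selected) / len(params)


# Toy model with four components (illustrative values only).
params = {"head_0": [1.0, 2.0], "head_1": [3.0, 4.0],
          "mlp_0": [5.0], "mlp_1": [6.0]}
grads = {"head_0": [1.0, 1.0], "head_1": [1.0, 1.0],
         "mlp_0": [1.0], "mlp_1": [1.0]}
selected = {"head_1"}  # assumed output of a circuit-identification step

new_params = masked_update(params, grads, selected, lr=0.5)
share = fraction_modified(params, selected)
```

In this toy setup only `head_1` changes, so one of four components is modified. The 1.59% figure reported in the abstract refers to the same kind of ratio, measured over the components of a full model.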
Submission Number: 234