Abstract: In recent years, large language models (LLMs) have achieved significant advances in arithmetic reasoning. However, the internal mechanism by which LLMs solve arithmetic problems remains unclear. In this paper, we propose explaining arithmetic reasoning in LLMs through game-theoretic interactions. Specifically, we disentangle the LLM's output score into numerous interactions among the input words.
We quantify the different types of interactions encoded by LLMs during forward propagation to explore how LLMs internally solve arithmetic problems. We find that (1) LLMs solve simple one-operator arithmetic problems by encoding operand-operator interaction patterns and high-order interaction patterns from input samples; moreover, LLMs with poor arithmetic capability focus more on context-free interactions. (2) LLMs solve relatively complex two-operator arithmetic problems by encoding operator interaction patterns from input samples. (3) An LLM gradually forgets its capability to solve simple one-operator arithmetic problems as it learns to solve relatively complex two-operator problems.
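For reference, a minimal sketch of one common game-theoretic formulation, the Harsanyi dividend (the paper may define or refine its interaction metric differently): given the model output score v(T) on a masked input containing only the word subset T, the interaction of a word subset S and the resulting decomposition of the output on the full input N can be written as

I(S) = \sum_{T \subseteq S} (-1)^{|S|-|T|} \, v(T), \qquad v(N) = \sum_{S \subseteq N} I(S).

Under this formulation, the order of an interaction is the subset size |S|, so "high-order interaction patterns" correspond to effects that emerge only when many input words are considered jointly.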
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: interpretability; math QA; feature attribution
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 238