Modular addition without black-boxes: Compressing explanations of MLPs that compute numerical integration

ICLR 2025 Conference Submission2241 Authors

20 Sept 2024 (modified: 28 Nov 2024), ICLR 2025 Conference Submission, CC BY 4.0
Keywords: mechanistic interpretability, proof, guarantees, interpretability, numerical integration
TL;DR: We provide mathematical and anecdotal evidence that an MLP layer in a neural network implements numerical integration.
Abstract: The goal of mechanistic interpretability is to discover the simple, low-rank algorithms that models implement. While we can compress activations into features, compressing nonlinear feature maps, such as MLP layers, remains an open problem. In this work, we present the first case study in rigorously compressing nonlinear feature maps. We work in the classic setting of the modular addition models (Nanda et al., 2023) and target a non-vacuous bound on the behavior of the ReLU MLP that can be computed in time linear in the parameter count of the circuit. To study the ReLU MLP analytically, we use the infinite-width lens, which turns post-activation matrix multiplications into approximate integrals. We discover a novel interpretation of the MLP layer in one-layer transformers implementing the “pizza” algorithm (Zhong et al., 2023): the MLP can be understood as evaluating a quadrature scheme, in which each neuron computes the area of one rectangle under the curve of a trigonometric integral identity. Our code is available at [https://tinyurl.com/mod-add-integration](https://tinyurl.com/mod-add-integration).
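The quadrature reading above can be illustrated with a toy calculation. The sketch below is an illustration under stated assumptions, not the paper's derivation or a trained model's weights: it assumes N neurons whose phases φ_j are evenly spaced on [0, 2π), a ReLU pre-activation c·cos(x + φ), and a readout weight cos(y + φ). Under those assumptions, a standard calculation gives the identity ∫₀^{2π} ReLU(c·cos(x + φ))·cos(y + φ) dφ = (π/2)·c·cos(x − y) for any real c, so the ReLU becomes effectively linear once integrated over phases. Summing one rectangle per neuron is then a rectangle-rule (quadrature) approximation of that integral. The function names `rectangle_rule` and `closed_form` are hypothetical. As one hedged example of the template, the sum-to-product identity rewrites cos(ka + φ) + cos(kb + φ) as 2cos(k(a − b)/2)·cos(k(a + b)/2 + φ), i.e. c = 2cos(k(a − b)/2) and x = k(a + b)/2.

```python
import math


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def rectangle_rule(c: float, x: float, y: float, n_neurons: int) -> float:
    """Approximate the integral over phi in [0, 2*pi) of
    ReLU(c * cos(x + phi)) * cos(y + phi) with n_neurons equal-width rectangles.

    Hypothetical toy setup: neuron j has phase phi_j = 2*pi*j / n_neurons and
    contributes one rectangle of width 2*pi / n_neurons whose height is its
    post-ReLU activation times its readout weight cos(y + phi_j).
    """
    width = 2.0 * math.pi / n_neurons
    total = 0.0
    for j in range(n_neurons):
        phi = j * width
        total += width * relu(c * math.cos(x + phi)) * math.cos(y + phi)
    return total


def closed_form(c: float, x: float, y: float) -> float:
    """Exact value of the same integral in the infinite-width limit."""
    return 0.5 * math.pi * c * math.cos(x - y)


if __name__ == "__main__":
    # Compare the finite "neuron" sum with the closed form for both signs of c.
    for c in (1.0, -0.7):
        approx = rectangle_rule(c, x=0.3, y=1.1, n_neurons=512)
        exact = closed_form(c, x=0.3, y=1.1)
        print(f"c={c:+.1f}  rectangles={approx:.6f}  exact={exact:.6f}")
```

Because the integrand only has kinks where c·cos(x + φ) crosses zero, the rectangle-rule error in this toy setup shrinks roughly quadratically in the phase spacing, so a few hundred evenly spaced neurons already track the closed form closely.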
Primary Area: interpretability and explainable AI
Submission Number: 2241