ReLU MLPs Can Compute Numerical Integration: Mechanistic Interpretation of a Non-linear Activation

Published: 24 Jun 2024, Last Modified: 31 Jul 2024 · ICML 2024 MI Workshop Poster · CC BY 4.0
Keywords: mechanistic interpretability, proof, guarantees, interpretability, numerical integration
TL;DR: We provide mathematical and anecdotal evidence that an MLP layer in a neural network implements numerical integration.
Abstract: Extending the analysis of Nanda et al. (2023) and Zhong et al. (2023), we offer an end-to-end interpretation of the one-layer, MLP-only modular addition transformer model with symmetric embeddings. We present a clear and mathematically rigorous description of the computation at each layer, in preparation for the proofs-based verification approach set out in concurrent work under review. In doing so, we present a new interpretation of MLP layers: that they implement quadrature schemes to carry out numerical integration, and we provide anecdotal and mathematical evidence in support. This overturns the existing idea that neurons in neural networks are merely on-off switches that test for the presence of "features" -- instead, multiple neurons can be combined in non-trivial ways to produce continuous quantities.
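The quadrature interpretation can be illustrated with a toy sketch (our own, not the paper's exact construction): a bank of phase-shifted ReLU(cos) "neurons" whose uniform average is exactly a Riemann quadrature of the continuous integral (1/2π)∫₀^{2π} ReLU(cos(θ+φ)) dφ = 1/π. The neuron count and evenly spaced phases are illustrative assumptions; the point is that finitely many ReLU units jointly approximate a continuous integral, rather than each acting as a feature detector.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mlp_quadrature(theta, n_neurons=64):
    # Hypothetical bank of n_neurons ReLU units with evenly spaced phases.
    # Their uniform average is a Riemann-sum quadrature of
    # (1/2pi) * integral_0^{2pi} ReLU(cos(theta + phi)) dphi, which
    # evaluates to 1/pi regardless of theta.
    phases = 2 * np.pi * np.arange(n_neurons) / n_neurons
    return relu(np.cos(theta + phases)).mean()

# The discrete neuron average tracks the continuous integral 1/pi ~ 0.3183
# for any input theta, i.e. the layer computes a continuous quantity.
for theta in (0.0, 0.7, 2.1):
    print(theta, mlp_quadrature(theta))
```

Because the phases tile the period uniformly, the average is (up to quadrature error) independent of θ, which is the sense in which combining neurons yields a smooth, continuous output.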
Submission Number: 26