Higher-order Component Attribution via Kolmogorov–Arnold Networks

Published: 30 Sept 2025, Last Modified: 02 Dec 2025 · Mech Interp Workshop (NeurIPS 2025) Poster · CC BY 4.0
Keywords: Other, Vision transformers
Other Keywords: component modeling, component attribution
TL;DR: KANs as component models expose higher-order component interactions, improving counterfactuals.
Abstract: Component attribution quantifies how model components, from individual neurons to transformer blocks, contribute to a prediction. Despite their successes, most methods assume additive, linear effects among components and overlook the interactions that shape how predictions arise from internal computations. In this work, we formalize nonlinear component modeling and introduce a Kolmogorov–Arnold Network (KAN)-based framework for component attribution. We fit KAN surrogates on perturbation-response data to represent component effects nonlinearly, then use them to extract local component interaction coefficients in two complementary ways: by automatic differentiation of the trained KAN, and by recovering a symbolic surrogate whose closed-form mixed partial derivatives yield symbolic interaction scores. This provides a way to relate a classifier's output back to interacting internal building blocks rather than isolated components. The resulting expressions are intended for future integration with formal verification methods to support richer counterfactual analyses. Preliminary results on standard image classification models demonstrate that our approach improves the accuracy of counterfactual predictions over linear attribution and enables extraction of higher-order component interactions.
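The following is a minimal sketch of the autodiff route described in the abstract, under stated assumptions: `surrogate` is a hypothetical stand-in for a KAN fitted on perturbation-response data (the toy closed-form function, the input `z0`, and the variable names are illustrative, not the paper's actual model). Mixed second partial derivatives of the surrogate at a local point serve as pairwise interaction coefficients, while the gradient recovers the usual first-order (linear) attributions.

```python
import torch
from torch.autograd.functional import hessian, jacobian

# Hypothetical trained surrogate f: R^d -> R, mapping per-component
# perturbation magnitudes to the classifier's output (e.g., a logit).
# In the paper's setting this would be a KAN fitted on
# perturbation-response data; a toy closed form is used here.
def surrogate(z: torch.Tensor) -> torch.Tensor:
    # z[i] is the perturbation applied to component i
    return torch.sin(z[0]) * z[1] + 0.5 * z[2] ** 2

z0 = torch.zeros(3)          # local point, e.g. the unperturbed model

# First-order attributions: df/dz_i at z0
first_order = jacobian(surrogate, z0)

# Local interaction coefficients: mixed partials d^2 f / dz_i dz_j at z0
# (off-diagonal entries of the Hessian capture pairwise interactions)
interactions = hessian(surrogate, z0)

print("first-order attributions:", first_order)
print("pairwise interaction coefficients:", interactions)
```

Higher-order interactions would follow the same pattern by differentiating further (or by reading mixed partials off the recovered symbolic expression); this sketch stops at second order for brevity.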
Submission Number: 313