Provably Explaining Neural Additive Models

ICLR 2026 Conference Submission 19723 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: explainability, XAI, explainable AI, formal verification, sufficient explanations
TL;DR: Our approach constructs provably sufficient and (globally) cardinal-minimal explanations for neural additive models with improved runtime complexity.
Abstract: Despite significant progress in post-hoc explanation methods for neural networks, many remain heuristic and lack provable guarantees. A key approach for obtaining explanations with provable guarantees is to identify a *(globally) cardinal-minimal* subset of input features which, by itself, is *provably sufficient* to determine the prediction. However, for standard neural networks this task is often computationally infeasible, as it demands a worst-case *exponential* number of verification queries in the number of input features, each of which is NP-hard. In this work, we show that for Neural Additive Models (NAMs), a recent and more interpretable neural network family, we can *efficiently* generate explanations with such guarantees. We present a new model-specific algorithm for NAMs that generates provably (globally) cardinal-minimal explanations using only a *logarithmic* number of verification queries in the number of input features, following a parallelized preprocessing step, with runtime logarithmic in the required precision, applied to each small univariate NAM component. Our algorithm not only makes obtaining (globally) cardinal-minimal explanations feasible, but also outperforms existing algorithms designed to find *(locally) subset-minimal* explanations -- which may be larger and less informative but are easier to compute -- despite solving a much more difficult task. Our experiments demonstrate that our approach provides provably smaller explanations than existing methods while substantially reducing computation time. Moreover, we show that our generated provable explanations offer benefits that are unattainable by standard sampling-based techniques typically used to interpret NAMs.
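To give intuition for why the additive structure helps, the following is a minimal sketch, not the paper's algorithm: it assumes a binary-classification NAM of the form f(x) = sum_i f_i(x_i) + b, bounds each univariate component by dense grid evaluation (where the paper instead relies on formal verification queries), and uses a hypothetical greedy routine `minimal_sufficient_subset` with helper `contribution_bounds`, both introduced here purely for illustration.

```python
# Illustrative sketch only (not the authors' algorithm): for a NAM
# f(x) = sum_i f_i(x_i) + b, a feature subset S is sufficient at input x if the
# prediction cannot flip however the features outside S vary within their domains.
# Because contributions are additive and independent, sufficiency reduces to
# per-feature contribution bounds; here each univariate component is bounded by
# grid evaluation as a stand-in for a rigorous verification query.

import numpy as np

def contribution_bounds(shape_fn, domain, grid=1024):
    """Approximate min/max of a univariate NAM component over its domain."""
    xs = np.linspace(domain[0], domain[1], grid)
    vals = shape_fn(xs)
    return float(vals.min()), float(vals.max())

def minimal_sufficient_subset(shape_fns, domains, x, bias=0.0):
    """Greedily fix the features with the largest worst-case adverse swing until
    the remaining free features can no longer flip the sign of f(x)."""
    contribs = np.array([float(fn(np.array([xi]))[0]) for fn, xi in zip(shape_fns, x)])
    margin = contribs.sum() + bias            # signed distance to the decision threshold
    positive = margin > 0
    # Worst-case adverse shift of each feature if it is left free to vary.
    shifts = []
    for fn, dom, c in zip(shape_fns, domains, contribs):
        lo, hi = contribution_bounds(fn, dom)
        shifts.append(c - lo if positive else hi - c)
    order = np.argsort(shifts)[::-1]          # largest adverse swing first
    fixed = []
    slack = abs(margin) - sum(shifts)         # slack with no feature fixed
    for i in order:
        if slack > 0:                          # prediction already provably stable
            break
        fixed.append(int(i))
        slack += shifts[i]                     # fixing feature i removes its adverse swing
    return sorted(fixed)

# Toy usage: two shape functions on [-1, 1] and one input point.
f1 = lambda z: 2.0 * np.tanh(3.0 * z)
f2 = lambda z: 0.3 * z ** 2
print(minimal_sufficient_subset([f1, f2], [(-1, 1), (-1, 1)], x=[0.8, 0.5], bias=0.1))
```

In this simplified additive setting, fixing features in decreasing order of their worst-case adverse swing yields the smallest subset that keeps the prediction stable; the paper's contribution is to obtain such globally cardinal-minimal explanations with provable guarantees using only a logarithmic number of verification queries, rather than the heuristic grid bounds used above.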
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 19723