Keywords: explainability, XAI, explainable AI, formal verification, sufficient explanations
TL;DR: Our approach constructs provably sufficient and (globally) cardinally-minimal explanations for neural additive models with improved runtime complexity.
Abstract: Despite significant progress in post-hoc explanation methods for neural
networks, many remain heuristic and lack provable guarantees. A key approach
for obtaining explanations with provable guarantees is to identify a
cardinally-minimal subset of input features that is by itself provably
sufficient to determine the prediction. However, for standard neural networks,
this task is often computationally infeasible: in the worst case it demands a
number of verification queries that is exponential in the number of input
features, and each query is itself NP-hard.
In this work, we show that for Neural Additive Models (NAMs), a recent and
more interpretable neural network family, we can efficiently generate
explanations with such guarantees. We present a new model-specific algorithm
for NAMs that generates provably cardinally-minimal explanations using a
number of verification queries that is only logarithmic in the number of
input features. The algorithm first applies a parallelized preprocessing step
to each small univariate NAM component, with runtime logarithmic in the
required precision.
Our algorithm not only makes the task of obtaining cardinally-minimal
explanations feasible, but also outperforms existing algorithms designed to
find the relaxed variant of subset-minimal explanations (which may be larger
and less informative but are easier to compute), even though our algorithm
solves a much more difficult task.
Our experiments demonstrate that, compared to previous algorithms, our
approach provides provably smaller explanations and
substantially reduces the computation time. Moreover, we show that our
generated provable explanations offer benefits that are unattainable by
standard sampling-based techniques typically used to interpret NAMs.
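
The idea of combining per-component preprocessing with a logarithmic number of sufficiency checks can be illustrated with a toy sketch. This is not the authors' algorithm. It assumes that a lower bound on each univariate component's output is already known (standing in for the preprocessing step), that the prediction is "positive" when the additive output exceeds zero, and that the sufficiency check is a closed-form sum rather than a call to a neural network verifier. The function name and all parameters are hypothetical.

```python
from typing import List, Sequence


def minimal_sufficient_subset(
    contributions: Sequence[float],
    lower_bounds: Sequence[float],
    bias: float = 0.0,
) -> List[int]:
    """Toy sketch: smallest feature set whose fixing keeps the output > 0.

    contributions[i] is the component output f_i(x_i) at the explained input.
    lower_bounds[i] is an assumed, precomputed lower bound of f_i over its domain.
    """
    n = len(contributions)
    # Gap: how far the output can drop if feature i is left free.
    gaps = [c - lb for c, lb in zip(contributions, lower_bounds)]
    # Features with the largest gap are the most important ones to fix.
    order = sorted(range(n), key=lambda i: gaps[i], reverse=True)

    def is_sufficient(k: int) -> bool:
        # Stand-in for a verification query: worst-case output when only the
        # top-k features are fixed and all others take their lower bound.
        fixed = set(order[:k])
        worst = bias + sum(
            contributions[i] if i in fixed else lower_bounds[i]
            for i in range(n)
        )
        return worst > 0.0

    if not is_sufficient(n):
        raise ValueError("the model output at this input is not positive")

    # Sufficiency is monotone in k, so binary search needs O(log n) checks.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if is_sufficient(mid):
            hi = mid
        else:
            lo = mid + 1
    return sorted(order[:lo])


explanation = minimal_sufficient_subset(
    contributions=[2.0, 0.5, 1.0, 0.1],
    lower_bounds=[-1.0, 0.0, -1.5, 0.0],
    bias=-1.0,
)
print(explanation)
```

In this toy setting, freeing the features with the smallest gaps first maximizes how many features can be left free, so the returned prefix is a cardinally-minimal sufficient set under the stated assumptions.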
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 19723