Feature Attribution for Deep Learning Models through Total Variance Decomposition

Published: 27 Mar 2025, Last Modified: 01 May 2025
MIDL 2025 Poster · CC BY 4.0
Keywords: feature attribution, explainable AI, counterfactual explanation, generative diffusion model
Abstract: This paper introduces a new approach to feature attribution for deep learning models, quantifying the importance of specific features in model decisions. By decomposing the total variance of model decisions into explained and unexplained fractions, conditioned on the target feature, we define the feature attribution score as the proportion of explained variance. This method offers a solid statistical foundation and normalized quantitative results. When ample data is available, we compute the score directly from test data. For scarce data, we use constrained sampling with generative diffusion models to represent the conditional distribution at a given feature value. We demonstrate the method’s effectiveness on both a synthetic image dataset with known ground truth and OASIS-3 brain MRIs.
Primary Subject Area: Interpretability and Explainable AI
Secondary Subject Area: Interpretability and Explainable AI
Paper Type: Methodological Development
Registration Requirement: Yes
Submission Number: 239
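The attribution score described in the abstract follows the law of total variance, Var(Y) = E[Var(Y|X)] + Var(E[Y|X]), with the score defined as the explained fraction Var(E[Y|X]) / Var(Y). The sketch below illustrates the "ample data" case only, approximating the conditioning on a feature by quantile-binning test data; the function name, binning scheme, and bin count are illustrative assumptions, not the paper's implementation (which also covers a diffusion-based sampling variant for scarce data).

```python
import numpy as np

def attribution_score(feature, decisions, n_bins=10):
    """Fraction of decision variance explained by one feature.

    Illustrative sketch of the law-of-total-variance idea:
    score = Var(E[Y|X]) / Var(Y), estimated by binning the
    feature values. Not the paper's actual implementation.
    """
    total_var = np.var(decisions)
    if total_var == 0.0:
        return 0.0
    # Approximate conditioning on the feature via quantile bins.
    edges = np.quantile(feature, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.digitize(feature, edges[1:-1])  # bin index in 0..n_bins-1
    overall_mean = decisions.mean()
    explained = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            weight = mask.mean()  # empirical P(X in bin b)
            explained += weight * (decisions[mask].mean() - overall_mean) ** 2
    return explained / total_var  # normalized to [0, 1]

# Toy check: a feature that strongly drives the decision scores high,
# an unrelated feature scores near zero.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 2.0 * x + rng.normal(size=5000)   # feature explains most of Var(y)
z = rng.normal(size=5000)             # independent of y
print(attribution_score(x, y))
print(attribution_score(z, y))
```

Because the score is a ratio of variances, it is dimensionless and directly comparable across features, which is the normalization property the abstract highlights.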