Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, ICML 2026 regular, License: CC BY-NC-ND 4.0
TL;DR: To address the unreliability of off-manifold traversals in path-based attribution methods, we introduce MA-GIG, which leverages the latent geometry of generative models to produce manifold-aligned and perceptually faithful explanations.
Abstract: Feature attribution is central to diagnosing and trusting deep neural networks, and Integrated Gradients (IG) is widely used due to its axiomatic properties. However, IG can yield unreliable explanations when the integration path between a baseline and the input passes through regions with noisy gradients. While Guided Integrated Gradients reduces this sensitivity by adaptively updating low-gradient-magnitude features, input-space guidance still produces intermediate inputs that deviate from the data manifold. To address this limitation, we propose **Manifold-Aligned Guided Integrated Gradients** (MA-GIG), which constructs attribution paths in the latent space of a pre-trained variational autoencoder. By decoding intermediate latent states, MA-GIG biases the path toward the learned generative manifold and reduces exposure to implausible input-space regions. Through qualitative and quantitative evaluations, we demonstrate that MA-GIG produces faithful explanations by aggregating gradients on path features proximal to the input. Consequently, our method reduces off-manifold noise and outperforms prior path-based attribution methods across multiple datasets and classifiers. Our code is available at https://github.com/leekwoon/ma-gig/.
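The abstract contrasts an input-space integration path with one built from decoded latent states. As a rough illustration only, the sketch below approximates a path integral of gradients, sum_k grad f(gamma(alpha_mid)) * (gamma(alpha_{k+1}) - gamma(alpha_k)), for two paths: the straight line used by standard IG, and a path obtained by linearly interpolating between two latent codes and decoding each state. This is not MA-GIG: the abstract describes guided, adaptive updates in the latent space of a pre-trained VAE, whereas this sketch uses plain linear latent interpolation. The toy model `model_output`, its analytic gradient, and the `tanh` decoder are hypothetical stand-ins for a classifier and a VAE decoder.

```python
import numpy as np


def model_output(x):
    # Hypothetical toy "classifier score" standing in for a network logit.
    return np.sum(np.sin(x)) + x[0] * x[1]


def model_grad(x):
    # Analytic gradient of model_output with respect to the input x.
    g = np.cos(x)
    g[0] += x[1]
    g[1] += x[0]
    return g


def path_integrated_gradients(grad_fn, path, steps=256):
    # Midpoint Riemann sum of the path integral of grad_fn along gamma(alpha),
    # alpha in [0, 1]. Attributions sum to roughly f(gamma(1)) - f(gamma(0)).
    alphas = np.linspace(0.0, 1.0, steps + 1)
    total = np.zeros_like(path(0.0))
    for a0, a1 in zip(alphas[:-1], alphas[1:]):
        midpoint = path(0.5 * (a0 + a1))
        total += grad_fn(midpoint) * (path(a1) - path(a0))
    return total


def straight_line_path(baseline, x):
    # Standard IG path: linear interpolation directly in input space.
    return lambda a: baseline + a * (x - baseline)


def decoded_latent_path(decode, z_baseline, z_input):
    # Hypothetical latent path: interpolate latent codes, then decode each state,
    # so every intermediate input is an output of the (toy) decoder.
    return lambda a: decode(z_baseline + a * (z_input - z_baseline))


# Demo with a toy nonlinear decoder mapping 2-D latents to 4-D inputs.
W = np.random.default_rng(0).normal(size=(4, 2))
decode = lambda z: np.tanh(W @ z)

x = np.array([0.5, -1.2, 2.0, 0.3])
baseline = np.zeros(4)
ig = path_integrated_gradients(model_grad, straight_line_path(baseline, x))
# Completeness check: both numbers should agree closely.
print(ig.sum(), model_output(x) - model_output(baseline))

z0, z1 = np.zeros(2), np.array([0.8, -0.4])
latent_attr = path_integrated_gradients(model_grad, decoded_latent_path(decode, z0, z1))
print(latent_attr.sum(), model_output(decode(z1)) - model_output(decode(z0)))
```

Note that for the decoded path, completeness holds with respect to the decoded endpoints decode(z0) and decode(z1); these match the actual baseline and input only to the extent that the decoder reconstructs them.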
Lay Summary: Artificial intelligence systems often make decisions from images, and their users need to know which parts of an image led to a decision. Many explanation methods trace a path from a blank image to the real image, but this path can pass through unrealistic images and create noisy or misleading explanations. This study developed **Manifold-Aligned Guided Integrated Gradients** (MA-GIG), a method that uses a pre-trained image generator to keep the explanation path closer to realistic images and reduce noisy signals. In tests on ImageNet, Oxford-IIIT Pet, and Oxford 102 Flower with several image recognition models, the method produced cleaner and more faithful explanation maps than most existing path-based methods, often focusing better on the object or region related to the model's prediction. These explanations can help researchers and users check whether an artificial intelligence model is making decisions for the right reasons. However, because the method depends on the quality of the image generator, it should still be validated carefully before use in high-stakes settings.
Originally Submitted Supplementary Material: zip
Link To Code: https://github.com/leekwoon/ma-gig/
Primary Area: Applications->Computer Vision
Keywords: Input Attribution, Integrated Gradients, Data Manifold, Variational Autoencoder
Originally Submitted PDF: pdf
Submission Number: 19831