Keywords: Graph-based Machine Learning, Explainable AI
Abstract: Post-hoc explanation methods for Graph Neural Networks (GNNs) are increasingly used to reveal which substructures influence a model’s prediction. However, recent studies show that such explanations are often brittle—small changes to the input graph can lead to drastically different explanations. This instability challenges their reliability in critical downstream tasks such as auditing, debugging, or human-in-the-loop decision making.
In this work, we introduce GrA, a risk-aware explanation trimming method that enhances the robustness of GNN explanations via a post-hoc, model-agnostic process. GrA identifies unstable edges using gradient-based sensitivity analysis and quantifies their volatility via Conditional Value-at-Risk (CVaR), a tail-aware risk measure. By removing high-risk edges, GrA produces a robust surrogate graph that retains explanatory fidelity while significantly reducing sensitivity to structural perturbations.
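The abstract does not say exactly how GrA samples perturbations, takes gradients, or decides how many edges to drop. The sketch below only illustrates the CVaR-based trimming idea under stated assumptions. It assumes that, for each edge, a list of sensitivity samples has already been computed. One example would be the gradient magnitude of the explainer's edge score with respect to that edge's weight, collected over several randomly perturbed copies of the graph. CVaR at level alpha is taken as the mean of the worst (1 - alpha) fraction of those samples. The edges with the highest tail risk are then removed. The names `cvar`, `trim_high_risk_edges`, `alpha`, and `budget` are hypothetical, and this is not presented as the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source node, target node)


def cvar(samples: List[float], alpha: float = 0.9) -> float:
    """Empirical CVaR_alpha: mean of the largest ceil((1 - alpha) * n) samples.

    Larger sensitivity values are treated as riskier (upper tail)."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(samples, reverse=True)
    # round() guards against floating-point error pushing the tail size
    # just above an integer before ceil() is applied.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:k]) / k


def trim_high_risk_edges(
    sensitivities: Dict[Edge, List[float]], alpha: float = 0.9, budget: int = 1
) -> List[Edge]:
    """Drop the `budget` edges with the highest CVaR of sensitivity (hypothetical
    trimming rule) and return the surviving edges in their original order."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    risk = {edge: cvar(samples, alpha) for edge, samples in sensitivities.items()}
    ranked = sorted(risk, key=risk.get, reverse=True)
    removed = set(ranked[:budget])
    return [edge for edge in sensitivities if edge not in removed]


if __name__ == "__main__":
    # Hypothetical per-edge sensitivity samples over 4 perturbations.
    sens = {
        (0, 1): [0.1, 0.1, 0.1, 0.1],  # consistently stable
        (1, 2): [0.0, 0.0, 0.0, 1.2],  # low mean (0.3) but a heavy tail
        (2, 3): [0.5, 0.5, 0.5, 0.5],  # moderate and steady (mean 0.5)
    }
    print(trim_high_risk_edges(sens, alpha=0.75, budget=1))  # [(0, 1), (2, 3)]
```

The example shows why a tail-aware measure was chosen instead of an average. Ranking by mean sensitivity would drop edge (2, 3), which has mean 0.5 against 0.3 for edge (1, 2). CVaR instead flags (1, 2), whose rare but large spikes are the kind of instability that makes explanations brittle.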
GrA requires no modification to the underlying GNN or explanation model and can be seamlessly applied to any gradient-accessible explainer. Across both synthetic and real-world graph classification benchmarks, and under various adversarial perturbation settings, GrA consistently improves explanation stability without compromising fidelity or predictive accuracy.
Primary Area: interpretability and explainable AI
Submission Number: 24262