Explaining Bayesian Neural Networks

TMLR Paper 5143 Authors

18 Jun 2025 (modified: 25 Aug 2025) · Decision pending for TMLR · CC BY 4.0
Abstract: To advance the transparency of learning machines such as Deep Neural Networks (DNNs), the field of explainable AI (XAI) was established to provide interpretations of DNNs' predictions. While different explanation techniques exist, a popular approach is given in the form of attribution maps, which illustrate, for a given data point, the patterns the model relied on to make its prediction. Although Bayesian models such as Bayesian Neural Networks (BNNs) have a limited form of transparency built in through their prior weight distribution, they lack explanations of their predictions for given instances. In this work, we bring these two perspectives on transparency together into a holistic framework for explaining BNNs. Within the Bayesian framework, network weights follow a probability distribution; hence the standard point explanation extends to an explanation distribution. Viewing explanations probabilistically, we aggregate and analyze multiple local attributions drawn from an approximate posterior to surface explanation diversity. This diversity provides a diagnostic lens on how predictive rationales vary across posterior samples. Quantitative and qualitative experiments on toy and benchmark data as well as real-world pathology illustrate that our framework enriches standard explanations with uncertainty information and supports practical inspection of explanation stability.
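The abstract's core idea, sampling attribution maps from an approximate posterior and aggregating them into an explanation distribution, can be sketched in a few lines. The following is a minimal illustration only, assuming MC dropout as the approximate posterior, plain input gradients as the attribution method, and a toy PyTorch classifier; none of these specific choices are stated in the abstract.

```python
# Sketch: sampling an "explanation distribution" from an approximate posterior.
# Assumptions (not from the paper): MC dropout ~ posterior, input gradients ~ attribution.
import torch
import torch.nn as nn

class MCDropoutNet(nn.Module):
    """Small classifier whose dropout stays active at inference time,
    so each forward pass corresponds to one approximate posterior weight sample."""
    def __init__(self, in_dim=20, hidden=64, n_classes=3, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

def explanation_distribution(model, x, target, n_samples=50):
    """Draw n_samples attribution maps by re-sampling dropout masks and
    taking the input gradient of the target logit for each sample."""
    model.train()  # keep dropout active: each forward pass ~ one weight sample
    attributions = []
    for _ in range(n_samples):
        x_s = x.clone().requires_grad_(True)
        logit = model(x_s)[0, target]
        grad, = torch.autograd.grad(logit, x_s)
        attributions.append(grad.detach())
    A = torch.stack(attributions)        # (n_samples, 1, in_dim)
    return A.mean(0), A.std(0)           # mean explanation and its spread

if __name__ == "__main__":
    torch.manual_seed(0)
    model = MCDropoutNet()
    x = torch.randn(1, 20)
    mean_attr, std_attr = explanation_distribution(model, x, target=1)
    print("mean attribution:", mean_attr.squeeze())
    print("attribution std: ", std_attr.squeeze())
```

The per-feature standard deviation is one simple way to inspect explanation stability across posterior samples, in the spirit of the diversity analysis the abstract describes.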
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Quanshi_Zhang1
Submission Number: 5143