Keywords: Deep Learning, Explainable Artificial Intelligence, Computer Vision
TL;DR: Some concepts are absent from the input yet still influence the output, and current XAI methods cannot account for such encoded absences.
Abstract: Explainable artificial intelligence (XAI) aims to provide human-interpretable insights into the behavior of deep neural networks (DNNs), typically by estimating a simplified causal structure of the model. In existing work, this causal structure most often includes relationships where the presence of an input pattern or latent feature is associated with a strong activation of a neuron. For example, attribution methods identify input pixels that contribute most to a prediction, and feature visualization methods reveal inputs that cause high activation of a target neuron — both implicitly assuming that neurons encode the presence of concepts. However, a largely overlooked type of causal relationship is that of *encoded absences*, where the absence of a concept increases activation or, conversely, the presence of a concept inhibits activation. In this work, we show that such inhibitory relationships are common and that standard XAI methods fail to reveal them. To address this, we propose two extensions to attribution and feature visualization techniques that uncover encoded absences. Across experiments, we show that standard XAI methods fail to explain encoded absences, illustrate how these absences can be revealed, demonstrate how ImageNet models exploit them, and show that debiasing improves when they are taken into account.
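To make the notion of inhibitory evidence concrete, below is a minimal sketch (not the paper's proposed extension) of how one might keep the sign of a gradient×input attribution map and separate contributions that support a class from those that inhibit it; the pretrained model, class index, and random input are placeholders chosen purely for illustration.

```python
# Minimal sketch, assuming a standard PyTorch/torchvision setup: signed
# gradient x input attribution, split into presence-supporting (positive)
# and inhibitory (negative) pixel contributions for one target logit.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder input image
target_class = 207  # hypothetical target class index

logit = model(x)[0, target_class]
logit.backward()

attribution = (x.grad * x).sum(dim=1)            # signed map, summed over channels
positive_evidence = attribution.clamp(min=0)      # pixels whose presence supports the class
negative_evidence = (-attribution).clamp(min=0)   # pixels that inhibit the class
```

Standard saliency visualizations often discard or absolute-value the negative part; inspecting `negative_evidence` separately is one simple way to surface candidate inhibitory relationships of the kind the abstract describes.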
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 17180