Keywords: XAI, perturbation, top-k
Abstract: The adoption of machine learning for socially relevant tasks requires effective explainable artificial intelligence (XAI) methods to better understand model behavior.
Attribution methods are a popular XAI approach in which input-output relationships are characterized by heat maps that reflect the relative importance of input features for a particular prediction.
The quality of such maps is often assessed by measuring faithfulness based on the area under the insertion curve. We propose the first method that directly optimizes this metric to generate attribution heat maps. We establish the connection between insertion curves and top-$k$ feature selection, which leads to a loss function measuring the quality of attributions. Randomization of the loss allows us to efficiently approximate its gradient. We combine the loss function with the neural explanation mask framework to create a new approach for providing accurate attributions efficiently.
Experiments demonstrate superior faithfulness along with robust attributions and low inference time, suggesting a new path to generate useful explanations. Code is available at: https://anonymous.4open.science/r/Ra-nem_ICLR-2AD4
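To make the objective described in the abstract concrete, below is a minimal, forward-only sketch of the quantity being optimized: the area under the insertion curve, approximated by averaging the target-class score over a few randomly drawn top-$k$ masks. The function names, the frozen classifier `model`, the attribution map `attr`, and the zero baseline are illustrative assumptions; how gradients are propagated to the explanation network via the neural explanation mask framework is the paper's contribution and is not reproduced here.

```python
import torch

def randomized_insertion_score(model, x, attr, target, num_samples=8, baseline=0.0):
    """Monte-Carlo estimate of the insertion-curve area (illustrative sketch).

    For each randomly drawn insertion level k, keep only the k input features
    ranked highest by `attr`, replace the rest with a baseline, and record the
    target-class score. Averaging over random k approximates the area under
    the insertion curve while requiring only a few forward passes.
    """
    b, c, h, w = x.shape
    n = h * w
    flat_attr = attr.view(b, -1)
    scores = 0.0
    for _ in range(num_samples):
        k = torch.randint(1, n + 1, (1,)).item()       # random insertion level
        topk_idx = flat_attr.topk(k, dim=1).indices    # k most important pixels
        mask = torch.zeros(b, n, device=x.device)
        mask.scatter_(1, topk_idx, 1.0)
        mask = mask.view(b, 1, h, w)                   # broadcast over channels
        x_masked = mask * x + (1 - mask) * baseline    # insert only the top-k features
        logits = model(x_masked)
        scores = scores + logits.gather(1, target.view(-1, 1)).squeeze(1)
    return (scores / num_samples).mean()               # higher means a better insertion curve
```

A training loss would negate this score; note that the hard top-$k$ selection above is not differentiable with respect to `attr`, which is precisely why the paper's randomized loss and mask-network formulation are needed.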
Primary Area: interpretability and explainable AI
Submission Number: 16935