Abstract: In the field of explainable Artificial Intelligence (xAI), attribution methods provide off-the-shelf tools to explain the decisions of a machine-learning model. However, existing methods usually do not correctly reflect the actual model behaviour, and they are difficult to compare, due to a lack of ground truth and/or consensus on what a correct explanation should be. Meanwhile, existing work applying formal verification to attribution methods is promising but suffers from scalability issues. In this study, we propose a Formally Grounded Correctness (FGC) metric to measure the correctness of existing attribution methods using formal verification. FGC can be used to rank popular explanation methods according to their correctness, ensuring that explanations are correct with regard to the model's decision. To allow the application of FGC to more challenging datasets, we propose to combine sound but incomplete verification algorithms, which trade accuracy for scalability, with existing work on input segmentation. We extensively benchmark FGC on MNIST, GTSRB, and a scaled-down version of ImageNet with several state-of-the-art provers and show its practical usefulness. Surprisingly, FGC shows that the simple saliency heuristic is strong with regard to correctness, shedding light on some caveats of other popular explanation techniques.
External IDs: doi:10.3233/faia250897