Keywords: Interpretability, Explainability, Adversarial Manipulation, Evaluation, Reproducibility, Reliability, Faithfulness
TL;DR: Because ground-truth explanations are unavailable, XAI evaluation methods can be adversarially manipulated, which undermines their trustworthiness.
Abstract: The absence of ground-truth explanation labels poses a key challenge for quantitative evaluation in interpretable AI (IAI), particularly when evaluation methods involve numerous user-specified hyperparameters. Without ground truth, optimising hyperparameter selection is difficult, so researchers often base their choices on similar studies, a practice that leaves considerable flexibility. We show how this flexibility can be exploited to manipulate evaluation outcomes by framing it as an adversarial attack in which minor hyperparameter adjustments lead to significant changes in results. Our experiments demonstrate substantial variations in evaluation outcomes across multiple datasets, explanation methods, and models. To counteract this, we propose a ranking-based mitigation strategy that enhances robustness against such manipulations. This work underscores the challenges of reliable evaluation in IAI. Code is available at https://github.com/Wickstrom/quantitative-IAI-manipulation.
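To make the ranking-based mitigation concrete, here is a minimal Python sketch of rank aggregation across hyperparameter configurations. All names, shapes, and the random scores are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

# A minimal sketch (not the paper's implementation) of rank-based
# aggregation: instead of reporting evaluation scores under one
# hand-picked hyperparameter configuration, rank the explanation
# methods under many configurations and aggregate the ranks.

rng = np.random.default_rng(0)

n_methods = 4    # hypothetical explanation methods being compared
n_configs = 50   # sampled hyperparameter configurations of the metric

# scores[i, j]: evaluation score of method j under configuration i;
# random numbers stand in for real faithfulness scores here.
scores = rng.random((n_configs, n_methods))

# Within each configuration, rank the methods
# (rank 0 = best, assuming a higher score is better).
ranks = np.argsort(np.argsort(-scores, axis=1), axis=1)

# Aggregate: the mean rank of each method across all configurations.
# A single favourable configuration now barely moves the result.
mean_ranks = ranks.mean(axis=0)
print("mean rank per method:", mean_ranks)
```

The design intuition is that an adversary can inflate a raw score by tuning one configuration, but shifting a method's aggregated rank requires it to win consistently across many configurations.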
Track: Published paper track
Submitted Paper: No
Published Paper: Yes
Published Venue: eXCV Workshop at ECCV 2024 (Proceedings Track)
Submission Number: 78