X-Hacking: The Threat of Misguided AutoML

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: Deliberate manipulation of SHAP explanations at scale using AutoML
Abstract: Explainable AI (XAI) and interpretable machine learning methods help to build trust in model predictions and derived insights, yet they also present a perverse incentive for analysts to manipulate XAI metrics to support pre-specified conclusions. This paper introduces the concept of X-hacking, a form of p-hacking applied to XAI metrics such as SHAP values. We show how easily an automated machine learning (AutoML) pipeline can be adapted to exploit model multiplicity at scale: searching a set of ‘defensible’ models with similar predictive performance to find one that yields a desired explanation. We formulate the trade-off between explanation and accuracy as a multi-objective optimisation problem, and illustrate empirically on familiar real-world datasets that, on average, Bayesian optimisation accelerates X-hacking 3-fold for susceptible features, compared with random sampling. We show that a dataset's vulnerability to X-hacking can be determined by the information redundancy among its features. Finally, we suggest possible methods for detection and prevention, and discuss the ethical implications for the credibility and reproducibility of XAI.
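The abstract's core mechanism can be illustrated with a minimal sketch (not the authors' pipeline; see the linked repository for their actual code): train several candidate models, keep those whose test accuracy lies within a small tolerance of the best (a stand-in for the set of ‘defensible’ models), and then cherry-pick the one whose importance score for a target feature suits a pre-specified conclusion. For simplicity this sketch uses scikit-learn's permutation importance in place of SHAP values; the dataset, candidate models, target feature, and tolerance are all illustrative assumptions.

```python
# Illustrative sketch of model-multiplicity cherry-picking ("X-hacking").
# Assumptions: breast-cancer dataset, a hand-picked candidate pool, permutation
# importance as a proxy for SHAP, and an accuracy tolerance EPSILON = 0.02.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

TARGET_FEATURE = "mean radius"  # feature whose importance we want to suppress
EPSILON = 0.02                  # accuracy tolerance defining "defensible" models

candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000)),
    "rf_deep": RandomForestClassifier(n_estimators=200, random_state=0),
    "rf_shallow": RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0),
    "gbm": GradientBoostingClassifier(random_state=0),
}

results = []
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    acc = model.score(X_te, y_te)
    imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
    feat_imp = imp.importances_mean[list(X.columns).index(TARGET_FEATURE)]
    results.append((name, acc, feat_imp))

best_acc = max(acc for _, acc, _ in results)
# "Defensible" models: accuracy within EPSILON of the best (a crude Rashomon set).
defensible = [r for r in results if best_acc - r[1] <= EPSILON]
# X-hacking step: report the defensible model that minimises the target feature's importance.
name, acc, feat_imp = min(defensible, key=lambda r: r[2])
print(f"Selected {name}: accuracy={acc:.3f}, importance of '{TARGET_FEATURE}'={feat_imp:.4f}")
```

In the paper's setting, this brute-force enumeration is replaced by an AutoML search, and Bayesian optimisation over the pipeline configuration space accelerates finding such a model compared with random sampling.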
Lay Summary: Artificial Intelligence (AI) models are increasingly used to make critical decisions in fields like healthcare and justice. To build trust and ensure accountability in these powerful systems, we rely on "Explainable AI" to understand how they arrive at their conclusions. However, our research has uncovered a critical vulnerability we call "X-hacking". This is when an AI's explanation can be skewed or misrepresented to support a specific outcome, even if that outcome is misleading or doesn't reflect the AI's true reasoning. This can happen intentionally, but also inadvertently, for example by analysts who are inexperienced or pressed for time. X-hacking is possible because many different AI models can produce equally good results, even if their underlying logic and the explanations they provide differ. We found that automated tools for building AI (called AutoML) make it surprisingly easy to find and exploit these differences to "X-hack" explanations on a large scale. Our findings show that using a smart search method can even speed up this manipulation threefold. We also discovered that some types of data are more prone to X-hacking, especially when the information within that data is very similar or redundant. To combat this, we propose ways to detect X-hacking. For example, we can check whether a given explanation seems unusually far-fetched compared to other reasonable possibilities, and show this on a graph. Our work aims to alert everyone to this significant risk, encouraging the creation of AI systems that are truly transparent and trustworthy.
Link To Code: https://github.com/datasciapps/x-hacking
Primary Area: Social Aspects->Accountability, Transparency, and Interpretability
Keywords: explainable AI, SHAP, AutoML, Model Multiplicity, Rashomon sets, Ethics, Reproducibility
Submission Number: 10698