Keywords: interpretability, explainability
TL;DR: This paper proposes and studies sufficient, necessary, and unified explanations to understand model predictions.
Abstract: As complex machine learning models continue to be used in high-stakes decision settings, understanding their predictions is crucial. Post-hoc explanation methods aim to identify which features of an input ${\bf x}$ are important to a model's prediction $f({\bf x})$. However, explanations often vary across methods and lack clarity, limiting the information we can draw from them. To address this, we formalize two precise concepts—*sufficiency* and *necessity*—to quantify how features contribute to a model's prediction. We demonstrate that, although intuitive and simple, these two types of explanations may each fail to fully reveal which features a model deems important. To overcome this, we propose and study a unified notion of importance that spans the entire sufficiency-necessity axis. We show that this unified notion has strong ties to notions of importance based on conditional independence and Shapley values. Lastly, through various experiments, we quantify the sufficiency and necessity of popular post-hoc explanation methods and show that generating explanations along the sufficiency-necessity axis can uncover important features that may otherwise be missed, providing new insights into feature importance.
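To make the two concepts concrete, here is a minimal, illustrative sketch of how sufficiency and necessity of a feature subset could be estimated by Monte Carlo perturbation against a background dataset. The function and parameter names (`model`, `background`, `feature_idx`) are hypothetical, and these estimators are a rough reading of the abstract, not the paper's formal definitions.

```python
import numpy as np

def sufficiency(model, x, feature_idx, background, n_samples=100, rng=None):
    """Fraction of perturbed inputs whose prediction is preserved when the
    selected features are held at their original values and all remaining
    features are resampled from a background distribution.
    (Illustrative estimator; the paper's formal criteria may differ.)"""
    rng = rng or np.random.default_rng(0)
    original_pred = model(x[None, :]).argmax()
    idx = rng.integers(0, len(background), size=n_samples)
    samples = background[idx].copy()
    samples[:, feature_idx] = x[feature_idx]  # keep the selected features fixed
    preserved = model(samples).argmax(axis=1) == original_pred
    return preserved.mean()

def necessity(model, x, feature_idx, background, n_samples=100, rng=None):
    """Fraction of perturbed inputs whose prediction changes when the
    selected features are resampled and all other features stay fixed.
    (Illustrative estimator; the paper's formal criteria may differ.)"""
    rng = rng or np.random.default_rng(0)
    original_pred = model(x[None, :]).argmax()
    idx = rng.integers(0, len(background), size=n_samples)
    samples = np.tile(x, (n_samples, 1))
    samples[:, feature_idx] = background[idx][:, feature_idx]  # perturb only the selected features
    changed = model(samples).argmax(axis=1) != original_pred
    return changed.mean()
```

Under this reading, a highly sufficient subset preserves the prediction on its own, while a highly necessary subset changes the prediction when removed; the unified notion described in the abstract would interpolate between these two extremes.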
Submission Number: 94