Abstract: Even when a black-box model makes accurate predictions (e.g., whether it will rain tomorrow), it is difficult to extract principles from the model that improve human understanding (e.g., which set of atmospheric conditions best predicts rainfall). Model explanations via explainability methods (e.g., LIME, Shapley values) can help by highlighting interpretable aspects of the model, such as the data features to which the model is most sensitive. However, these methods can be unstable and inconsistent, often yielding unreliable insights. Moreover, when many near-optimal models exist, there is no guarantee that explanations for a single model will agree with explanations from the true model that generated the data. In this work, instead of explaining a single best-fitting model, we develop principled methods to construct an uncertainty set for the "true explanation": the explanation from the (unknown) true model that generated the data. We establish finite-sample guarantees that the returned uncertainty set contains the true model's explanation with high probability. We demonstrate through synthetic experiments that our uncertainty sets have high fidelity to the explanations of the true model, and we then report our findings on real-world data.
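To illustrate the general idea of an uncertainty set over explanations (not the authors' construction, which is not specified in the abstract), the sketch below uses a simple bootstrap heuristic: refit a linear model on resamples, treat each coefficient vector as a candidate explanation, and report per-feature intervals. The function name, the choice of linear coefficients as the "explanation", and the quantile-based intervals are all hypothetical and carry none of the paper's finite-sample guarantees.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def explanation_uncertainty_set(X, y, n_boot=200, alpha=0.05, seed=0):
    """Heuristic stand-in: per-feature attribution intervals across bootstrap refits.

    Here an "explanation" of a fitted model is simply its coefficient vector;
    the returned intervals illustrate the concept of an uncertainty set but
    provide no formal coverage guarantee.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    coefs = np.empty((n_boot, d))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)            # bootstrap resample
        model = LinearRegression().fit(X[idx], y[idx])
        coefs[b] = model.coef_                      # one candidate explanation
    lo = np.quantile(coefs, alpha / 2, axis=0)      # per-feature lower bound
    hi = np.quantile(coefs, 1 - alpha / 2, axis=0)  # per-feature upper bound
    return lo, hi

# Toy usage: the data-generating model uses only the first two features.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=500)
lower, upper = explanation_uncertainty_set(X, y)
for j, (l, u) in enumerate(zip(lower, upper)):
    print(f"feature {j}: [{l:+.2f}, {u:+.2f}]")
```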