Abstract: Interpretability is crucial for leveraging predictive machine learning for decision-making, but the strongest performing models are often black-boxes in that they are difficult to understand. For binary classification models, a growing body of literature seeks to find \textit{model-agnostic} explanations by treating a model as a list of 0/1 predictions and identifying patterns for when a model predicts $1$ over $0$ (or vice versa). While such explanations are
useful for understanding when a model predicts $1$ over $0$, they do
not consider the confidence (i.e., the probability) behind predictions, a critical piece of information provided by most classification models. Since the 0/1 predictions of a model depend on the choice of a subjective threshold for discretizing predicted probabilities, changing the threshold may change the resulting explanations even though the underlying model stays the same. In contrast, this work proposes model-agnostic explanations that treat a black-box model as a \textit{ranking} of a dataset from lowest predicted probability of $1$ to highest, rather than as a list of 0/1 predictions. Under this ranking, a useful explanation should broadly capture when a model \textit{confidently} predicts $1$ (i.e., highly ranked data points). Since highly confident predictions often correlate with predictions that are more accurate and actionable, understanding when a model predicts confidently is often quite valuable to a practitioner.
This work builds explanations based on rule lists (i.e., a collection of if-then rules) as well as a novel special case called checklists. A strong rule list or checklist is satisfied by a large number of data points that are ranked highly by the model. This quality is measured by the traditional metric of support (i.e., the number of data points an explanation applies to), by the \textit{average} ranking of those data points, which we call the Average Black-Box Ranking (ABBR), and by the sparsity of the explanation (e.g., the number of rules in the rule list). Given these metrics, this work develops a local-search-based optimization methodology for finding rule-list and checklist explanations that maximize ABBR subject to user-specified support and sparsity constraints. An initial rule list is chosen greedily from a pool of candidate rules and then iteratively perturbed by swapping rules in the list with rules from the candidate pool. This approach is evaluated on six real-world datasets spanning application areas from healthcare to criminal justice and finance. Empirical results suggest that this methodology finds rule lists of length at most 5 with ABBR within 7.4\% of the optimal ABBR of any explanation, while checklists provide greater interpretability at a small cost in performance.
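To make the objective and search procedure concrete, the following is a minimal Python sketch of how ABBR and a greedy-then-swap local search could be implemented. It is illustrative only and is not the paper's RuleListMiner code; the helper names (abbr, local_search), the representation of rules as boolean predicates over feature rows, and the specific constraint handling are all assumptions made for this sketch.

```python
# Illustrative sketch only -- not the authors' RuleListMiner implementation.
# Assumptions: a "rule" is a boolean predicate over a feature row, a rule list
# covers a point if any of its rules fires, and ranks are assigned so that the
# highest predicted probability of class 1 receives the largest rank.

import numpy as np

def abbr(rule_list, X, ranks):
    """Average Black-Box Ranking: mean rank of the points covered by the rule list."""
    covered = np.array([any(rule(x) for rule in rule_list) for x in X])
    if not covered.any():
        return 0.0
    return ranks[covered].mean()

def local_search(candidate_rules, X, probs, max_rules=5, min_support=100, n_iters=200):
    """Greedy initialization followed by rule-swap local search (illustrative only)."""
    ranks = probs.argsort().argsort() + 1  # rank 1 = lowest predicted probability of class 1
    rule_list = []

    # Greedy initialization: repeatedly add the unused candidate rule that yields the best ABBR.
    while len(rule_list) < max_rules:
        remaining = [r for r in candidate_rules if r not in rule_list]
        best = max(remaining, key=lambda r: abbr(rule_list + [r], X, ranks))
        rule_list.append(best)

    # Local search: swap a rule in the list with a candidate rule, keeping the swap
    # if it improves ABBR while still meeting the support constraint.
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        i = rng.integers(len(rule_list))
        new_rule = candidate_rules[rng.integers(len(candidate_rules))]
        trial = rule_list[:i] + [new_rule] + rule_list[i + 1:]
        support = sum(any(rule(x) for rule in trial) for x in X)
        if support >= min_support and abbr(trial, X, ranks) > abbr(rule_list, X, ranks):
            rule_list = trial
    return rule_list
```

In this sketch the sparsity constraint is enforced by fixing the list length at max_rules and the support constraint by rejecting swaps that drop coverage below min_support; the actual method in the paper may handle these constraints differently.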
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Addressing reviewer's comments regarding:
(1) Empirical Experiment to show the value of ABBR as a metric (Appendix A)
(2) Comparison of RuleListMiner algorithm against a suitable baseline (Appendix B).
Assigned Action Editor: ~Shahin_Jabbari1
Submission Number: 4486