Keywords: consistency, prediction consistency, model duplicity, inconsistent predictions, deep models, deep networks, explanations, saliency maps, gradient-based explanations, fairness, interpretability
Abstract: Recent work has shown that models trained to the same objective, and which achieve similar measures of accuracy on consistent test data, may nonetheless behave very differently on individual predictions. This inconsistency is undesirable in high-stakes contexts, such as medical diagnosis and finance. We show that this duplicitous behavior extends beyond predictions to feature attributions, which may likewise have negative implications for the intelligibility of a model, and one's ability to find recourse for subjects. We then introduce selective ensembles to mitigate such inconsistencies by applying hypothesis testing to the predictions of a set of models trained using randomly-selected starting conditions; importantly, selective ensembles can abstain in cases where a consistent outcome cannot be achieved up to a specified confidence level. We prove that that prediction disagreement between selective ensembles is bounded, and empirically demonstrate that selective ensembles achieve consistent predictions and feature attributions while maintaining low abstention rates. On several benchmark datasets, selective ensembles reach zero inconsistently predicted points, with abstention rates as low as 1.5%.
One-sentence Summary: Deep models give inconsistent predictions and explanations over small changes (e.g. random initialization). We can mitigate this by using selective ensemble models, which abstain from prediction if their constituent models do not agree sufficiently.
Supplementary Material: zip