"I know that I don’t know... and I explain why'' Interpretable abstention via counterfactual explanations
Keywords: explainable AI, selective classification, interpretability
TL;DR: Interpretable-by-design method for selective classification via distance-based counterfactual explanations
Abstract: Ensuring reliability in human-AI collaboration is crucial for fostering appropriate trust in hybrid decision-making systems; this hinges not only on performance and transparency but also on understanding the limits of ML methods. Selective classification addresses this need by allowing classifiers to reject uncertain instances and focus on more confident predictions. However, few works provide interpretable abstention policies for selective classification. In this work, we introduce a novel interpretable-by-design method for selective classification that leverages the distance between data points and their set of counterfactuals as a measure of uncertainty. By using this distance as a basis for rejection, our method formulates an effective abstention policy while providing contrastive and model-agnostic explanations. Experimental results indicate that our method implements a rejection policy that is explainable by design without affecting predictive performance.
Primary Area: interpretability and explainable AI
Submission Number: 17135
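The abstract describes the core mechanism: score each input by its distance to a counterfactual (a nearby point the model labels differently) and abstain when that distance is small, since a small label-flipping perturbation signals uncertainty. The paper's own implementation is not shown here; the following is a minimal sketch under assumptions, using Euclidean distance and the nearest differently-labeled training point as a crude stand-in for a counterfactual generator. All names (counterfactual_distance, selective_predict, tau) are illustrative, not the authors' API.

```python
# Minimal sketch of distance-to-counterfactual abstention (not the paper's code).
# Assumption: the nearest training point with a different predicted label serves
# as a proxy counterfactual; real counterfactual generators may differ.
import numpy as np
from sklearn.linear_model import LogisticRegression

def counterfactual_distance(x, X_train, clf):
    """Distance from x to the closest training point the model labels differently."""
    pred = clf.predict(x.reshape(1, -1))[0]
    other = X_train[clf.predict(X_train) != pred]  # candidate counterfactuals
    if len(other) == 0:
        return np.inf  # no counterfactual found: maximally confident
    return np.min(np.linalg.norm(other - x, axis=1))

def selective_predict(x, X_train, clf, tau):
    """Predict only if the nearest counterfactual is farther than tau; else abstain."""
    d = counterfactual_distance(x, X_train, clf)
    if d < tau:
        return None, d  # abstain: a small change would flip the label
    return clf.predict(x.reshape(1, -1))[0], d

# Toy usage: points near the decision boundary have close counterfactuals
# and are rejected; points far from it are classified.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y)
label, dist = selective_predict(np.array([0.01, -0.02]), X, clf, tau=0.5)
```

The returned distance doubles as the contrastive explanation: the rejected point's nearest counterfactual shows which small change would alter the prediction, which is what makes the abstention policy interpretable by design.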