"I know that I don’t know... and I explain why'' Interpretable abstention via counterfactual explanations
Keywords: explainable AI, selective classification, interpretability
TL;DR: Interpretable-by-design method for selective classification via distance-based counterfactual explanations
Abstract: Ensuring reliability in human-AI collaboration is crucial for fostering appropriate trust in hybrid decision-making systems; this hinges not only on performance and transparency but also on understanding the limits of ML methods. Selective classification addresses this need by allowing classifiers to reject uncertain instances and focus on more confident predictions. However, few works provide interpretable abstention policies for selective classification. In this work, we introduce a novel interpretable-by-design method for selective classification that leverages the distance between data points and their set of counterfactuals as a measure of uncertainty. By using this distance as a basis for rejection, our method formulates an effective abstention policy while providing contrastive and model-agnostic explanations. Experimental results indicate that our method implements a rejection policy that is explainable by design without affecting predictive performance.
Primary Area: interpretability and explainable AI
Submission Number: 17135
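The abstract describes the core mechanism: score each input by its distance to a counterfactual (a nearby point the model labels differently) and abstain when that distance is small, since a small label-flipping perturbation signals uncertainty. The paper's own implementation is not shown here; the following is a minimal sketch under assumptions, using Euclidean distance and the nearest differently-labeled training point as a crude stand-in for a counterfactual generator. All names (counterfactual_distance, selective_predict, tau) are illustrative, not the authors' API.

```python
# Minimal sketch of distance-to-counterfactual abstention (not the paper's code).
# Assumption: the nearest training point with a different predicted label serves
# as a proxy counterfactual; real counterfactual generators may differ.
import numpy as np
from sklearn.linear_model import LogisticRegression

def counterfactual_distance(x, X_train, clf):
    """Distance from x to the closest training point the model labels differently."""
    pred = clf.predict(x.reshape(1, -1))[0]
    other = X_train[clf.predict(X_train) != pred]  # candidate counterfactuals
    if len(other) == 0:
        return np.inf  # no counterfactual found: maximally confident
    return np.min(np.linalg.norm(other - x, axis=1))

def selective_predict(x, X_train, clf, tau):
    """Predict only if the nearest counterfactual is farther than tau; else abstain."""
    d = counterfactual_distance(x, X_train, clf)
    if d < tau:
        return None, d  # abstain: a small change would flip the label
    return clf.predict(x.reshape(1, -1))[0], d

# Toy usage: points near the decision boundary have close counterfactuals
# and are rejected; points far from it are classified.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y)
label, dist = selective_predict(np.array([0.01, -0.02]), X, clf, tau=0.5)
```

The returned distance doubles as the contrastive explanation: the rejected point's nearest counterfactual shows which small change would alter the prediction, which is what makes the abstention policy interpretable by design.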