Prototypical Self-Explainable Models Without Re-training

Srishti Gautam; Ahcene Boubekki; Marina MC Höhne; Michael Kampffmeyer

Published: 03 Jun 2024, Last Modified: 17 Sept 2024. Accepted by TMLR. License: CC BY 4.0

Abstract: Explainable AI (XAI) has unfolded in two distinct research directions with, on the one hand, post-hoc methods that explain the predictions of a pre-trained black-box model and, on the other hand, self-explainable models (SEMs) which are trained directly to provide explanations alongside their predictions. While the latter is preferred in safety-critical scenarios, post-hoc approaches have received the majority of attention until now, owing to their simplicity and ability to explain base models without retraining. Current SEMs, instead, require complex architectures and heavily regularized loss functions, thus necessitating specific and costly training. To address this shortcoming and facilitate wider use of SEMs, we propose a simple yet efficient universal method called KMEx (K-Means Explainer), which can convert any existing pre-trained model into a prototypical SEM. The motivation behind KMEx is to enhance transparency in deep learning-based decision-making via class-prototype-based explanations that are diverse and trustworthy without retraining the base model. We compare models obtained from KMEx to state-of-the-art SEMs using an extensive qualitative evaluation to highlight the strengths and weaknesses of each model, further paving the way toward a more reliable and objective evaluation of SEMs. The code is available at https://github.com/SrishtiGautam/KMEx.

Submission Length: Long submission (more than 12 pages of main content)

Changes Since Last Submission:
* We have revised the manuscript to include a discussion of the relationship between our proposed method and existing work on prototype classifiers, particularly those used in few-shot learning as suggested in Section 2.2.
* We have removed the overly broad language, specifically in the abstract and in the introduction of Section 3.
* We acknowledge the AE's point regarding the interpretability component of our method and have added the suggested nuance to the main contribution.
* We have made several revisions to improve the clarity and readability of the manuscript, as suggested by the reviewers.

Code: https://github.com/SrishtiGautam/KMEx

Assigned Action Editor: ~Erin_Grant1

Submission Number: 1975
