ProtoVAE: A Trustworthy Self-Explainable Prototypical Variational Model

Srishti Gautam; Ahcene Boubekki; Stine Hansen; Suaiba Amina Salahuddin; Robert Jenssen; Marina MC Höhne; Michael Kampffmeyer

Published: 31 Oct 2022, Last Modified: 06 Apr 2025 | NeurIPS 2022 Accept | Readers: Everyone

Keywords: Interpretability, Explainable AI, Self-explaining Models, Deep Neural Networks

Abstract: The need for interpretable models has fostered the development of self-explainable classifiers. Prior approaches either rely on multi-stage optimization schemes, which impact the predictive performance of the model, or produce explanations that are not transparent or trustworthy, or that do not capture the diversity of the data. To address these shortcomings, we propose ProtoVAE, a variational autoencoder-based framework that learns class-specific prototypes in an end-to-end manner and enforces trustworthiness and diversity by regularizing the representation space and introducing an orthonormality constraint. Finally, the model is designed to be transparent by directly incorporating the prototypes into the decision process. Extensive comparisons with previous self-explainable approaches demonstrate the superiority of ProtoVAE, highlighting its ability to generate trustworthy and diverse explanations without degrading predictive performance.

TL;DR: We present a new self-explainable deep learning model that is trustworthy, transparent, and captures the diversity of the data.

Supplementary Material: zip

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/protovae-a-trustworthy-self-explainable/code)
