An Interpretable Deep Classifier for Counterfactual Generation

Published: 03 Sept 2022, Last Modified: 08 Oct 2024, OpenReview Archive Direct Upload, Everyone, CC BY 4.0
Abstract: Counterfactual explanation is at the core of interpretable machine learning, which requires a trained model not only to infer but also to justify its inference. This problem is crucial in many fields, such as fintech and the healthcare industry, where accurate decisions and their justifications are equally important. Many studies have leveraged the power of deep generative models for counterfactual generation. However, most focus on vision data and leave the latent space unsupervised. In this paper, we propose a new and general framework that uses a supervised extension to the Variational Auto-Encoder (VAE) with Normalizing Flow (NF) for simultaneous classification and counterfactual generation. Experiments on two tabular financial datasets, Lending Club (LCD) and Give Me Some Credit (GMC), show that the model can achieve state-of-the-art prediction accuracy while also producing meaningful counterfactual examples that interpret and justify the classifier's decision.