Keywords: stylegan2, generative modelling, counterfactual explanations, explainable ai, generative adversarial networks
TL;DR: Reproduction study of the Explaining in Style paper by Lang et al. (2021).
Abstract: StylEx is an approach for classifier-conditioned training of a StyleGAN2, intended to capture classifier-specific attributes in its disentangled StyleSpace. These attributes can be adjusted to generate counterfactual explanations of the classifier's decisions. StylEx is domain- and classifier-agnostic, and its explanations are claimed to be human-interpretable, distinct, coherent and sufficient to flip classifier decisions. We verify these claims by reproducing a selection of the experiments in the paper, using the code released by the authors. However, a significant part of the training procedure, network architecture and hyperparameter configurations was missing. We therefore reimplemented the model, porting the available TensorFlow code to PyTorch, to enable easier reproduction of the proposed case studies. All experiments ran in approximately 20-50 GPU hours per dataset, depending on the batch size, gradient accumulation and GPU. We verified that the publicly available pretrained model has a 'sufficiency' measure within 1\% of the value reported in the paper. Additionally, we evaluated the Fréchet inception distance (FID) scores of images generated by the released model, and show that the FID score increases with the number of attributes used to generate a counterfactual explanation. Custom models were trained on three datasets with reduced image dimensionality ($64^2$). Additionally, a user study was conducted to evaluate the distinctiveness and coherence of the generated images. For our model, we report significantly lower accuracy in identifying the extracted attributes, as well as lower 'sufficiency' scores. Running the provided Jupyter Notebook and verifying the results of the pretrained models on the FFHQ dataset was easy, as was extending an existing StyleGAN2 implementation to fit this study.
Reproducing the experiments at the same scale as the authors was challenging, as was developing the full training procedure, model architecture and hyperparameters, particularly due to underspecification in the original paper. Converting the code from TensorFlow to PyTorch also required substantial effort. We corresponded with the first author of the paper through several emails; through this contact, additional details were released on the network architecture, the training procedure and the hyperparameter configurations.
Paper Url: https://openaccess.thecvf.com/content/ICCV2021/papers/Lang_Explaining_in_Style_Training_a_GAN_To_Explain_a_Classifier_ICCV_2021_paper.pdf
Paper Venue: ICCV 2021
Supplementary Material: zip