Keywords: reproducibility, black box, explanations, interpretability, transparent models, convolutional neural networks, variational autoencoders, explainable artificial intelligence, post-hoc explanation
Abstract: Reproducibility Summary
Scope of Reproducibility
We verify the outcome of the methodology proposed in the article, which attempts to provide post-hoc causal explanations for black-box classifiers through causal inference. This is achieved by replicating the code step by step, according to the descriptions in the paper. All the claims in the paper have been examined, and we provide additional metrics to evaluate the portability, expressive power, algorithmic complexity and data fidelity of the proposed framework. We have further extended their analyses to all the benchmark datasets used, confirming the reported results.
Methodology
We use the same architecture and (hyper)parameters for replication. However, our code has a different structure, and we provide a more efficient implementation of the information-flow measure. In addition, Algorithm 1 of the original paper is not implemented in the authors' repository, so we implemented it ourselves. We further extended their framework to another domain (text data), although unsuccessfully. Furthermore, we include a detailed table in our report showing the time needed to produce the results of each reproduced experiment. All models were trained on NVIDIA GeForce GTX 1080 GPUs provided by SURFsara's Lisa cluster computing service at the University of Amsterdam.
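As a rough illustration of what this measure involves, the sketch below shows one way to estimate the information flow I(alpha; Y) from a latent causal factor alpha to the classifier output Y by Monte Carlo sampling, using I(alpha; Y) = E_alpha[ KL( p(Y|alpha) || p(Y) ) ]. This is our own minimal sketch, not the authors' implementation: the decoder and classifier callables, the latent dimensions and the sample counts are hypothetical placeholders.

# Minimal sketch (assumed approach): Monte Carlo estimate of information flow.
import numpy as np

def information_flow(decoder, classifier, dim_alpha=1, dim_beta=7,
                     n_outer=256, n_inner=256, eps=1e-12):
    # decoder(alpha, beta) -> batch of inputs; classifier(x) -> softmax probabilities.
    # Both callables and the latent dimensions are hypothetical placeholders.
    cond = []
    for _ in range(n_outer):
        alpha = np.random.randn(1, dim_alpha)                 # sample a causal factor
        beta = np.random.randn(n_inner, dim_beta)             # sample non-causal factors
        x = decoder(np.repeat(alpha, n_inner, axis=0), beta)  # generate inputs
        probs = classifier(x)                                 # (n_inner, n_classes)
        cond.append(probs.mean(axis=0))                       # estimate p(Y | alpha)
    cond = np.stack(cond)                                     # one row per alpha sample
    marg = cond.mean(axis=0)                                  # estimate p(Y)
    kl = (cond * (np.log(cond + eps) - np.log(marg + eps))).sum(axis=1)
    return kl.mean()                                          # average KL over alpha samples

Batching the outer samples instead of looping over them in Python is the kind of change that makes this computation substantially cheaper, which is what our more efficient implementation aims at.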
Results
We reproduced the framework from the original paper and verified the main claims made by the authors. However, in our extension study the GCE model did not manage to separate causal from non-causal factors for a text classifier, due to the complexity of fine-tuning the model.
What was easy
The original paper comes with extensive appendices, many of which contain crucial details for the implementation and for understanding the intended behaviour. The authors provide code for most of the experiments presented in the paper. Although their code repository was not functional at first, we used it as a reference for our own implementation. The authors also updated their code two weeks after we started our re-implementation, which made it easier for us to verify its correctness.
What was difficult
The codebase the authors provided was initially unusable, with missing or renamed imports, hardcoded file paths, and an overall convoluted structure. Additionally, the description of Algorithm 1 is quite vague, and no implementation of it was provided. Beyond this, computational expense was a serious issue, given the inefficient training steps and the need to repeat training several times for the hyperparameter search.
Communication with original authors
This reproducibility study is part of a course on fairness, accountability, confidentiality and transparency in AI. Since it is a course project in which we interacted with other groups in a forum, and another group working on the same paper had already reached out to the authors about the problems with the initial repository, we did not find it necessary to contact them again.
Paper Url: https://openreview.net/forum?id=tdG6Fa3Y6hq&referrer=%5BML%20Reproducibility%20Challenge%202020%5D(%2Fgroup%3Fid%3DML_Reproducibility_Challenge%2F2020)