## Explaining Low Dimensional Representation, a reproduction

Jan 31, 2021 (edited Apr 01, 2021), RC2020, Readers: Everyone
• Keywords: Explainability, Interpretable Machine Learning, Counterfactual Explanations, Representation Learning
• Abstract: This report covers our reproduction of the paper 'Explaining Low dimensional Representation' \cite{plumb2020explaining} by Plumb et al. In this paper, a method (Transitive Global Translations, TGT) is proposed for explaining differences between clusters in low-dimensional representations of high-dimensional data. The authors show that their method outperforms the difference between the means (DBM) method, is consistent in explaining differences with few features, and matches real patterns in data. We verify these claims by reproducing their experiments and testing their method on new data. We also investigate the use of more complex transformations to explain differences between clusters. We reproduce the original experiments using their source code. We also replicate the findings by re-implementing the authors' method in PyTorch and evaluating it on two of the datasets used in the paper and two new ones. Furthermore, we compare TGT with our own extension of TGT, which uses a larger class of transformations. Using their code, we were able to reproduce their experiments and obtained mostly similar results. TGT generally outperforms DBM, especially when explanations use few features. TGT is consistent across sparsity levels in the features to which it attributes cluster differences. TGT matches real patterns in data. When extending the types of functions used for explanations, performance did not improve significantly, suggesting that translations make for adequate explanations. However, the scaling extension shows promising performance in recovering the original signal on the modified synthetic data. The easiest part was running the existing code with the pre-trained model files. The original authors had set up their codebase in an organized manner with clear instructions. The first difficulty that we encountered was finding the right environment, because the source code depends on deprecated functionality.
The clustering method they used had to be re-implemented before we could use it in our replication. Another difficulty was the selection of clusters. The authors did not provide a consistent method for selecting clusters in a latent space representation. When retraining the provided models, we get a latent space representation different from that of the original experiments, so the clusters have to be selected manually. The metrics used to evaluate the explanations also depend on the clustering. This means that there is some variability in the exact verification of reproducibility. We asked the original authors for clarification on how to choose the $\epsilon$ hyper-parameter. However, it became apparent that we had misread the paper, and the procedure is indeed adequately reported there.
• Paper Url: https://openreview.net/forum?id=MFj70_2-eY1&noteId=qf1uS0EEyLy&referrer=%5BML%20Reproducibility%20Challenge%202020%5D(%2Fgroup%3Fid%3DML_Reproducibility_Challenge%2F2020)
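
The DBM baseline named in the abstract can be illustrated with a short sketch. This is not the authors' code or our re-implementation: the function name `dbm_explanation`, the truncation to the `k` largest-magnitude features as a way of obtaining a sparse explanation, and the toy clusters are all assumptions made for illustration.

```python
import numpy as np


def dbm_explanation(x_source, x_target, k=None):
    """Hypothetical helper: difference between cluster means.

    x_source, x_target: arrays of shape (n_points, n_features).
    k: if given, keep only the k features with the largest absolute
       difference and zero out the rest (assumed sparsification rule).
    """
    delta = x_target.mean(axis=0) - x_source.mean(axis=0)
    if k is not None and k < delta.size:
        keep = np.argsort(np.abs(delta))[-k:]  # indices of the k largest |delta|
        sparse = np.zeros_like(delta)
        sparse[keep] = delta[keep]
        delta = sparse
    return delta


# Toy data: cluster_b is cluster_a shifted by a known translation.
rng = np.random.default_rng(0)
shift = np.array([2.0, 0.0, -0.5, 0.0])
cluster_a = rng.normal(0.0, 0.1, size=(50, 4))
cluster_b = cluster_a + shift

full_delta = dbm_explanation(cluster_a, cluster_b)         # uses all features
sparse_delta = dbm_explanation(cluster_a, cluster_b, k=1)  # single feature
```

On this toy example the full difference recovers the shift, and the 1-sparse version keeps only the feature with the largest change. TGT, by contrast, learns its translations by optimization rather than reading them off the means, which this sketch does not attempt to reproduce.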