Keywords: SCOUTER, XAI, Explainable, AI, Interpretable, Reproducibility, Attention, Self-attention, Computer Vision
TL;DR: A reproducibility study attempting to replicate the experiments done in the SCOUTER paper.
Abstract: Reproducibility Summary
Scope of Reproducibility
We aim to replicate the main findings of the paper SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition by Li et al. in order to verify its three main claims:
1) The explanations generated by SCOUTER outperform those of other explanation methods on several explanation evaluation metrics.
2) SCOUTER achieves classification accuracy similar to that of a fully connected model.
3) SCOUTER achieves higher confusion matrix metrics than a fully connected model on a binary classification problem.
Methodology
The authors provided code for training the models. We implemented the explanation evaluation metrics and the confusion matrix metrics ourselves. We used the same hyperparameters as the original work wherever they were reported. We trained all models from scratch on various datasets and evaluated the explanations generated by these models with all reported metrics. We compared the accuracy scores of the different models on several datasets. Finally, we calculated an assortment of confusion matrix metrics for models trained on a binary dataset.
Results
We were able to reproduce only 22.2% of the explanation evaluation metrics and therefore could not find conclusive support for claim 1. We could verify claim 2 for only one of the datasets, and in total we could reproduce 55.5% of the original scores. We could reproduce all scores regarding claim 3, but the claim is still not justified, because the scores of the fully connected and SCOUTER models lie very close to one another.
What was easy
The paper was well written, so understanding the SCOUTER architecture was straightforward. The code for training a model was available, and together with the examples the authors provide, training was achievable with relative ease. A checkpoint system is implemented, so training a model can be split into multiple runs. All datasets used are available and straightforward to obtain.
What was difficult
The original code did not contain any documentation, which made it difficult to navigate. No code for calculating the metrics was provided, so these had to be implemented from scratch. Memory allocation issues occurred while training the models. Training and evaluating on a large dataset took a considerable amount of time.
Communication with original authors
We sent the authors an e-mail requesting either the missing code or more details on how the metrics were implemented, but unfortunately we did not receive a reply.
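The confusion matrix metrics mentioned in the methodology can be illustrated with a minimal sketch. It is not the code used in this study: the function name `binary_confusion_metrics`, the choice of metrics (accuracy, precision, recall, specificity, F1), and the convention of returning 0.0 for an undefined ratio are all assumptions made for illustration, since the summary does not state which metrics were computed or how.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    """Illustrative container; the field selection is an assumption."""
    accuracy: float
    precision: float
    recall: float
    specificity: float
    f1: float


def _ratio(numerator: int, denominator: int) -> float:
    # Assumed convention: an undefined ratio (zero denominator) becomes 0.0.
    return numerator / denominator if denominator else 0.0


def binary_confusion_metrics(y_true, y_pred) -> BinaryMetrics:
    """Compute common metrics from 0/1 ground-truth and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    return BinaryMetrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        precision=_ratio(tp, tp + fp),
        recall=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        f1=_ratio(2 * tp, 2 * tp + fp + fn),
    )


if __name__ == "__main__":
    # Toy labels, not data from the study.
    labels = [1, 1, 1, 1, 0, 0]
    predictions = [1, 1, 1, 0, 1, 0]
    print(binary_confusion_metrics(labels, predictions))
```

Computing the same set of metrics for both a fully connected model and a SCOUTER model on an identical test split is what allows the side-by-side comparison that claim 3 depends on.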
Paper Url: https://openaccess.thecvf.com/content/ICCV2021/html/Li_SCOUTER_Slot_Attention-Based_Classifier_for_Explainable_Image_Recognition_ICCV_2021_paper.html
Paper Venue: ICCV 2021