Reproducibility Study - SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition

Anonymous

05 Feb 2022 (modified: 05 May 2023) · ML Reproducibility Challenge 2021 Fall Blind Submission · Readers: Everyone
Keywords: MLRC, SCOUTER, Explainable AI, Image Recognition
TL;DR: Reproducing Slot Attention-based Classifier for Explainable Image Recognition
Abstract:

Scope of Reproducibility: The experiments presented by Li et al. on SCOUTER were replicated to verify the authors' claims about the model's properties. For the task of explainable image recognition, the authors make four main claims: (1) SCOUTER achieves state-of-the-art performance on positive and negative explanations. (2) SCOUTER can be trained on different domains with similar accuracy. (3) The area of the explanatory regions is adjustable via the λ hyperparameter introduced in the model. (4) Classification accuracy decreases as the number of classes grows.

Methodology: A codebase implementing the model architecture, training, and visualization was provided with the paper and was used to replicate the experiments. The calculation of the explanation metrics was partly re-implemented and partly taken from the codebases of their respective authors. The experiments supporting the claims were run with adjustments to the number of trials, classes, and batch size due to hardware constraints.

Results: With this experimental setup, the replicated experiments could not reproduce most of the state-of-the-art explanation results on the evaluation metrics. The experiment using SCOUTER+ with λ = 1 replicated precision within 3% of the reported value. The accuracy of the model on ImageNet was replicated within 1% of the reported value; however, the accuracy values for the CUB-200 dataset were not comparably replicated. The adjustable area size described in claim (3) was replicated using SCOUTER−, although the specific reported values were not. The experiments did replicate the trend of decreasing accuracy with an increasing number of classes.

What was easy: The paper clearly describes the xSlot attention module. Although the code for the evaluation metrics was not provided, the papers introducing the metrics were referenced. The figures and visualizations in the paper and appendix were intuitive and helpful for understanding the code implementation.

What was difficult: Implementing the evaluation metrics was harder than expected, since their code was not provided and the paper's description lacks the detail needed to reproduce the evaluation precisely. Training the models also took longer than anticipated; although this was solved by using remote hardware resources, the cost of training was high and limited the number of experiments that could be performed.

Communication with original authors: The authors of the paper were contacted to resolve a technical issue, namely the failure to reproduce the infidelity metric. There was no response.
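As context for claim (3), SCOUTER controls the size of its explanatory regions by adding a λ-weighted area term to the classification loss: a larger λ penalizes attended area more heavily and yields smaller explanation regions. A minimal sketch of such an objective follows; the function and tensor names are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F


def scouter_style_loss(logits, targets, slot_attention, lam=1.0):
    """Sketch of a SCOUTER-style objective (names are assumptions).

    logits:         (batch, num_classes) classification scores
    slot_attention: (batch, num_slots, num_positions) attention maps in [0, 1]
    lam:            weight of the area penalty; larger lam shrinks the
                    explanatory regions by penalizing attended area more
    """
    cls_loss = F.cross_entropy(logits, targets)
    # Area term: average attention mass over slots and spatial positions.
    area_loss = slot_attention.mean()
    return cls_loss + lam * area_loss


# Illustrative usage with random tensors:
logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
attn = torch.rand(4, 10, 49)  # e.g. 10 slots over a 7x7 feature map
loss_small_lam = scouter_style_loss(logits, targets, attn, lam=1.0)
loss_large_lam = scouter_style_loss(logits, targets, attn, lam=10.0)
```

With positive attention mass, increasing λ strictly increases the penalty, which is the mechanism behind the adjustable area size tested in the replication.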
Paper Url: https://arxiv.org/abs/2009.06138
Paper Venue: ICCV 2021