Keywords: knowledge distillation, review mechanism
Abstract:
Scope of Reproducibility
This effort aims to reproduce the experimental results and analyze the robustness of the review framework for
knowledge distillation introduced by Chen et al. We consistently verify the reported improvement in test accuracy
across student models and study the effectiveness of the novel modules introduced by the authors through
ablation studies and new experiments.
Methodology
We begin our reproduction effort with the code open-sourced by the authors, using it to reproduce Tables 1 and 2
from the original paper. We then refactor and re-implement the code for a specific architecture (ResNets),
referring to the authors' code for specific implementation details (discussed further in Section 3.2). We implement
the ablation studies mentioned in the original paper and design experiments to verify the authors' claims. We release
our code as open source.
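For context, the following is a minimal PyTorch-style sketch of how we understood the training objective of the review framework. The names here (`review_kd_loss`, `total_loss`, `stage_loss`, `kd_weight`) are ours, `abfs` stands for a hypothetical list of fusion modules (see the ABF/HCL sketch later in this report), and plain MSE stands in for the paper's HCL as the per-stage loss; this is a sketch under those assumptions, not the authors' exact implementation.

```python
import torch.nn.functional as F

def review_kd_loss(student_feats, teacher_feats, abfs, stage_loss=F.mse_loss):
    """Review mechanism: fuse student features from the deepest stage back
    to the shallowest with ABF modules, and compare each fused feature
    against the same-stage teacher feature. The paper uses HCL as the
    per-stage loss; plain MSE stands in here."""
    loss, residual = 0.0, None
    for s, t, abf in zip(reversed(student_feats), reversed(teacher_feats), abfs):
        fused, residual = abf(s, residual)
        loss = loss + stage_loss(fused, t)
    return loss

def total_loss(logits, labels, student_feats, teacher_feats, abfs, kd_weight=1.0):
    # Standard cross-entropy on the task plus the weighted review loss.
    return F.cross_entropy(logits, labels) + kd_weight * review_kd_loss(
        student_feats, teacher_feats, abfs)
```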
Results
We reproduce the results of the review mechanism on the CIFAR-100 dataset to within 0.8% of the reported values. The
claim of achieving SOTA performance on the image classification task is verified consistently across different student
models. The ablation studies help us understand the significance of the novel modules proposed by the authors, and the
experiments on the framework's components further support the paper's claims and yield additional insights.
What was easy
The authors open-sourced the code for the paper, which made it easy to verify many of the reported results
(specifically Tables 1 and 2 in the original paper). The review mechanism was well described mathematically in the
paper, which made its implementation easier. The writing was clear and the diagrams self-explanatory,
which aided our conceptual understanding of the paper.
What was difficult
While the framework of the review mechanism was well described, further specification of the architectural components
would have helped: ABF (the residual output and ABF output mentioned in Section 4.2.3) and HCL (the number, sizes,
and weights of the levels mentioned in Section 4.2.2). These details would have made it easier to translate the
architecture into code; our interpretation is sketched below. The most challenging part remained the lack of compute
resources and time to run our experiments: each run took around 4-5 hours, making it difficult for us to report
results averaged over multiple runs.
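As a reference for these details, here is a minimal PyTorch sketch of how we interpreted ABF and HCL. The channel arguments, the pyramid level sizes (4, 2, 1), and the uniform level weights are assumptions made for illustration rather than specifications from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ABF(nn.Module):
    """Attention-based fusion (ABF): fuses a student feature map with the
    residual feature coming from the previous (deeper) review stage."""
    def __init__(self, in_channels, mid_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, mid_channels, kernel_size=1)
        # Produces two spatial attention maps, one per input branch.
        self.att = nn.Conv2d(mid_channels * 2, 2, kernel_size=1)
        self.conv2 = nn.Conv2d(mid_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x, residual=None):
        x = self.conv1(x)
        if residual is not None:
            residual = F.interpolate(residual, size=x.shape[-2:], mode="nearest")
            w = torch.sigmoid(self.att(torch.cat([x, residual], dim=1)))
            x = x * w[:, 0:1] + residual * w[:, 1:2]
        # Returns the ABF output (compared against the teacher) and the
        # residual output (passed on to the next, shallower stage).
        return self.conv2(x), x

def hcl(student_feat, teacher_feat, levels=(4, 2, 1), weights=(1.0, 1.0, 1.0)):
    """Hierarchical context loss (HCL): MSE between spatial-pyramid-pooled
    versions of the two feature maps, plus the full-resolution MSE.
    Level sizes and weights here are our assumptions."""
    loss = F.mse_loss(student_feat, teacher_feat)
    for k, w in zip(levels, weights):
        s = F.adaptive_avg_pool2d(student_feat, k)
        t = F.adaptive_avg_pool2d(teacher_feat, k)
        loss = loss + w * F.mse_loss(s, t)
    return loss
```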
Communication with original authors
During the course of this study, we attempted to contact the original authors multiple times via e-mail.
Unfortunately, we did not receive any response from them.
Paper Url: https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Distilling_Knowledge_via_Knowledge_Review_CVPR_2021_paper.pdf
Paper Venue: CVPR 2021