Abstract: Knowledge distillation, typically employed to condense a large `teacher' network into a smaller `student' network, has also been found to effectively transfer adversarial robustness to mobile-friendly students. In this study, however, we show that knowledge distillation between large models can also be used purely to enhance adversarial robustness. Specifically, we present a thorough analysis of different robust knowledge distillation (RKD) techniques with the aim of providing general guidelines for improving the adversarial performance of a student model. Our ablations demonstrate the significance of early stopping, model ensembling, label mixing, and the use of weakly adversarially trained teachers as keys to maximizing a student's performance; but we also find that matching the student and teacher in adversarial regions is beneficial in some settings. We therefore introduce a new adversarial knowledge distillation (AKD) loss, which matches the student's and teacher's outputs on adversarial examples, to study when it can be beneficial in the context of RKD. Finally, we use our insights to enhance state-of-the-art robust models and find that, while our proposed guidelines can complement and improve them, the main achievable performance benefits still depend on the quantity and quality of the training data.
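To make the idea of matching student and teacher outputs on adversarial examples concrete, below is a minimal PyTorch-style sketch of an AKD-like loss. The PGD attack on the student, the L-infinity budget, the temperature, and the KL-based matching are illustrative assumptions; the paper's exact AKD formulation may differ.

```python
# Sketch of an adversarial knowledge distillation (AKD) style loss:
# match student and teacher outputs on adversarial examples.
# All hyperparameters below (eps, alpha, steps, temperature) are assumed
# for illustration and are not taken from the paper.
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Generate L-infinity PGD adversarial examples against `model`."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()


def akd_loss(student, teacher, x, y, temperature=2.0):
    """KL divergence between student and teacher outputs on adversarial inputs."""
    x_adv = pgd_attack(student, x, y)      # craft adversarial examples on the student
    with torch.no_grad():
        t_logits = teacher(x_adv)          # teacher prediction on the same points
    s_logits = student(x_adv)
    return F.kl_div(
        F.log_softmax(s_logits / temperature, dim=1),
        F.softmax(t_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
```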
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: - Added results with ARD/ARD+ method in Table 6
- Improved clarity of caption in Table 2
- Added experiments with Tiny-ImageNet in Appendix B.2
- Added results of using TRADES with the teacher in Appendix B.4
- Motivated why we use function matching in AKD
- Reworded text to not give the impression that our objective is to motivate the AKD loss function
Assigned Action Editor: ~Pin-Yu_Chen1
Submission Number: 1330